Formal Methods in Automated Design Debugging
by
Sean A. Safarpour
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2009 by Sean A. Safarpour
Abstract
Formal Methods in Automated Design Debugging
Sean A. Safarpour
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2009
The relentless growth in size and complexity of semiconductor devices over the last decades
continues to present new challenges to the electronic design community. Today, functional
debugging is a bottleneck that jeopardizes the future growth of the industry as it can account
for up to 30% of the overall design effort. To alleviate the manual debugging burden for
industrial problems, scalable, practical and robust automated debugging solutions are required.
This dissertation presents novel techniques and methodologies to bridge the gap between
current capabilities of automated debuggers and the strict industry requirements. The contri-
butions proposed leverage powerful advancements made in the formal method community, such
as model checking and reasoning engines, to significantly ease the debugging effort.
The first contribution, abstraction and refinement, is a systematic methodology that reduces
the complexity of debugging problems by abstracting irrelevant sections of the circuits under
analysis. Powerful abstraction techniques are developed for netlists as well as hierarchical and
modular designs. Experiments demonstrate that an abstraction and refinement methodology
requires up to 200 times less run-time and 27 times less memory than a state-of-the-art debugger.
The second contribution, Bounded Model Debugging (BMD), is a debugging methodology
based on the observation that erroneous behaviour is more likely caused by errors excited
temporally close to the observation points. BMD systematically generates a series of successively
larger yet more complete debugging problems to be solved. Experiments show the effectiveness
of BMD as 93% of the large problems are solved with BMD versus 34% without BMD.
A third contribution is an automated debugging formulation based on maximum satisfiability.
The formulation is used to build a powerful two-step, coarse- and fine-grained debugging
framework providing up to 980 times performance improvements.
The final contribution of this thesis is a trace reduction technique that uses reachability
analysis to reproduce the observed failure with fewer simulation events. Experiments demonstrate
that many redundant state transitions can be removed, resulting in traces with up to 100 times
fewer events than the original.
Acknowledgements
I wish to express my sincere gratitude to my Ph.D. supervisor, Professor Andreas Veneris
for being a mentor, a colleague and a friend. Over the course of my research at the University
of Toronto, Professor Veneris has been a steady source of passion, motivation, and guidance.
Many thanks to my Ph.D. committee members Professors Farid Najm, Andreas Moshovos,
Masahiro Fujita, Jason Anderson, Frank Kschischang and Charles Mims for their thorough
reviews of my dissertation and their insightful suggestions.
I am also indebted to the Vennsa family and my colleagues at the University of Toronto for
our fruitful discussions and their technical support of my research. Special thanks to Duncan
Smith, Terry Yang, Hratch Mangassarian, Brian Keng, Yibin Chen, Patrick Halina, Evean Qin,
Alan Baker and Andrew Ling.
I would like to thank my parents for being my foundation, role models, and an endless
source of love and support. Many thanks to my sister, grandmother and the rest of my dear
family for their advice and encouragement through my academic endeavours. I wish to express
my appreciation to Sarah Osmanski for her encouragement and selfless support of my goals and
ambitions.
I would like to express my gratitude to the Natural Sciences and Engineering Research
Council of Canada (NSERC) for their financial support of my research.
Contents
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Design Verification and Design Debug . . . . . . . . . . . . . . . . . . . . 4
1.2 Automated Design Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Abstraction and Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Bounded Model Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Max-sat based Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.4 Trace reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Background 13
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Verification Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Automated Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Complexity of Automated Debugging . . . . . . . . . . . . . . . . . . . . 18
2.4 Simulation- and BDD-based Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 SAT-Based Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.1 CNF Representation for Boolean SAT . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Boolean Satisfiability Algorithms . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.3 Maximum Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Debugging using Abstraction and Refinement 29
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Abstraction and Refinement in Model Checking . . . . . . . . . . . . . . . 32
3.3 Debugging with Abstraction and Refinement . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Guaranteeing Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Spurious Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Guaranteeing Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.4 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 State Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.1 Trace Length Reduction Benefits . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Function Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.1 Hierarchical abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.2 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.1 State Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.2 Function Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Bounded Model Debugging 55
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Bounded Model Debugging Formulation . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Probability Analysis of Error Behaviour . . . . . . . . . . . . . . . . . . . 58
4.3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.3 Impact on Error Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3.4 Improvements to Basic Methodology . . . . . . . . . . . . . . . . . . . . . 72
4.3.5 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Debugging using Max-SAT 82
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2.1 Maximum Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Debugging Combinational Circuits with Max-sat . . . . . . . . . . . . . . . . . . 85
5.3.1 Error Clause Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.3.2 Error Group Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Extension to Sequential Circuits and Multiple Vectors . . . . . . . . . . . . . . . 89
5.5 Debugging with Approximate Max-sat . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.1 Efficient Max-sat Framework . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6 Trace Reduction 100
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.1 Finite State Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.2 Image and Pre-image Computation . . . . . . . . . . . . . . . . . . . . . . 103
6.2.3 Reachability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3 Proposed Trace Compaction Approach . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.1 Reachability Based Trace Compaction . . . . . . . . . . . . . . . . . . . . 105
6.3.2 Creating More Short-cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.3 State Selection Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.4 All-Solution SAT Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.4 Storing Visited States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4.1 Determining State Containment Relationships . . . . . . . . . . . . . . . 110
6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7 Conclusion and Future Work 118
7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.2.1 Extension of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Bibliography 126
List of Tables
2.1 Simple gates and their CNF representation . . . . . . . . . . . . . . . . . . . . . 25
3.1 Statistics for problems and the stand-alone SAT-based debugging approach . . . 45
3.2 Performance statistics for abstraction and refinement debugging framework . . . 46
3.3 Summary of b14 when abstracting over 80% of flip-flops . . . . . . . . . . . . . . 49
3.4 Summary of ac97 when abstracting over 96% of flip-flops . . . . . . . . . . . . . . 49
3.5 Summary of problems for function abstraction . . . . . . . . . . . . . . . . . . . . 50
3.6 Results of proposed function abstraction and refinement technique . . . . . . . . 50
4.1 Likelihood of errors on gates of Figure 4.1 being excited . . . . . . . . . . . . . . 60
4.2 Circuit and performance statistics without BMD . . . . . . . . . . . . . . . . . . 76
4.3 Performance with BMD on increment size of 10 clock cycles . . . . . . . . . . . 77
5.1 Max-sat+debug versus stand-alone debugger . . . . . . . . . . . . . . . . . . . . 96
6.1 Results of proposed trace length compaction for traces of length 50, 100, 1000. . 115
6.2 Summary of the results for the proposed trace length compaction approach . . . 116
List of Figures
1.1 Simplified VLSI design flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Flow chart of typical manual debugging process. . . . . . . . . . . . . . . . . . . 5
2.1 A circuit with failure observed at O1. . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Debugging in a modern simulation-based verification flow . . . . . . . . . . . . . 17
2.3 Circuit before and after adding correction models . . . . . . . . . . . . . . . . . . 22
2.4 Example: circuit and CNF representation . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Basic DPLL SAT solving algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 Circuit before and after abstracting flip-flop q1 . . . . . . . . . . . . . . . . . . . 34
3.2 Demonstrating the effect of unconstrained inputs on abstract circuit . . . . . . . 35
3.3 Abstract circuit unfolded over two time frames . . . . . . . . . . . . . . . . . . . 37
3.4 Debugging algorithm with state abstraction and refinement . . . . . . . . . . . . 39
3.5 Reduced trace V ′ due to abstraction . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 Function F^1_1 is composed of functions F^2_2 and F^2_3 . . . . . . . . . . . . . . 42
3.7 Hierarchical abstraction and refinement example . . . . . . . . . . . . . . . . . . 43
3.8 Hierarchical debugging algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.9 Logic and trace reduction vs. flip-flops abstracted . . . . . . . . . . . . . . . . . . 45
3.10 Solve time and # literals vs. # of refinement and debugging iterations . . . . . . 52
4.1 Sample pipeline circuit with single output . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Five time frame ILA for circuit in Figure 4.1 . . . . . . . . . . . . . . . . . . . . 59
4.3 Illustration of example where error is excited in cycle 1 and observed in cycle d . 63
4.4 Plotting p_d as a function of d with prop = obs = {0.1, 0.5, 0.9} . . . . . . . . . 63
4.5 Plotting p_d as a function of prop and obs . . . . . . . . . . . . . . . . . . . . . . 64
4.6 ILA for length three with error excited on gate A . . . . . . . . . . . . . . . . . . 66
4.7 Suffix of size two of ILA shown in Figure 4.6 . . . . . . . . . . . . . . . . . . . . 67
4.8 ILA of Figure 4.7 annotated with error suspects and initial state suspects . . . . 69
4.9 Simple circuit with single error source on gate A . . . . . . . . . . . . . . . . . . . 69
4.10 Example of single error source excited in two clock cycles . . . . . . . . . . . . . 70
4.11 Example of pipelined circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.12 Three time frame ILA of circuit in Figure 4.11 . . . . . . . . . . . . . . . . . . . 71
4.13 Example with error source A propagating through three DFFs . . . . . . . . . . . 72
4.14 Complete BMD algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.15 Memory usage versus CPU run-time for four selected problems . . . . . . . . . . 79
4.16 Debugger solutions found versus BMD iterations . . . . . . . . . . . . . . . . . . 80
5.1 Correct and erroneous circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Erroneous sequential circuit and its ILA representation . . . . . . . . . . . . . . . 90
5.3 Error masking in clause groupings . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Max-sat debugging framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5 Run-time versus clause grouping size . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6 Number of solved instances for max-sat20+debug and debug . . . . . . . . . . . 97
5.7 Run-time comparison for max-sat20+debug and debug . . . . . . . . . . . . . . . 98
6.1 Finite State Machine with 7 states . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2 A sample trace for the above FSM . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Illustration of reachability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4 Updating the graph G with new nodes and edges . . . . . . . . . . . . . . . . . . 106
6.5 Illustrating rules 2 and 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6 Trace compaction procedure using reachability analysis . . . . . . . . . . . . . . . 109
6.7 Illustrating state storage data structure . . . . . . . . . . . . . . . . . . . . . . . 111
6.8 Finding supersets and subsets in the tree T . . . . . . . . . . . . . . . . . . . . . 112
6.9 Determine the states that are supersets of this state . . . . . . . . . . . . . . . . 113
6.10 Comparison of state selection methods . . . . . . . . . . . . . . . . . . . . . . . . 114
Chapter 1
Introduction
1.1 Motivation
The relentless consumer demand for devices with complex functionality and superior perfor-
mance continues to drive the fast growth of the semiconductor industry. In the past 20 years,
this industry has continuously generated electronic devices with more functionalities, superior
performance, smaller physical size, and reduced power consumption. For instance, as forecasted
by Moore's Law [72], Intel's 8088 microprocessor in 1979 had 29,000 transistors, while its 2007
counterpart, Intel's Quad-core Xeon processor, contains 820,000,000 transistors [47]. This
exponential efficiency growth over the past decades has stretched the performance limits for
certain aspects of the design flow for Very Large Scale Integration (VLSI) systems.
In particular, during the last 10 years, there has been a significant increase in the cost
and time required for verification and debugging of those systems. This is partially because
of the complex nature of the modern semiconductor design flow. The processes of design
and verification are composed of heterogeneous components implemented at multiple levels
of abstraction (procedural, behavioural, synthesizable, etc.) using different languages and
various standards. Interacting with such components adds several layers of complexity. The
lack of a unified and centralized verification environment makes verification and debugging
more challenging. Further, design specifications, often described in abstract models, may not
directly correspond to signals and transactions at the design level. The separation between
Figure 1.1: Simplified VLSI design flow (specification in text, C/C++ or Matlab; RTL in Verilog/VHDL; gate-level netlist after synthesis; placed and routed netlist; GDSII; with high-level vs. RTL and RTL vs. netlist verification steps in between)
these two layers can result in misinterpretations and usually complicates the verification efforts.
Additionally, the size of modern devices poses further performance constraints on existing
automated design tools. Task outsourcing to geographically dispersed design and verification
teams adds yet more layers of communication overhead to verification and debugging.
In contemporary VLSI design flows, many interdependent steps must be orchestrated and
fine-tuned for designers to develop an Integrated Circuit (IC) successfully. If any of these
steps introduce errors or time delays, the success of the entire product, or even future sales
projections of the company, may be in jeopardy. Figure 1.1 illustrates the major steps of
the IC design process. A product inception typically starts with a specification outlining the
IC behaviour at the system level. The specification can be described in a plain document,
as a Matlab model or it can be specified in a software language such as C/C++. Next, the
specification is implemented by designers at the Register Transfer Level (RTL) in languages such
as Verilog or VHDL. Following RTL implementation, verification ensures that the specification
matches the generated RTL. When an adequately high level of confidence is reached about the
correctness of the RTL, circuit synthesis generates a gate-level netlist that models the design
using components in the target fabrication technology. Next, place and route tools model the
circuit at the physical layout level using transistors and wires. Finally the design is provided
to an IC fabrication plant in a format such as GDSII so that a prototype can be manufactured.
The prototype later undergoes test, a process that may reveal functional errors that propagated
to the silicon or expose fabrication defects. If the chip prototype is found
to be faulty, different steps of the design cycle in Figure 1.1 are repeated until the chip passes
test and is released for mass production.
In the history of the Electronic Design Automation (EDA) industry, most of the manual steps
in the VLSI design flow have been fully or partially automated. Most notably, design synthesis,
place and route and automated test pattern generation tools populate the fast growing EDA
marketplace [7, 55]. A critical efficiency challenge in the VLSI design cycle that can benefit from
more automation is functional verification and debug. It is estimated that these two processes
contribute to as much as 70% of the effort of architecting and designing a new semiconductor
device [5, 6, 32, 80]. Both are emerging problems with significant impact on the modern design
cycle.
In 2006, the well-respected International Technology Roadmap for Semiconductors (ITRS)
issued its new set of needs for the current and next generation design processes for
semiconductors. Although most topics saw minor numeric revisions, the roadmap contains a major,
fourteen-page update in design verification with a strong emphasis on debugging. The report
states that “technological progress depends on the development of rigorous and efficient methods
to achieve high-quality verification results ... and techniques to ease the burden of debugging a
design once a bug is found”. It continues, “Verification must increasingly integrate more formal
and semi-formal engines to complement simulation... Without major breakthroughs, verification
will be a non-scalable, show-stopping barrier to further semiconductor progress” [48]. Without
a doubt, the roadmap depicts a grim yet realistic picture that establishes an urgent need for
scalable automated verification and debugging techniques. Further evidence is also provided
by the fact that the ratio of verification engineers to design engineers employed in a typical
microprocessor team has quadrupled over the last decade [48]. Clearly, this trend must be
controlled, and those processes need to be automated, for the semiconductor industry to maintain
the historic growth rate it has sustained over the past 40 years.
1.1.1 Design Verification and Design Debug
Verification is the process that ensures a design behaves correctly and according to its speci-
fication. Motivated by the verification challenge outlined above, the academic and industrial
communities have introduced countless automated tools, methodologies and analysis engines
to reduce the bottleneck. Major advancements in this field include equivalence checkers [38],
property checkers [16], functional coverage analysis tools [34], automated testbench genera-
tion [1], Assertion Based Verification (ABV) [36] and the use of Binary Decision Diagrams
(BDDs) [57, 93] and Boolean Satisfiability (SAT) solvers [27, 74]. These techniques and many
others allow for the efficient exploration of the design space to identify errors, exercise opera-
tional corner cases and assess the quality of the verification process itself. Whereas verification
engineers only had simulation tools to aid them a decade ago, today there exist specialized
formal verification and functional error coverage tools to help them prove correctness of a de-
sign or identify the presence of errors. With the advent of these new techniques and tools, the
exponential growth of the verification effort can be lessened to a certain extent.
Once a functional discrepancy is found between a design and its specification during verifi-
cation, debugging attempts to localize the error source or bug. This localization process is done
with reference to RTL files, gate-level netlists, schematic views of the design, or even tempo-
rally within a simulation trace. Although many verification processes have been automated,
once they detect the existence of errors, the task of RTL debugging remains a purely manual
and tedious procedure. It is a time-consuming and resource-intensive task that takes days, or even
weeks, to converge; it increases non-recurring costs and may jeopardize the release date
of a chip.
Figure 1.2: Flow chart of typical manual debugging process (a failure discovered by verification is passed from designer to designer, each looking for the bug, until it is found and fixed).
A typical debugging process comprises arduous tasks such as collecting informa-
tion from the verification tools, manually back-tracing signals using Graphical User Interface
(GUI) waveform and source code (RTL or netlist) viewers, and performing “what-if” analysis
on potential error suspects. Once a candidate error location is found, a designer or engineer can
rectify the design and remove the bug. To ensure that the fix is adequate
and not limited to a specific test case, the verification process is then typically repeated.
Part of the debugging pain today stems from the fact that designers are responsible for
blocks as large as half a million synthesized gates for systems that are comprised by dozens
of such blocks integrated together. In the days of globalization, sometimes design groups may
not even be within a geographical proximity as they build and integrate these pieces together.
The result is many functional errors caused by miscommunication between designers, incorrect
interpretation of the specification, incomplete specifications, or plain human error.
Design bugs are incredibly hard to diagnose. They require both detailed knowledge of individual
blocks and a broader system-level understanding, a breadth of knowledge rarely available
to a single designer or verification engineer. Consequently, it is common for a verification failure
to be passed from designer to designer until the exact error source of the bug is discovered.
A typical contemporary debugging process is depicted in Figure 1.2. This figure presents
a flow chart of how debugging is performed at the system level between verification engineers
and designers. Notice that for complicated bugs it is possible for the debugging process to
consume many designers and verification engineers before the appropriate fix is made. Taking
into account that designers are under strict time constraints, each bug not only delays the
design for hours or days at a time, but it also pushes back the release date of the final product, a
fact with important financial and marketing consequences.
Over the last decade, VLSI design companies have coped with the debugging obstacle by
allocating more verification engineers to the problem. Today, there are two to three times more
verification engineers than designers in most companies [6]. It is clear that adding verification
engineers to the problem is not a scalable solution in the long term as the debugging pain
continues to climb. Automated debugging techniques are urgently required to alleviate the
manual debugging problem and drastically improve design efficiency [31, 48].
1.2 Automated Design Debugging
The discrepancies described in the previous section between design demands, cost projections
and EDA capabilities in verification and debugging introduce uncertainties, escalate costs
and delay product delivery. It is clear that at the heart of verification there is a highly
manual debugging task [31]. Current industrial debugging methodologies use non-automated
solutions that package simulation results, waveform and source code viewers/editors under
one roof. These tools involve time-consuming human-driven manual processes that introduce
uncertainties in design closure.
The research community has dedicated significant effort to automated fault (defect) diagnosis
techniques over the last three decades to diagnose silicon prototypes that fail test. These
techniques rely on algorithms such as simulation, path-tracing, and Binary Decision Dia-
grams [1, 2, 45] and have been successful at localizing physical defects introduced by the
fabrication process. On the other hand, their success at debugging functional RTL errors
has been limited. At first sight, fault diagnosis and functional RTL design debugging can be
viewed as complementary problems with similar goals at different stages of the design cycle.
However, after a closer look, there are considerable differences that impact the performance of
algorithms addressing these problems.
First, fault diagnosis problems have a large set of observation and controllability points due
to embedded test hardware mechanisms such as scan chains [1] and trace buffers [101]. Scan
chains allow the test engineer to inject particular vectors at a specific area in the design and
observe their exact behaviour at fixed clock cycles, although this is a destructive process where
the test needs to be re-run. Similarly, trace buffers can observe or inject a series of line values
during regular chip operation without disturbing it. These abilities may not
exist in design debugging where the golden model and the design may have completely different
representations. For example, the specification can be a Matlab program while the design is a
Verilog or VHDL representation.
Next, scan chains can sometimes transform the sequential diagnosis problem to a purely
combinational one where faults can be identified directly in the combinational logic. In contrast,
RTL debugging must diagnose sequential errors observed many clock cycles after the error is
excited. Finally, defect diagnosis makes use of fault models that can predict the functional
behaviour of various manufacturing defects. No corresponding models exist for functional RTL
errors, a fact that further complicates the debugging process.
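The contrast drawn above between combinational diagnosis and sequential RTL debugging can be illustrated with a toy model (hypothetical, not a circuit from this thesis): an error excited in one clock cycle of a pipelined design is only observed at the output several cycles later, so the debugger must reason over many time frames at once.

```python
# Toy 3-stage pipeline: the output observes the input delayed by
# three clock cycles through registers r1 -> r2 -> r3.
# An error is excited at cycle 1 but only observed at the output
# at cycle 4 (hypothetical example illustrating the time gap
# between error excitation and error observation).
def simulate(inputs, buggy=False):
    r1 = r2 = r3 = 0              # registers reset to 0
    observed = []
    for cycle, x in enumerate(inputs):
        observed.append(r3)       # output is the last register
        v = x ^ 1 if (buggy and cycle == 1) else x  # error excitation
        r1, r2, r3 = v, r1, r2    # shift the pipeline
    return observed

stimulus = [0, 1, 0, 0, 0, 0]
good = simulate(stimulus)
bad = simulate(stimulus, buggy=True)
first_fail = next(i for i, (g, b) in enumerate(zip(good, bad)) if g != b)
print(first_fail)  # the failure surfaces at cycle 4, not cycle 1
```

With scan chains, r1 to r3 would be directly observable and the problem would collapse to a combinational one; without them, the debugger must trace the discrepancy back through three unrolled time frames.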
Recently, a new genre of automated debugging methodologies based on formal engines such
as Boolean Satisfiability (SAT) solvers, Quantified Boolean Formula (QBF) solvers and maximum
SAT (max-sat) solvers has been shown to outperform conventional techniques in design
and silicon debug [18, 85, 91]. SAT engines have been used to drive many other CAD tools
for VLSI such as in equivalence checkers [39], model checkers [8], and automated test pattern
generators [56]. Their success has spurred interest in other EDA areas such as logic synthesis [102],
place and route [76], technology mapping [87], and fault diagnosis [35, 92]. By formulating
the RTL/fault diagnosis problem in terms of a SAT problem, great performance and capacity
improvements have been achieved [91]. It has been shown that other implementations using
QBF and max-sat solvers can sometimes further improve performance.
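The SAT-based formulation referenced above can be sketched on a small example. The idea, elaborated in Chapter 2, is to attach a correction model to each gate that allows a solver to replace the gate's function with a free value; a gate becomes a suspect if some replacement makes the circuit agree with the expected output on the failing vector. The following brute-force sketch stands in for a real SAT solver, and the netlist, bug and vector are hypothetical:

```python
# Toy illustration of debugging as satisfiability.
# Spec:    z = (a AND b) OR c
# Netlist: g1 = a OR b ; z = g1 OR c      <- bug: g1 should be AND
def eval_netlist(a, b, c, free=None):
    """Evaluate the netlist; `free` overrides a gate's output with an
    unconstrained value, playing the role of a correction model."""
    free = free or {}
    g1 = free.get("g1", a | b)
    z = free.get("z", g1 | c)
    return z

failing_vector = (1, 0, 0)  # counter-example found by verification
expected_z = 0              # spec value; the buggy netlist yields 1

# A gate is a suspect if some replacement value at that gate makes
# the netlist agree with the spec on the failing vector.
suspects = [g for g in ("g1", "z")
            if any(eval_netlist(*failing_vector, free={g: v}) == expected_z
                   for v in (0, 1))]
print(suspects)  # ['g1', 'z']
```

Note that both g1 (the actual bug) and z (which lies on the error propagation path) are returned; real debuggers likewise report a set of suspect locations that the designer must inspect.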
Another notable recent development in automated debugging is the use of the design hierar-
chy to incrementally diagnose the blocks or modules of a design [3]. Results from this work are
reported on real-life industrial designs to demonstrate the method’s effectiveness. Despite these
significant contributions, automated debugging technology continues to face the challenges of
today's increasing design sizes and overall problem complexity.
Broadly speaking, there are three factors that limit the effectiveness of automated debugging
once verification fails. One factor is the design size, which directly impacts the solution space. As
the gate count of a design increases, the number of potential error locations to consider
grows exponentially with the number of errors [18, 99]. Another factor is the underlying
state size of the sequential design. This impacts temporally and spatially how errors can be
excited and observed. A last factor is the length of the simulation trace or counter-example
under analysis. For instance, traces in the industry today range from a few hundred cycles for
formal techniques, to tens of thousands of cycles for simulation-based verification. For automated
debugging solutions to be practical in the industry, all these aspects of the problem complexity
must be efficiently addressed.
1.3 Contribution
This thesis attempts to bridge the gap between current automated debugging capabilities and
contemporary industrial needs. It does so by introducing four novel techniques aimed to address
the three limiting factors discussed earlier: design size, sequential complexity, and length of
counter-example. By managing these three factors, experiments in this thesis show that much
larger designs with much longer traces can be handled by conventional automated debuggers.
As such, these results address important practical problems and open new ground for further
research in the field.
In summary, this work makes the following contributions.
• It introduces the concept of abstraction and refinement in debugging. Abstraction and
refinement has been of great help to modern verification methodologies. In effect, it
reduces the size of the problem, in both gate count and sequential complexity. This thesis
tailors the solution to debugging and confirms similar performance gains.
• It presents a complete bounded model debugging methodology that incrementally solves
the problem with less computational resources. It does so by considering a window for
debugging starting from the cycle where a failure is first observed. The method stems from
similar concepts in formal verification, but the theory and implementation for debugging
are very different.
• It builds a novel debugging formulation based on maximum satisfiability. The method
is found to be effective for over-approximating the debugging problem. As such, a
powerful two-step, coarse and fine grained debugging framework is developed.
• It presents a trace length reduction technique that identifies redundant events in simu-
lation traces that can be removed. The resulting shorter trace can reduce the memory
requirements for debugging tools dramatically.
We now discuss each of these contributions and their impact in debugging in detail.
1.3.1 Abstraction and Refinement
Abstraction and refinement reduces the debug problem by identifying and removing sections
of the design that are irrelevant to the errors in the design. The first step, namely abstrac-
tion, simplifies the design by removing components such as state elements or modules. Once
abstracted, automated debugging can operate on a much simpler and smaller version of the er-
roneous design. If the bug is in the abstracted elements, the method identifies this discrepancy
and re-introduces them during a refinement step. Following this, consecutive steps of debug
and refinement are performed iteratively until all error locations are found. Theory presented
in this thesis guarantees the correctness and completeness of the method.
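The debug-and-refine loop described above can be sketched with a toy model. Here, a design is just a set of component names, and run_debugger is a deliberately naive stand-in for a real automated debugger (it simply reports which buggy components are currently visible). In the actual method, unexplained failures are detected through the debugging engine itself rather than by consulting the bug set; every name below is illustrative, not from the thesis implementation.

```python
def run_debugger(visible, buggy):
    """Placeholder for an automated debugger: it can only report suspects
    among the components currently present in the (abstracted) design."""
    return visible & buggy

def abstraction_refinement(components, buggy, abstracted):
    """Debug an abstracted design, refining until the failure is fully explained."""
    visible = components - abstracted        # abstraction step: drop components
    removed = set(abstracted)
    suspects = run_debugger(visible, buggy)
    # Toy termination check: a real flow detects an unexplained failure via the
    # debugger itself and refines only the implicated abstracted elements.
    while removed and suspects != buggy:
        visible.add(removed.pop())           # refinement: re-introduce a component
        suspects = run_debugger(visible, buggy)
    return suspects
```

In the worst case the loop re-introduces every abstracted component, reproducing the original problem; as noted above, this seldom happens in practice.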
Two types of abstraction techniques are discussed here. State abstraction focuses on ab-
stracting state elements leading to a smaller state space and possibly a shorter trace. Function
abstraction is a powerful technique that simplifies the design and its functionality. Function
abstraction can be applied iteratively at different levels of the design hierarchy providing added
benefits. State abstraction is ideal for designs with little or no high-level information while
function abstraction is a powerful technique for high level or RTL designs.
Although in the worst case abstraction and refinement can result in re-introducing the
entire design and thus debugging the original problem, in practice, this seldom happens. Just
like in verification, the technique presents a powerful solution that usually reduces the problem
complexity to just a small fraction of the original. Empirical data demonstrate that abstraction
and refinement effectively allow a debugger to tackle larger industrial designs than previously
possible.
1.3.2 Bounded Model Debugging
Bounded Model Debugging (BMD) is based on the observation that the erroneous behaviour of
the design is more likely caused by errors excited temporally close to observation points such as
primary outputs. For instance, when a stimulus combination excites an error, it is more likely
that the failure is observed immediately, within the next few clock cycles, rather than later.
This intuitive observation is shown to hold both statistically and empirically as a robust BMD
framework is developed to provide both significant run-time and memory improvements.
The BMD framework starts by formulating a small debugging problem based on the last
clock cycles of an error trace, called the suffix. These clock cycles are important because it is
highly likely that the error is excited at these times. By solving the smaller debugging problem,
many potential error sources can be found very quickly. In the event that the error is excited in
clock cycles prior to the ones under consideration, BMD will detect this scenario and formulate
a subsequent problem with a longer suffix. Subsequently, debugging is performed and the suffix
length is increased until all the error sources are identified. Theory presented in this thesis
guarantees the correctness and completeness of BMD.
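The suffix-growing loop can be illustrated with a small sketch. Here solve_window is a placeholder for one debugging run over the last suffix_len cycles of the trace, assumed to succeed exactly when the window reaches back to the cycle where the error is excited; the function names, the step size, and this success criterion are all illustrative assumptions, not the thesis implementation.

```python
def solve_window(trace_len, suffix_len, excitation_cycle):
    """Stand-in for one debug run on the suffix: it succeeds only if the
    window [trace_len - suffix_len, trace_len) covers the excitation cycle."""
    return excitation_cycle >= trace_len - suffix_len

def bounded_model_debug(trace_len, excitation_cycle, step=10):
    """Grow the suffix backwards from the failure cycle until debug succeeds."""
    suffix = step
    while suffix < trace_len and not solve_window(trace_len, suffix, excitation_cycle):
        suffix += step    # error excited before the window: lengthen the suffix
    return min(suffix, trace_len)
```

An error excited close to the failure is explained by a short suffix (the common case reported above), while a deeply buried error forces the loop toward the full trace.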
The BMD framework empirically confirms the theory and statistical analysis developed in
this dissertation. More specifically, experiments show that over 1/3 of debugging problems are
completely solved with suffixes of 10 clock cycles or less. As a result, run-time performance
improvements of up to two orders of magnitude are realized under BMD. Furthermore, while a
state-of-the-art debugger finds the error source in only 34% of problem instances, BMD manages
to find the errors in over 93% of the problems.
1.3.3 Max-sat based Debugging
This thesis presents the first maximum satisfiability (max-sat) formulation for design debug-
ging. The key idea is that since the incorrect design cannot produce the correct response, the
corresponding satisfiability problem is inherently unsatisfiable and it can only be satisfied if
some of the constraints are removed. A max-sat solver identifies which parts of the constraints
to remove. This subset of the original problem corresponds directly to the erroneous part of the
design which may contain the bug(s). The proposed max-sat formulation is shown to be both
functionally correct and complete and thus a viable alternative to SAT-based techniques [3].
The proposed max-sat formulation can also be used to over-approximate solutions by loosen-
ing the problem constraints. The over-approximation allows for a trade-off between the max-sat
solver’s run-time and the resolution of the solutions. More specifically, approximation can re-
duce the problem complexity and thus require less run-time at the cost of finding less precise
solutions. Although not exact, this approach can be employed as a pre-processing step that
filters solutions for another complete debugger. This debugger will have fewer error sources to
consider and it will return faster. Experiments show that this combined two-step coarse and fine
grained debugging framework results in more efficient automated debugging solutions.
1.3.4 Trace Reduction
The simulation trace contains crucial information about the debugging problem as it provides
the stimulus and expected behaviour of the design. Since most automated debugging techniques
model the problem for every clock cycle of the trace, a large simulation trace
can have an adverse effect on the memory requirements and performance of an automated
debugger. Unfortunately, the trace length is often hard for the engineer or the respective
CAD tools to control, as traces are automatically provided by testbenches or verification
processes. The goal in trace reduction is to generate a new trace from the original one that
exposes the bug under similar conditions but within a fewer number of clock cycles.
The trace reduction algorithm developed in this thesis uses reachability analysis techniques
based on circuit observability don’t cares [82, 83]. It identifies which states and input conditions
are necessary to excite and observe errors. Once these conditions are identified, redundant
events and specific state transitions can be eliminated. Efficient algorithms and data structures
are proposed that allow quick identification of “short-cuts” in the given trace. In practice,
where simulation traces are often generated from random or constrained random testbenches,
it is possible to reduce many redundant state transitions and find a shorter path from an initial
state, to the error excitation state, and to the error observation state. As a result, a simulation
trace may be reduced to a fraction of its original size thus making the debugging problem much
smaller and easier for existing automated debuggers to handle.
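The "short-cut" idea can be conveyed with a simplified sketch that only looks at repeated states: if the trace revisits a state, the cycles between the two occurrences form a loop that can be spliced out. This deliberately ignores the input conditions and observability don't cares that the real algorithm must check before declaring a transition redundant; treat it as an illustration of the splicing step only.

```python
def shorten_trace(states):
    """Splice loops out of a state sequence: cycles between two occurrences
    of the same state are treated as redundant. (The full algorithm must also
    verify input conditions and observability don't cares.)"""
    seen = {}    # state -> position of its surviving occurrence
    out = []
    for s in states:
        if s in seen:                      # revisited state: cut the loop
            cut = seen[s]
            for dropped in out[cut:]:
                seen.pop(dropped, None)
            out = out[:cut]
        seen[s] = len(out)
        out.append(s)
    return out
```

For a randomly generated trace that wanders through the same states repeatedly, this kind of splicing yields the shorter path from initial state to excitation and observation described above.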
1.4 Thesis Outline
This thesis is structured as follows. In Chapter 2 background is provided on automated debug-
ging techniques, max-sat algorithms, and formal verification methods. Chapter 3 presents the
abstraction and refinement based debugging methodology. Chapter 4 introduces the bounded
model debugging technique. Chapters 5 and 6 describe the max-sat debugging formulation
and the trace reduction technique, respectively. Chapter 7 discusses future work in the area of
automated debugging and concludes this thesis.
Chapter 2
Background
2.1 Introduction
This chapter introduces background material and concepts pertaining to the contributions of
this dissertation. In Section 2.2, verification techniques are discussed as the preceding step to
design debugging. The central theme of this thesis, automated design debugging, is formally pre-
sented in Section 2.3 and the necessary introductory concepts are visited. Section 2.4 provides
background on traditional diagnosis techniques based on simulation and BDDs. Section 2.5
discusses the fundamentals of SAT-based debugging, a powerful debugging technique used and
enhanced multiple times in this thesis. Finally, the essential theoretical and implementation
components for SAT solvers and max-sat solvers are introduced in Section 2.6.
2.2 Verification Techniques
Functional verification aims to find failures in designs or prove their functional correctness.
When failures are identified, verification tools provide error traces or counter-examples to re-
produce the conditions for the failure. This dissertation is concerned with debugging functional
failures of digital circuits at the RTL and at the gate-level. Timing errors and other specification
violations such as power consumption failures are not examined here.
Definition 1 A failure is the manifestation of at least one discrepancy at an observation point
between a design implementation and its specification model when both operate under the
same primary input stimulus.
A discrepancy at an observation point is a conflicting Boolean assignment at that point
during simulation under the same input stimulus. In this context, an observation point can
be a primary output or an internal design signal with a defined expected functional behaviour.
For example, a failure occurs when a circuit produces the Boolean value 0 at a primary output
under a given input stimulus while its specification dictates the value 1 for the same signal. The
root cause (source) of the failure is commonly referred to as an error or a bug. The moment
an error source produces a value that differs from its “golden” model, the error is said to be
excited. Debugging procedures use failures in order to localize the root cause by identifying the
excitation points. Once localized, engineers can rectify the design by removing the observed
failure.
Two main types of verification processes that detect failures exist at the RTL and at the
gate-level: simulation-based and formal techniques. Broadly speaking, simulation-based
techniques explore the design space by providing stimulus to the design through a testbench.
They model the behaviour of the design via a simulator or a hardware emulator and they
determine the existence of failures using appropriate testbench correctness checkers or monitors.
Simulation-based verification remains the predominant verification methodology in the industry
today, as it is used in more than 90% of verification cases [48]. The success of simulation-based
verification is primarily due to its ease of use and its run-time performance. It is much faster
and it can handle much larger designs when compared to formal verification. Theoretically,
simulation-based techniques cannot prove or disprove the correctness of a design unless 2^(I+S)
unique stimulus vectors are exercised, where I is the number of primary design inputs and S
is the number of its state elements. Since simulating 2^(I+S) vectors is often not practical, it is
well accepted that a simulation-based technique can only provide a high-degree of confidence
about the functional correctness of a design and not an absolute certitude.
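The 2^(I+S) bound makes the impracticality concrete, as a quick calculation shows (the block size chosen here is an arbitrary illustration):

```python
def exhaustive_vectors(primary_inputs, state_elements):
    """Number of unique stimulus vectors needed for a proof by simulation."""
    return 2 ** (primary_inputs + state_elements)

# Even a modest block with 32 inputs and 64 flip-flops needs 2**96 vectors,
# roughly 7.9e28; at 10**9 vectors per second this exceeds 10**12 years.
count = exhaustive_vectors(32, 64)
```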
Formal verification techniques prove or disprove the correctness of a design by contrasting its
functional behaviour to that of a golden model or property. Formal verification tools explore the
design space exhaustively by employing formal engines such as BDDs and SAT solvers [27, 57,
74, 93]. Due to the inherent limitations of formal engines, formal verification tools are restricted
in their deployment by the size of the circuit under verification. In practice, these methods
operate on small modules with 500 thousand gates or fewer, as anything larger exceeds their
application domain. Only when structural circuit information is available (like in equivalence
checking) do these techniques scale to larger designs. As a result, formal verification techniques
are typically engaged only at the block or at the sub-system level.
Once a bug is detected, both simulation-based and formal verification techniques provide a
simulation trace or counter-example that can be used as stimulus to reproduce the failure as
defined in Definition 1. Both approaches also utilize a golden or reference model that determines
the expected or correct value when a failure occurs. An important fact in all verification
methodologies except equivalence checking [38, 63] is that the reference or golden model acts as
a “black box”. That is, this model can only be simulated to provide the correct output values
and no intermediate structural correspondence between the circuit and the reference model
exists apart from the observation points. As we shall see, this fact complicates both verification
and the subsequent debugging efforts.
As an example, consider the erroneous circuit shown in Figure 2.1 and the simulation input
trace {I1, I2, I3, I4} = {0, 1, 0, 0}. We assume that a golden model provides the expected (refer-
ence) value of 1 at observation point O1. In this case, there is a failure since the correct value 1
conflicts with the simulated value of 0. This discrepancy is depicted as a pair of correct/erroneous
logic values of 1/0 in the figure. This information from simulation is captured by a verification
checker and it is used later in automated debugging.
Figure 2.1: A circuit with failure observed at O1.
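Because the golden model is a black box, failure detection reduces to comparing values at the observation points. A minimal checker in that spirit can be written as follows; the function and signal names are hypothetical, chosen to mirror the Figure 2.1 example.

```python
def find_discrepancies(simulated, golden):
    """Report observation points whose simulated value conflicts with the
    golden (reference) value, as in the 1/0 pair at O1 in Figure 2.1."""
    return [point for point, value in simulated.items()
            if point in golden and golden[point] != value]

# Simulated output 0 versus expected value 1 at O1 constitutes a failure.
failing = find_discrepancies({'O1': 0}, {'O1': 1})   # == ['O1']
```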
Definition 2 A diagnosis vector v is a union of three sets. One set contains the sequence of
primary input logic values needed to simulate the failure. The second set contains the initial
state values for all state elements. The third set contains the sequence of correct (reference)
values for the observation signals.
The three sets in v are also called input vector, initial state vector, and expected or golden
output vector in test diagnosis literature [91]. In Figure 2.1, the diagnosis vector v contains the
input vector {I1, I2, I3, I4} = {0, 1, 0, 0}, the golden output vector {O1} = {1}, and an empty
initial state vector. Note that the erroneous circuit response for v can be obtained by simply
simulating the faulty design. When multiple diagnosis vectors are available, a capital letter
V denotes the set of these vectors. In the next section, the automated debugging problem is
formally introduced. This problem formulation uses the information encapsulated in the set of
diagnosis vectors V .
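One straightforward way to hold the three sets of Definition 2 in code is a small record type; the class and field names below are illustrative, not from the thesis, and the instance mirrors the Figure 2.1 example (one clock cycle, empty initial state).

```python
from dataclasses import dataclass

@dataclass
class DiagnosisVector:
    input_vector: list     # sequence of primary-input values, one dict per cycle
    initial_state: dict    # values for all state elements
    golden_outputs: list   # sequence of expected values at observation points

# The diagnosis vector for Figure 2.1:
v = DiagnosisVector(
    input_vector=[{'I1': 0, 'I2': 1, 'I3': 0, 'I4': 0}],
    initial_state={},
    golden_outputs=[{'O1': 1}],
)
```

A set of such records corresponds to the vector set V used by the debugging formulation in the next section.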
2.3 Automated Debugging
Debugging is a widely used term with different interpretations for the different phases of the
VLSI design cycle. In this dissertation, we follow the terminology used predominantly in func-
tional design debugging and in fault diagnosis literature [1, 91].
Definition 3 Debugging is the process of identifying the source(s) of failures in a given erro-
neous circuit.
The sources of errors in the above definition can be a set of logic gates, logic functions,
entire design modules, or specific RTL lines (VHDL, Verilog, etc) depending on the design
model used. In this dissertation, we use the term component to indicate any of these error
candidates. Formally, the input and output of an automated debugger are defined as follows.
Definition 4 A component with a corresponding output function f is a suspect if and only
if there exists a new function g that can replace f and remove the failure for a given set of
diagnosis vectors V .
Definition 5 Given an erroneous design C with a corresponding set of diagnosis vectors V
that detect a failure, automated design debugging is a process that returns suspect components
in the design.
A set of suspects are also referred to as debugger solutions. In this thesis, for a given
component, a function f produces a sequence of Boolean values for its output, based on
Figure 2.2: Debugging in a modern simulation-based verification flow.
its inputs and initial state. For components with multiple output signals, the function set F
comprises the functions for each individual output signal.
The above definition for automated debugging mirrors the manual debugging process per-
formed by design and verification engineers today. In detail, engineers use the verification
failure and information from the diagnosis vectors V to search for locations in the design where
a design modification such as a gate or function change can remove the failure. They do this by
manually tracing the design simulated under the input test vectors V using dedicated waveform
viewers and structural RTL editors.
Figure 2.2 shows where a manual or automated debugging tool fits within a simulation-based
verification flow. The diagnosis vector set V is captured from the verification tools as described
in Section 2.2. In general, the more vectors (i.e. the larger the cardinality |V |) returned by
verification, the better the resolution of the solutions from debugging is expected to be (i.e.
fewer solutions, more precise localization, etc). Once an automated debugger returns suspects
at the RTL or at the gate-level, the designer must devise a fix for the design at those suspect
locations to actually remove the failure for the given vectors. It should be emphasized that
once a rectification is made, the design must be thoroughly verified since the fix may result in
further functional failures.
2.3.1 Complexity of Automated Debugging
This section introduces nomenclature specific to the complexity of the automated debugging
problem. According to Definition 5 there may exist more than a single unique suspect compo-
nent whose function can be modified to rectify the failure for the given set of vectors. Every
possible suspect found by an automated debugger corresponds to a location in the RTL or
gate-level design where the failure can be fixed. For example, consider the circuit in Figure 2.1:
for the single vector, one suspect is gate D because changing it from an AND to a NAND
will remove the failure for the given input vector. Similarly, gate A is another viable suspect
because, as can be shown, changing its function from an AND gate to an OR will also remove the
failure.
The existence of more than one solution to a debugging problem arises from the fact that
there may be more than one way to re-synthesize the design to correct it. In practice, the
designer may prefer one suspect over another in order to perform a correction. The preference
can be due to reasons such as the ease of implementation of a correcting function or due to the
effect of the correction on circuit performance such as timing, power, etc. In order to provide
many correction choices to the designer, an effective automated debugger must return all the
possible suspects. In this manner, the designer can have a larger set of options to evaluate for
suitability when devising a correction.
Automated debugging as presented in Definition 5 does not limit the number of components
that are returned as a single solution. For example in Figure 2.1 the solutions {A} and {D} are
valid solutions but so is the solution set {B, C}. In other words, simultaneously changing the
functions of gates B and C to NOR gates can rectify the problem. In practice, solutions with
fewer suspect components are preferred because fewer components must be modified in order
to fix a failure.
Definition 6 The error cardinality, denoted as N , is the number of distinct suspect components
contained in a solution set returned by an automated debugger.
It has been shown in [99] that the complexity of the debugging problem increases expo-
nentially with respect to error cardinality N . To reduce the effect of the complexity growth,
a debugger may begin with an initial guess N = 1 and increase N until N = maxN . In this
definition, maxN is a user-defined maximum number of errors the debugger must consider at
once. This approach ensures that potential error sites are searched quickly with small values
of N . Only when a suitable correction location is not found, the value of N is increased. Typ-
ically, given a value for N , most automated debuggers do not return solution sets that contain
suspects found at cardinalities less than N . For example, in Figure 2.1 the solution set {A, C} is
typically not returned with N = 2 because {A} is found with N = 1. We follow this convention
in this dissertation.
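The escalation scheme just described can be sketched as a simple loop, with solve_at(n) standing in for one complete all-solution debug run at error cardinality n; the function names are illustrative.

```python
def debug_with_cardinality(solve_at, max_n):
    """Search for suspects starting at N = 1, escalating the cardinality only
    when no correction location is found at the current value of N."""
    for n in range(1, max_n + 1):
        solutions = solve_at(n)    # one all-solution debug run at cardinality n
        if solutions:
            return n, solutions    # cheapest cardinality that explains the failure
    return max_n, []               # nothing found up to the user-defined bound
```

In the Figure 2.1 example this loop would stop at N = 1, returning {A} and {D} without ever formulating the N = 2 problem that yields {B, C}.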
Definition 7 Given an error cardinality N , all possible solutions returned by an automated
debugger are referred to as equivalent solutions.
In Figure 2.1, with N = 1, both {A} and {D} are equivalent solutions. An automated de-
bugger is said to be incomplete if it does not return all equivalent suspects for a given cardinality.
In Figure 2.1, with maxN = 4, the complete solution set is {A, D, {B, C}} because there are
no solutions with N = 3 and N = 4.
2.4 Simulation- and BDD-based Diagnosis
Traditionally, diagnosis (debugging) techniques are classified as cause-effect or effect-cause tech-
niques [50]. Cause-effect analysis usually simulates all errors (faults) to compile error (fault)
dictionaries. These dictionaries contain entries of candidate errors (faults) and respective failing
primary output values. Given a failing design (chip) and a set of failing vectors, the design
(chip) responses are found in the dictionary to return a set of suspects for each vector. In
practice, cause-effect techniques are only effective for single stuck-at-fault models and not for
multiple faults [50]. Effect-cause analysis methods, on the other hand, are more scalable as
they use simulation and structural analysis to identify the suspects [2].
Both approaches return sets of suspects E1, E2, . . . , Ek corresponding to diagnosis vectors
v1, v2, . . . , vk. These sets are intersected (E = E1 ∩ E2 ∩ · · · ∩ Ek) to return the final set E of
suspects that explain the faulty behaviour for all input vectors.
Historically, the first effect-cause diagnosis techniques used simulation and BDDs (symbolic
methods) to find suspects. Symbolic methods [43, 44, 58] operate by building an error equation
that encodes all corrections. Simulation-based algorithms [44, 62] typically use a backtrace
procedure to identify potential suspect locations, and perform simulation to verify that each
suspect can correct the design. For some types of suspects with high fanout, the amount of
simulation required can be excessive [62]. Since the solution space increases exponentially with
the number of suspects, incremental methods have been proposed to explore this search space
efficiently [46, 62]. Such methods examine one location at a time and rely on heuristics to find
the locations; however, they are limited in their effectiveness for industrial problems.
2.5 SAT-Based Debugging
More recently, a novel effect-cause debugging technique was proposed based on the concept of
Boolean Satisfiability [92]. In this context, the debugging problem is formulated as a Boolean
Satisfiability instance where a conventional SAT solver can be utilized to return solutions cor-
responding to suspects. In recent years, a variety of SAT-based debugging formulations have
been proposed building on the initial work of [92]. These more advanced formulations extend
the solution from the combinational problem to the sequential one, they enhance it for hier-
archical design structures and they introduce memory and performance improvements [3, 35].
Experiments show that SAT-based debugging techniques can outperform traditional debugging
techniques (Section 2.4), sometimes by orders of magnitude [33, 66, 91].
We now present the basic formulation of SAT-based debugging since it is relevant to the
contributions of this dissertation. The methodologies introduced here also apply to the other
debugging techniques from Section 2.4. However, for ease of presentation,
SAT-based debugging is often used in examples. In the following discussion, we use SAT solvers
as black box engines that return solutions to problems in Conjunctive Normal Form (CNF).
Section 2.6 provides background on Boolean satisfiability and SAT solvers.
Almost all SAT-based debugging methods start by synthesizing the design under verification
(erroneous circuit) at the RTL or gate-level into Boolean primitives modeling combinational logic
and state elements. The following four steps are then applied to create and solve the debugging
problem.
1. Add extra logic to the erroneous circuit C to model potential suspects and represent the
error cardinality.
2. Convert the amended circuit into Conjunctive Normal Form (CNF).
3. Replicate and constrain the CNF for every failing vector sequence v ∈ V and for every
clock cycle in each sequence in v.
4. The final CNF is given to any SAT solver. Solutions returned by this solver indicate
respective suspects in C.
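The four steps form a pipeline from circuit to suspects. A skeletal sketch of that data flow is given below; every stage is injected as a placeholder, and none of the function names come from the thesis implementation.

```python
def sat_based_debug(circuit, vectors, n,
                    add_correction_models, to_cnf,
                    replicate_and_constrain, all_sat):
    """Data flow of the four SAT-based debugging steps; each stage is a
    caller-supplied placeholder standing in for the real transformation."""
    amended = add_correction_models(circuit, n)          # Step 1: muxes + cardinality logic
    cnf = to_cnf(amended)                                # Step 2: CNF conversion
    constrained = replicate_and_constrain(cnf, vectors)  # Step 3: per-vector, per-cycle copies
    return all_sat(constrained)                          # Step 4: solutions map to suspects
```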
The transformation and amendment to the circuit in Step 1 is very important to the debug-
ging formulation and deserves further clarification. In this step, a correction model, represented
by a multiplexer mi, is added at the output of every gate (or module) li in the circuit C. The
output of the correction model mi is connected to the original fanout of li. In effect, when
the select line si of a multiplexer is inactive (si = 0), the original gate li is connected to mi,
otherwise (when si = 1) a new unconstrained primary input wi is introduced. Eventually, the
CNF variable corresponding to wi will assume the value that corrects the circuit. Figure 2.3
(a) shows a sample circuit while Figure 2.3 (b) illustrates this circuit with correction models
added for every gate and input. Recall that correction models are only added to the output of
components, which can be individual gates, groups of gates, or even high level Verilog modules.
When the problem is converted to CNF in Step 2 and the CNF is provided to a SAT solver
in Step 4, the SAT solver can assign any value {0,1} to the si and wi variables such that the
CNF satisfies the constraints applied by each vector v. To constrain the SAT problem to a
particular error cardinality N , further logic is added to activate at most N select lines. This
logic can be modeled using a counter or sorter network [74]. Thus for N = 1, a single si can be
set at a time to a logic 1 which in turn indicates that li is an error suspect. For higher values of
N , the respective number of multiplexer select lines can be set to a logic 1, a fact that indicates
an error suspect N -tuple. A simple all-solution SAT methodology can be implemented on top
Figure 2.3: Circuit before and after adding correction models.
to return all equivalent error suspects, as described in [91]. Intuitively, an all-solution SAT
solver does not return one satisfying assignment for the input problem but all assignments that
satisfy it. Iterations of all-solution problems can be formulated from N = 1 to N = maxN to
locate all possible errors with at most cardinality maxN .
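For a concrete, self-contained illustration of the correction model, consider a hypothetical two-gate combinational circuit, g1 = AND(x1, x2) feeding g2 = OR(g1, x3), where the AND should really be an OR. A multiplexer is placed at each gate output, and because the instance is tiny, an exhaustive search over the select and w assignments stands in for the SAT solver; a real flow would instead hand the replicated CNF to an all-solution solver. All names are invented for this sketch.

```python
from itertools import product

def simulate(x1, x2, x3, sel, w):
    """Evaluate the amended circuit: each gate output passes through a
    correction multiplexer (active select -> free value w replaces the gate)."""
    g1 = x1 and x2
    g1 = w[0] if sel[0] else g1    # mux m1 at the output of g1
    g2 = g1 or x3
    g2 = w[1] if sel[1] else g2    # mux m2 at the output of g2
    return g2

def find_suspects(vector, golden, n=1):
    """A gate is a suspect if activating exactly n selects (here n = 1) lets
    some free value reproduce the golden output for the diagnosis vector."""
    suspects = set()
    for sel in product([0, 1], repeat=2):
        if sum(sel) != n:
            continue                       # enforce error cardinality N = n
        for w in product([0, 1], repeat=2):
            if simulate(*vector, sel, w) == golden:
                suspects |= {f"g{i + 1}" for i, s in enumerate(sel) if s}
    return suspects

# Vector x1=1, x2=0, x3=0 with golden output 1 exposes the buggy AND:
suspects = find_suspects((1, 0, 0), 1)   # == {'g1', 'g2'}: either mux can fix y
```

Both gates are equivalent solutions at N = 1: freeing g1's output to 1 propagates through the OR, while freeing g2's output directly forces the correct value, mirroring the multiple-suspect situation of Figure 2.1.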
Step 2, which translates the problem into CNF, is not specific to the debugging problem and
is described in detail in Section 2.6.1. Similarly, basic concepts of the SAT solvers used in Step 4
of the debugging algorithm are illustrated in Section 2.6.2 that follows.
2.6 Boolean Satisfiability
Boolean Satisfiability (SAT) solvers are decision engines used in a variety of scientific domains
such as artificial intelligence, CAD for VLSI, finance and biology, among others. This sec-
tion introduces the satisfiability problem formally and proceeds to provide an overview of the
algorithms utilized by modern SAT solvers.
Definition 8 Given a Boolean formula Φ with operators · (and), + (or), − (negation), →
(implication), ↔ (if-and-only-if) and n variables v1, v2, ..., vn, the Boolean satisfiability problem
is to determine whether there exists a variable assignment from the Boolean domain {0, 1} to
each variable such that Φ evaluates to 1. If such an assignment exists, then the problem is
SATISFIABLE otherwise the problem is UNSATISFIABLE.
For example, determining whether the following Boolean formula evaluates to 1 is a satisfi-
ability problem:
(a + b) + ((a → c) · (b + c))
In this case, the problem is satisfiable and one suitable logic assignment to the variables is
a = 1, b = 0, c = 0. It can be seen that this assignment makes the formula satisfiable.
It is beneficial to briefly discuss Boolean satisfiability in the context of complexity theory.
Boolean satisfiability holds a special place in the history of mathematics as it is the first problem
to be classified as NP-Complete [25]. NP-Complete problems are NP-hard by definition [26] and
their solutions can be verified in polynomial time (they belong to the class NP). A problem is
NP-hard if all other problems in NP can be polynomially reduced to it. Both NP-Complete and
NP-hard problems are classes of problems for which no polynomial-time solving algorithm is
currently known. Unfortunately, many VLSI CAD problems, such as automated debugging, circuit
optimization and model checking, fall under one of these two classes [88].
One interesting aspect of NP-Complete problems is that if an efficient algorithm is found for
one particular NP-Complete problem, then every problem in NP can be solved efficiently by
reducing it to that problem. Since the SAT problem is itself NP-Complete, an efficient SAT
solver can be used to solve any problem that can be formulated as a SAT instance. With recent
improvements in SAT algorithms, this approach has become very common. For example, SAT
solvers are used as the underlying engines for many VLSI CAD problems such as ATPG [56],
formal verification [9], low-power design [67] and FPGA routing [75], among others. In other
words, instead of developing a dedicated algorithm for each of these VLSI CAD problems, they
can be translated into SAT instances to which a generic solver can provide answers efficiently.
Although empirically effective for many problems, the SAT problem remains NP-Complete in
theory, and all known algorithms for it have worst-case exponential time complexity; a
polynomial-time algorithm would imply P = NP, where the complexity class P is the set of all
problems that can be solved in polynomial time. Efficient heuristics are required to reduce the
run-times and memory requirements of these algorithms in practice; however, these heuristics
are not beneficial for all problems and do not affect the worst-case behaviour.
2.6.1 CNF Representation for Boolean SAT
The SAT problem, as defined in the previous section, allows a fairly flexible representation
of the Boolean formula Φ. However, most SAT algorithms work on a simpler format, namely
the Conjunctive Normal Form (CNF). A CNF formula is a conjunction (·) of clauses; a clause
is a disjunction (+) of literals; and a literal is the positive or negative (¬) phase (inversion) of
a variable. The CNF representation is popular among SAT algorithms because a solver can
focus on satisfying each individual clause (making it evaluate to 1) in order to satisfy the
overall problem.
When writing a formula in CNF, the literals of each clause are grouped together in parentheses
and the conjunction operator is usually omitted. For instance, the following CNF formula Φ
contains two clauses, three variables (a, b, c), and five literals (a, ¬a, b, ¬b, c):
Φ = (a + ¬b + c)(¬a + b)
Many CAD SAT problems are derived from a gate-level circuit representation together with
additional problem constraints. The circuit component can be represented in CNF using a
linear-time procedure such as the one described below [56, 79, 97].
step 1. Given a circuit, uniquely label every circuit line including all inputs
and all outputs.
step 2. For each gate, retrieve the corresponding CNF clauses representing
the gate from a database such as Table 2.1.
step 3. For each clause, replace the gate’s input and output variables in the
CNF by the appropriate unique labels.
step 4. Join all clauses by using the conjunction operation.
Intuitively, the above process replaces every simple gate in the circuit with its CNF equiv-
alent as shown in Table 2.1. In the final step, the conjunction of all the gate CNFs results in
the overall circuit CNF. Figure 2.4 illustrates a simple gate-level circuit and its corresponding
CNF representation.
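The four steps above can be sketched directly. The following is a minimal illustration (not the thesis' implementation) using DIMACS-style signed-integer literals; the gate "database" covers only a few of the gates of Table 2.1, and the label numbering is our own:

```python
def gate_to_cnf(gate_type, out, inputs):
    """Steps 2/3: the CNF clauses for one gate over already-labelled lines.
    A literal is a signed integer: +v for variable v, -v for its negation."""
    if gate_type == 'AND':
        return [[i, -out] for i in inputs] + [[-i for i in inputs] + [out]]
    if gate_type == 'OR':
        return [[-i, out] for i in inputs] + [list(inputs) + [-out]]
    if gate_type == 'NAND':
        return [[i, out] for i in inputs] + [[-i for i in inputs] + [-out]]
    if gate_type == 'NOT':
        (i,) = inputs
        return [[i, out], [-i, -out]]
    raise ValueError('gate not in database: ' + gate_type)

def circuit_to_cnf(gates):
    """Step 4: conjoin (concatenate) the clause lists of every gate."""
    clauses = []
    for gate_type, out, inputs in gates:
        clauses += gate_to_cnf(gate_type, out, inputs)
    return clauses

# Step 1 (labelling): a=1, b=2, c=3, d=4, e=5 for d = OR(a, b), e = NAND(c, d)
circuit = [('OR', 4, [1, 2]), ('NAND', 5, [3, 4])]
print(circuit_to_cnf(circuit))
```

The printed clause list is the conjunction of the OR-gate and NAND-gate CNFs, which is exactly the structure of the circuit CNF shown in Figure 2.4.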
Gates                       CNF
y = AND(x1, x2, ..., xn)    (x1 + ¬y) · (x2 + ¬y) · ... · (xn + ¬y) ·
                            (¬x1 + ¬x2 + ... + ¬xn + y)
y = NAND(x1, x2, ..., xn)   (x1 + y) · (x2 + y) · ... · (xn + y) ·
                            (¬x1 + ¬x2 + ... + ¬xn + ¬y)
y = OR(x1, x2, ..., xn)     (¬x1 + y) · (¬x2 + y) · ... · (¬xn + y) ·
                            (x1 + x2 + ... + xn + ¬y)
y = NOR(x1, x2, ..., xn)    (¬x1 + ¬y) · (¬x2 + ¬y) · ... · (¬xn + ¬y) ·
                            (x1 + x2 + ... + xn + y)
y = XOR(x1, x2)             (¬x1 + ¬x2 + ¬y) · (x1 + x2 + ¬y) ·
                            (x1 + ¬x2 + y) · (¬x1 + x2 + y)
y = XNOR(x1, x2)            (¬x1 + ¬x2 + y) · (x1 + x2 + y) ·
                            (x1 + ¬x2 + ¬y) · (¬x1 + x2 + ¬y)
y = NOT(i)                  (i + y) · (¬i + ¬y)
y = BUFFER(i)               (¬i + y) · (i + ¬y)
y = MUX(s, x1, x2)          (¬x1 + y + s) · (x1 + ¬y + s) ·
                            (¬x2 + y + ¬s) · (x2 + ¬y + ¬s)
Table 2.1: Simple gates and their CNF representation
The above procedure translates a gate-level circuit to CNF. However, more work is often
required to formulate a specific CAD problem. For instance, to constrain a variable a to the
Boolean value 1 or 0, the unit literal clause (a) or (¬a) is added, respectively. As another example,
the procedure described in Section 2.5 formulates a SAT-based debugging problem; in that
process, the circuit is constrained with the logic values encapsulated in the diagnosis vector
set V using unit literal clauses.
2.6.2 Boolean Satisfiability Algorithms
Modern SAT solvers such as MiniSAT [74], zChaff [73], and GRASP [69] are based on the
backtracking search of the original DPLL SAT solving algorithm [27]. The major functions of the
Φ = (¬a + d) · (¬b + d) · (a + b + ¬d) ·   [OR gate: d = OR(a, b)]
    (c + e) · (d + e) · (¬c + ¬d + ¬e)     [NAND gate: e = NAND(c, d)]
[Circuit schematic with inputs a, b, c and gate outputs d, e omitted.]
Figure 2.4: Example: circuit and CNF representation
DPLL procedure are shown in the algorithm of Figure 2.5. First, the decide function picks a
variable from the CNF problem and assigns it the value 0 or 1. The deduce function then
determines which other assignments are implied by the previous decision and identifies whether
a conflict has occurred. A conflict occurs when a variable is implied to opposing Boolean values
by different CNF clauses. If a conflict occurs, the solver must backtrack to a decision level prior
to the conflict. If a conflict is reached at decision level 0, the problem is UNSATISFIABLE;
otherwise, a satisfying variable assignment is eventually found as the solution to the SAT problem.
Today’s most effective SAT solvers use advanced techniques such as efficient data struc-
tures for Boolean Constraint Propagation and book-keeping, non-chronological backtracking,
conflict-based decision making, and solver restarts to explore different portions of the solution
space [69, 73, 74]. These techniques improve SAT solver performance but cannot guarantee
timely completion of the search, since the problem remains NP-Complete. Nevertheless,
extensive empirical results from the research and industrial communities show that SAT solvers
are computationally viable decision engines for many CAD problems [8, 39, 56].
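The skeleton of Figure 2.5 can be sketched recursively. This is a deliberate simplification, not a modern solver: deduce is plain unit propagation, decide branches on the first literal of the first clause, and backtracking happens as the recursion unwinds. The example CNF (variables a..e numbered 1..5) encodes d = OR(a, b) and e = NAND(c, d) per Table 2.1, with the output e constrained to 1 by the unit clause (e):

```python
def simplify(clauses, lit):
    """Assign `lit` true: drop satisfied clauses, strip the falsified literal.
    Returns None if an empty clause (a conflict) is produced."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def dpll(clauses):
    """Return a satisfying list of literals, or None if UNSATISFIABLE."""
    if not clauses:
        return []                    # every clause satisfied
    for c in clauses:                # deduce(): unit propagation
        if len(c) == 1:
            rest = simplify(clauses, c[0])
            if rest is None:
                return None          # conflict during deduction
            sub = dpll(rest)
            return None if sub is None else [c[0]] + sub
    lit = clauses[0][0]              # decide(): branch on a literal
    for choice in (lit, -lit):
        rest = simplify(clauses, choice)
        if rest is not None:
            sub = dpll(rest)
            if sub is not None:
                return [choice] + sub
    return None                      # both branches failed: backtrack

cnf = [[-1, 4], [-2, 4], [1, 2, -4], [3, 5], [4, 5], [-3, -4, -5], [5]]
print(dpll(cnf))
```

The returned literal list is a (possibly partial) satisfying assignment; unassigned variables are don't-cares.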
2.6.3 Maximum Satisfiability
A close relative of the SAT problem is the Maximum Satisfiability (max-sat) problem. Max-sat
is an optimization problem that seeks an assignment to an UNSATISFIABLE CNF formula
that maximizes the number of satisfied clauses [61, 68]. For example, for the CNF
Φ = (a) · (¬a) · (¬b) · (¬a + b)
one max-sat solution is the satisfied clause set {(¬a), (¬b), (¬a + b)}.
1: while ( decide() ) do
2: if ( deduce() = conflict ) then
3: blevel = analyze_conflict()
4: if ( blevel = 0 ) then
5: return UNSATISFIABLE
6: end if
7: backtrack(blevel)
8: end if
9: end while
10: return SATISFIABLE
Figure 2.5: Basic DPLL SAT solving algorithm
While max-sat is concerned with finding a satisfiable set of clauses of maximum cardinal-
ity, this can be generalized to finding Maximal Satisfiable Subsets (MSSes). An MSS is a
satisfiable subset of the formula's clauses that is maximal in the sense that adding any one of
the remaining clauses makes it unsatisfiable. Every max-sat solution is of course an MSS, but
MSSes can be of different (smaller) sizes as well. For instance, in the above example, the
following are MSSes:
{(¬a), (¬b), (¬a + b)}
{(a), (¬b)}
{(a), (¬a + b)}
The first set is an MSS satisfied by the assignment a = 0, b = 0; the second is satisfied by
a = 1, b = 0, and the third by a = 1, b = 1.
In this dissertation, the complements of MSSes, i.e., sets of clauses whose removal makes the
instance satisfiable, are of interest. Just as an MSS is maximal, its complement is minimal,
and such a set is referred to as a Minimal Correction Set (MCS). For the above sets the
corresponding MCSes are {(a)}, {(¬a), (¬a + b)}, and {(¬a), (¬b)}, respectively. Chapter 5
presents a technique that solves the debugging problem by formulating it as a max-sat problem
and seeking MCSes.
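For small formulas, MCSes can be enumerated by brute force directly from the definition. The sketch below (our own illustration, not the solver of Chapter 5) uses DIMACS-style signed-integer clauses and a four-clause unsatisfiable formula of the same shape as the example above, with both phases written out explicitly:

```python
from itertools import combinations, product

def satisfiable(clauses, nvars):
    """Exhaustive satisfiability check over variables 1..nvars."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def mcses(clauses, nvars):
    """Enumerate Minimal Correction Sets in order of increasing size: an MCS is
    a minimal set of clause indices whose removal leaves the rest satisfiable."""
    found = []
    for k in range(len(clauses) + 1):
        for idx in combinations(range(len(clauses)), k):
            if any(set(prev) <= set(idx) for prev in found):
                continue  # a subset already corrects the formula: not minimal
            rest = [c for i, c in enumerate(clauses) if i not in idx]
            if satisfiable(rest, nvars):
                found.append(idx)
    return found

# Phi = (a)(-a)(-b)(-a + b) encoded with a = 1, b = 2
phi = [[1], [-1], [-2], [-1, 2]]
print(mcses(phi, 2))   # each tuple lists the clause indices forming one MCS
```

The complement of each reported index set is an MSS, mirroring the correspondence described above; real MCS extractors avoid this exponential subset enumeration.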
Many types of max-sat solvers are being developed by research groups today. Unlike SAT,
where DPLL-based algorithms dominate, it is not yet clear which max-sat algorithm is the
most effective. Some max-sat solvers simply build a SAT instance with cardinality constraints
and use an off-the-shelf SAT solver to find solutions to the problem [41]. Others use an
approximate SAT algorithm such as local search [42] to find an initial solution and later refine
it [4]. One of the most effective solvers for debugging problems analyzes unsatisfiable cores
extracted directly from the SAT instance and refines the solution by operating on those
cores [68].
Chapter 3
Debugging using Abstraction and
Refinement
3.1 Introduction
One of the main challenges in automated debugging is scaling the algorithms to large industrial
designs. Most state-of-the-art research on this topic presents results on benchmark designs with
at most tens of thousands of gates. In contrast, modern design blocks in industry exceed the
half-million gate mark. Since the complexity of the debugging problem is O(l^N) [33],
where l is the number of gates in the design and N is the error cardinality, large designs can
have a significant impact on debugger run-time and memory requirements. Arguably, this
challenge has limited the application of automated debugging techniques to real industrial
designs. This chapter introduces abstraction and refinement techniques for automated
debugging to overcome these design size limitations. The methodology proposed in this
dissertation allows existing debugging tools to tackle larger designs than previously possible
while providing considerable performance improvements.
For over ten years, abstraction and refinement techniques introduced in formal verification
have dramatically influenced the scalability and applicability of model checking tools [11, 22,
23]. Viewed as one of the major advancements in model checking, abstraction and refinement
have attracted countless contributions with significant impact on the industrial verification
community [11, 23, 49, 65]. Essentially, these techniques approximate the model checking
problem in a systematic manner, allowing conventional model checkers to handle larger and
harder problems than previously possible. Similarly, this chapter establishes a theoretical
framework for debugging via abstraction and refinement and presents evidence attesting to the
significant performance and capacity improvements made available to automated debuggers.
The proposed abstraction-based debugging technique begins by “simplifying” components of
the design according to the design structure and a pre-determined abstraction level. Hierarchical
or RTL designs are good candidates for function abstraction, where high-level functions are
simplified. Gate-level designs benefit from state abstraction, which operates on a flat netlist.
Irrespective of the design structure, a high abstraction level leads to an aggressive algorithm
which can drastically reduce the debugging problem size. However, a small problem size is not
always advantageous, since there exists a trade-off between the level of abstraction performed
and the number of algorithm iterations required to solve the problem.
Once a design is abstracted, a debugging problem can be formulated and solved using
conventional debugging techniques. The error locations returned, also called error suspects,
can contain valuable information to help the user rectify the design. Furthermore, the error
suspects can also indicate that the solution set may not be complete, that is, that more error
suspects may exist in the design. This scenario occurs when abstraction aggressively removes
parts of the design that may contain error sources. Subsequently, refinement is applied, where
components are selectively re-introduced into the abstracted circuit. In essence, refinement
systematically enriches the abstracted circuit with components until all the error sources are
located.
This pairing of debugging and refinement steps is iterated until all solutions are found. A
set of rigorous theorems is presented that guarantees the correctness and completeness of the
proposed methodology. It should be noted that the proposed methodology is not tied to any
particular debugging technique. Although in this chapter the presentation is conveniently
outlined in terms of SAT-based debugging, other diagnosis methodologies (simulation- and
BDD-based) can utilize the presented theory as well.
In further detail, this chapter introduces a novel debugging abstraction/refinement frame-
work with two orthogonal abstraction techniques each focusing on different design structures.
1. State abstraction: abstraction is performed on state and memory elements to reduce
the spatial and temporal size of the problem [86]. State abstraction operates on a flat
gate-level representation of the problem and does not require any hierarchical or module
information.
2. Function abstraction: abstraction is performed on functions and modules in an iterative
manner through the design hierarchy thus reducing the size of the problem. Function
abstraction leverages information embedded in high level or RTL designs to provide sig-
nificant benefits.
Extensive experiments on large industrial problems demonstrate memory and run-time
reductions of 60% and 4.5×, respectively, using state abstraction. With function abstraction,
drastic memory reductions of over 27× and run-time reductions of over two orders of magnitude
are observed. As in verification, this chapter demonstrates that abstraction and refinement
have a critical impact on the performance of automated debugging, and they motivate future
research in the field.
The presentation of this material is organized as follows. The next section presents notation
and background material. Section 3.3 presents the general abstraction and refinement
methodology and guarantees its correctness and completeness. Sections 3.4 and 3.5 present the
details of state-based and function-based abstraction techniques, respectively. Empirical results
are presented in Section 3.6, and conclusions and future work are discussed in Section 3.7.
3.2 Preliminaries
A design or circuit C can be represented at the gate-level or the Register Transfer Level (RTL).
At the RTL level, the circuit is often hierarchically composed of modules or functions. In
this chapter, a function is said to generate a Boolean value for a variable y based on m input
variables x1, x2, ..., xm and zero or more state variables. For abstraction and refinement, we
are primarily concerned with the structural connectivity between the input variables and the
variable y of a function. As a result, the dependence of the function on state variables is omitted
in the following. The terms modules, components and functions are used interchangeably to
refer to entities implementing functions as defined above. For example, a Verilog function or
a collection of RTL statements can define a module. Each module implements a multi-output
function F = {f1(X), f2(X), ..., fp(X)} where each single-output function fi is defined on input
variables X = {x1, x2, ..., xq}. In the remainder, single-output and multi-output functions are
not distinguished unless explicitly stated otherwise.
Modules can also contain sub-modules, thus resulting in a hierarchy tree H for the design. A
hierarchy tree H contains nodes representing modules and edges representing parent and child
(sub-module) relationships. The hierarchy tree H can contain many levels, so each function
is labelled with a superscript that indicates its level. For example, a function F^i_j is at level i of
the tree and can have sub-functions F^{i+1}_k and F^{i+1}_l at the next level i+1. The output of the
entire design C is represented by F^0_1 at root level 0. This module and hierarchy terminology is
used extensively in Section 3.5 when introducing function abstraction.
3.2.1 Abstraction and Refinement in Model Checking
Abstraction and refinement techniques are used readily in model checking to mitigate the expo-
nential nature of the underlying state space [11, 22, 23]. Roughly speaking, an abstract model is
derived by removing state elements or other components from the original concrete design. As
an active area of research, many different types of abstraction techniques exist such as existen-
tial abstraction and predicate abstraction [24, 40, 49]. Irrespective of the abstraction approach,
the final abstract model contains fewer circuit elements than the original thus simplifying the
task of the model checker.
Depending on the properties being verified and the abstraction technique used, the model
checking result may or may not be trusted. For example, consider the scenario when verifying
a universal property (whether the property holds for all paths) using an existential abstraction
technique [19]. If model checking determines that a property holds in the abstract model, then
it must also hold in the concrete design [19]. However, if a property does not hold in the abstract
model, then the corresponding counter-example must be validated in the concrete design. If
the counter-example does not expose a failure of the property in the concrete design it is said to
be spurious [24]. In this case, the abstract model is refined by reverting some of the abstracted
state elements and continuing the model checking process.
3.3 Debugging with Abstraction and Refinement
The aim of abstraction-based debugging is to reduce the size and complexity of the underlying
problem. Since the performance of a debugger and its memory requirements are directly re-
lated to the size of the circuit under analysis, abstraction can introduce considerable run-time
and memory benefits. This section introduces the basics of a complete and sound debugging
methodology using abstraction and refinement.
An abstract model C ′ is derived by removing a set of components or functions Abs from a
concrete model C. More precisely, as shown below, the procedure selAbsComponents selects a
set of functions to abstract while the procedure absDesign removes these from the design C:
Abs = selAbsComponents(C)
C ′ = absDesign(Abs, C)
When the components Abs are removed, some of the circuitry in their transitive fanin
may be left dangling (i.e. unused by any other logic). An iterative dangling logic removal
procedure can eliminate all gates, wires and other state elements unused by the abstracted
components [13]. The resulting model C ′ can be significantly smaller than C. For instance, if
Abs includes all primary outputs of a circuit, the entire circuit can be essentially removed. The
degree of abstraction to perform is addressed through the experiments of Section 5.6.
After removing the components Abs, their direct fanouts, which are now undriven, are
connected to newly introduced primary inputs. Specifically, for every function fi ∈ Abs a new
primary input is introduced in C ′ and connected to the fanout of fi. As an example, consider
Figure 3.1 (a) and (b), where a circuit is shown before and after abstraction, respectively. In
this case, the component to abstract is Abs = {q1}. Notice that q1 and its transitive fanin
logic, l6 and x3, are removed in Figure 3.1 (b), and the fanout of q1, gate l2, is now driven by
the new primary input x5.
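The absDesign step, including the iterative dangling-logic removal, can be sketched on a toy netlist model: a dict mapping each driven signal to its gate, where signals that appear only as inputs act as primary inputs. The model and any names beyond those of Figure 3.1 are hypothetical:

```python
def abs_design(netlist, abs_set, primary_outputs):
    """Remove the components in abs_set; each removed signal then behaves as a
    new, unconstrained primary input. Afterwards, iteratively sweep away logic
    that no longer drives anything (dangling-logic removal)."""
    abstracted = dict(netlist)
    for sig in abs_set:
        del abstracted[sig]       # sig's fanout is now fed by a fresh primary input
    changed = True
    while changed:                # iterate until no dangling gate remains
        changed = False
        used = set(primary_outputs)
        for _gate, inputs in abstracted.values():
            used.update(inputs)
        for sig in [s for s in abstracted if s not in used]:
            del abstracted[sig]   # no fanout left: remove the gate
            changed = True
    return abstracted

# A fragment loosely modelled on Figure 3.1: abstracting q1 leaves l6 dangling
netlist = {
    'l6': ('AND', ['x1', 'x3']),
    'q1': ('DFF', ['l6']),
    'l2': ('OR',  ['q1', 'x2']),
    'y1': ('BUF', ['l2']),
}
print(abs_design(netlist, {'q1'}, ['y1']))
```

After the sweep, only the logic feeding the primary outputs remains, and the removed flip-flop's output survives purely as an unconstrained input, mirroring x5 in the figure.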
[Circuit schematics (a) and (b) omitted: the concrete circuit with flip-flops q1 and q2, and the abstract circuit in which q1, l6 and x3 are removed and the new primary input x5 drives l2.]
Figure 3.1: Circuit before and after abstracting flip-flop q1
3.3.1 Guaranteeing Correctness
Once the abstract model C′ is generated, the next step is to construct the debugging problem.
The natural reaction is to formulate a SAT-based debugging problem according to Section 2.5
with the abstract model C′ and the error trace V. However, the abstract model C′ contains
the newly added primary inputs, which remain unconstrained in V. As a result, a SAT-based
debugging engine may arbitrarily assign unjustifiable logic values to these variables while
solving the debugging problem.
Definition 9 Assume that Φ (Φ′) corresponds to a SAT-based debugging problem derived from a
concrete design C (abstract design C ′). A value assignment to the variables of Φ′ is unjustifiable
if the same assignment to Φ gives a conflict.
Here, a conflict occurs when different values from the Boolean domain are assigned to the
same variable. Consequently, the solutions returned by a debugger under this formulation
cannot be trusted, as they may be incorrect. The following example illustrates this scenario.
Example 1 Figure 3.2 (a) shows a concrete design with an error on gate l1. Regardless of
the error type, the correct/erroneous values of logic 1/0, shown in bold, propagate from gate
[Circuit schematics (a) and (b) with the correct/erroneous simulation value pairs omitted.]
Figure 3.2: Demonstrating the effect of unconstrained inputs on abstract circuit
l1 through the flip-flop q1 and to the primary output y1. Notice that the primary input values
remain constant in both time frames. When the state element q1 is abstracted and left un-
constrained, the SAT solver can assign the new input x5 the value 1, which produces the
correct/erroneous value pair 1/1, also shown in Figure 3.2 (b). Here, the value assignment
x5 = 1 is unjustifiable because in the concrete design of Figure 3.2 (a) the corresponding
assignment to q1 is 0.
One way to prohibit unjustifiable solutions from occurring is to constrain the newly added
primary inputs to the values of the Abs components in design C as proposed by Theorem 1.
Theorem 1 Given a circuit C and an input vector sequence v, let the set Q contain the
simulation values of the outputs of the components Abs for all clock cycles in v. A debugging
problem formulated with the abstract model C′ and v′ = v ∪ Q will not have any unjustifiable
assignments.
Proof: This proof is based on the fact that the abstract model can be restricted to behave
sequentially like the concrete model. For every clock cycle, the fanout logic of every Abs
component in C is driven by circuit elements whose Boolean values are stored in Q. Similarly,
the Boolean values in Q are used to drive the new primary inputs in C′ for every clock cycle.
Since the fanout logic of every Abs component in C′ is constrained to the same values as in C,
unjustifiable assignments will not occur.
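The constraint v′ = v ∪ Q of Theorem 1 amounts to a simple merge of per-cycle value maps. The sketch below assumes a hypothetical data layout (cycle → {signal: value}); the thesis does not prescribe one:

```python
def extract_constraint(abs_inputs, sim_values, V):
    """Build v' = v U Q: extend each cycle of the diagnosis vectors V with the
    simulated value of every new primary input created by abstraction."""
    V_abs = {}
    for cycle, values in V.items():
        # Q: the simulated output value of each abstracted component this cycle
        Q = {sig: sim_values[sig][cycle] for sig in abs_inputs}
        V_abs[cycle] = {**values, **Q}
    return V_abs

# As in the example above: q1 is abstracted into x5, whose simulated value is 0
V = {0: {'x1': 1, 'x2': 0}, 1: {'x1': 1, 'x2': 0}}
sim = {'x5': {0: 0, 1: 0}}
print(extract_constraint(['x5'], sim, V))
```

Constraining x5 to its simulated value of 0 in every cycle rules out exactly the unjustifiable assignment x5 = 1 discussed in Example 1.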
By Theorem 1, correctness is guaranteed since all solutions for the abstract model are also
solutions for the concrete design. Next, the proposed methodology is extended to find suspects
that may be accidentally abstracted.
3.3.2 Spurious Solutions
Since abstraction may remove large sections of a design, it is possible that error sources are
accidentally removed. This case can be identified because automated debuggers will return the
corresponding new primary inputs as suspects, that is, as spurious solutions.
Definition 10 Spurious solutions are primary input suspects returned by an automated debug-
ger that correspond to abstracted components.
Spurious solutions do not provide enough information about the error source to help rectify
the erroneous concrete design. In other words, these spurious solutions mask equivalent error
locations. To find the error locations in the concrete design, the abstracted variables and their
respective removed fanin logic must be analyzed. One way is to refine the design based on the
spurious solutions and iterate the debugging process. Refinement is achieved by re-introducing
the original circuitry, including the removed fanin logic, corresponding to the spurious solutions
into the design C ′.
Example 2 Consider the circuit in Figure 3.2 (a) after abstracting q1, where a debugger finds
l2, l1, and x5 as suspects with N = 1. Here, location l6, which is the error source in the
concrete design, is abstracted. In this case, the spurious solution x5 masks the error source l6.
Refinement is necessary to re-introduce l6 into C ′, thus allowing the debugger to find l6 in the
next iteration.
Traditionally, complete solution sets that include all equivalent error locations are important
in debugging [91] since they offer more flexibility for the designer to correct the design, or to
optimize it if a debug-based rewiring algorithm is used [98]. To find all equivalent suspects,
[Schematic of the abstract circuit unfolded over two time frames omitted; the abstracted logic l6 appears in dashed lines.]
Figure 3.3: Abstract circuit unfolded over two time frames
all solutions corresponding to abstracted components must be refined, a process performed it-
eratively until no more solutions from abstracted components are found. In practice, since the
proposed process is incremental, the user at any time can attempt to rectify the circuit before
the entire debugging process is complete.
3.3.3 Guaranteeing Completeness
The abstraction formulation and refinement schemes discussed in the previous sections provide
a means of identifying error sources without considering the entire design. However, under
certain conditions some equivalent solutions may be missed by the debugger. This happens
when a set of m errors in the concrete design is mapped onto a set of n errors in the abstract
model, where n > m, as shown in Example 3.
Example 3 Consider the abstract circuit in Figure 3.1 (b) unfolded for two time frames as
illustrated in Figure 3.3. For clarity, the abstracted logic l6 is shown in dashed lines. Notice
that the error from gate l1 does not directly propagate to output y1 but its effect is captured in
abstract variable x5. For error cardinality N = 1, the SAT solver returns the single equivalent
error location l2. Assuming that the design is analyzed and it is concluded that l2 is not the
error source, the real source of error goes undetected. However, if N is incremented to 2, then
the pair {l1, x5} is found as a solution. By refining the abstract variable x5 to q1 and solving
the debugging problem again with N = 1, the single error location l1 is found.
The above example illustrates how abstraction can cause an error location to be found only at
a higher error cardinality. Given a maximum user-defined error cardinality maxN, when using
abstraction and refinement the maximum cardinality should be set to maxNabs = maxN +
|output(Abs)|, where |output(Abs)| is the number of unique outputs of the abstracted functions
(equivalently, the number of new primary inputs). Theorem 2 presents the steps required to
find all equivalent error locations for a user-specified value of maxN.
Theorem 2 Assume that a debugger returns solution set S for concrete design C, diagnosis
vectors V , and maximum error cardinality maxN . The debugging procedure that performs the
following steps with an abstract model C ′, diagnosis vectors V ′, and maximum error cardinality
maxNabs = maxN + |output(Abs)| finds set of solutions S′ ⊇ S.
1. Initialize N to 1.
2. Debug C′ with V′ and N to get solution set S′.
3. If any solution s ∈ S′ is spurious, refine the abstract model C′ using s and go to (1).
4. Increment N by 1.
5. If N > maxNabs, return S′; else go to (2).
Proof: In the worst case, some error sources are abstracted and their behavior is captured
by the new primary inputs or output(Abs). Together, the maximum number of active error
locations is maxNabs = maxN + |output(Abs)|. The debugger proceeds to find solutions based
on the abstract model using N ≤ maxNabs. If any of the solutions are spurious, then the
abstract model is refined and those variables are replaced with their corresponding concrete
components. The new abstract model is then given to the tool which starts the search with
N = 1 again. The search continues until N = maxNabs, and all the equivalent errors that
map into maxNabs-tuples or fewer will be found. After every refinement step, some abstracted
components are re-introduced and previous solutions at N = maxNabs may be found at N ≤
maxNabs. This process guarantees that all the abstracted components that mask error locations
are systematically resolved thus finding all the solutions S.
1:  S = ∅, N = 1
2:  Abs = selAbsComponents(C)
3:  C′ = absDesign(Abs, C)
4:  maxNabs = maxN + |outputs(Abs)|
5:  while (1) do
6:    V′ = extract_constraint(C, C′, V)
7:    New_sols = debug(C′, V′, N)
8:    for all Sol ∈ New_sols do
9:      if (spurious_solutions(Sol, C′)) then
10:       C′ = refine(Sol, C′, C)
11:       N = 0
12:       maxNabs = maxN + |outputs(C′, C)|
13:     else
14:       S = S ∪ Sol
15:     end if
16:   end for
17:   N = N + 1
18:   if (N > maxNabs) then
19:     return {S, C′}
20:   end if
21: end while
Figure 3.4: Debugging algorithm with state abstraction and refinement
3.3.4 Overall Algorithm
Figure 3.4 illustrates the overall abstraction and refinement algorithm for a debugging method-
ology that guarantees correctness and completeness. The first step is to generate the initial
abstract model C′, as shown on lines 2 and 3. To ensure correctness, on line 6 the stimulus
is modified to constrain the new primary inputs to their simulation values, as discussed in
Section 3.3.1. The modified diagnosis vector V′ and the abstract model C′ are provided to the
debugger to find error locations, as shown on line 7. Next, according to the spurious solutions
found, refinement may be performed, the error cardinality is reset, and maxNabs is recalculated.
Solutions that are not spurious are added to the solution set S to be returned to the user.
These steps are repeated until maxNabs is reached, guaranteeing completeness.
Even though the final solution set S is returned on line 19, the algorithm is incremental in
nature, meaning that every solution found can be reported to the user immediately. The benefit
of an incremental algorithm is that suspects can be analyzed by engineers before all equivalent
solutions have been found.
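The control flow of Figure 3.4 can be sketched as a driver routine with every engine (debugger, refiner, abstractor, etc.) passed in as a callable. The toy stubs below are entirely invented for illustration, loosely following Example 2, where the real error source l6 first surfaces as the spurious suspect x5:

```python
def debug_with_abstraction(C, V, maxN, sel_abs, abs_design, extract_constraint,
                           debug, is_spurious, refine, n_abs_outputs):
    """Skeleton of the algorithm in Figure 3.4; returns (solutions, final model)."""
    S, N = set(), 1
    C_abs = abs_design(sel_abs(C), C)
    maxN_abs = maxN + n_abs_outputs(C_abs, C)
    while True:
        V_abs = extract_constraint(C, C_abs, V)
        for sol in debug(C_abs, V_abs, N):
            if is_spurious(sol, C_abs):
                C_abs = refine(sol, C_abs, C)    # re-introduce abstracted logic
                N = 0                            # restart the cardinality sweep
                maxN_abs = maxN + n_abs_outputs(C_abs, C)
            else:
                S.add(sol)
        N += 1
        if N > maxN_abs:
            return S, C_abs

# Toy stubs: suspects are l1, l2, l6; x5 stands in for l6 while it is abstracted
def toy_debug(C_abs, V_abs, N):
    if N != 1:
        return []
    suspects = [s for s in ('l1', 'l2', 'l6') if s in C_abs]
    return suspects + (['x5'] if 'l6' not in C_abs else [])

sols, _ = debug_with_abstraction(
    C={'l1', 'l2', 'l6', 'q1'}, V=None, maxN=1,
    sel_abs=lambda C: {'q1'},
    abs_design=lambda Abs, C: C - {'q1', 'l6'},
    extract_constraint=lambda C, Ca, V: V,
    debug=toy_debug,
    is_spurious=lambda sol, Ca: sol == 'x5',
    refine=lambda sol, Ca, C: Ca | {'l6', 'q1'},
    n_abs_outputs=lambda Ca, C: 1 if C - Ca else 0,
)
print(sorted(sols))
```

On this toy instance the first pass returns l1, l2 and the spurious x5; refinement restores l6 and q1, the cardinality sweep restarts, and the second pass finds all three concrete suspects.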
3.4 State Abstraction
State abstraction is a type of abstraction where memory elements such as flip-flops and latches
are selected for removal. This approach can be powerful because state elements play a central
role in both state machine and datapath logic. Furthermore, when modular or hierarchical
information is not available for a design, as is the case for post-synthesis netlists or custom
logic, state abstraction can operate on the flat design.
The effectiveness of state abstraction is demonstrated empirically in Section 3.6.1. A subtle
benefit of state abstraction is that with a reduced state space, debug traces can be considerably
shortened. This advantage is discussed next.
3.4.1 Trace Length Reduction Benefits
Long error trace lengths are commonly associated with simulation-based verification tools where
random and constrained-random stimulus is used to exercise the design. Both manual and
automated debugging can benefit from operating on shorter error traces. Trace reduction is an
effective pre-process to debugging as it can reduce trace lengths by orders of magnitude [17, 20,
78].
State abstraction can help further reduce the trace length prior to debugging. With many
of the state elements abstracted, the state space of the design is reduced thus allowing for
state matching techniques to remove repeated states and redundant transitions [17, 20, 78].
It should be emphasized that most state matching techniques implicitly re-simulate reduced
traces in order to ensure that the desired failure is still exposed.
As an example, consider Figure 3.5 where a state transition diagram is used to illustrate an
error trace from state q0 to qk. In the original trace, no trace reductions are possible through
state matching. However, after the second state element is removed (through abstraction) the
states q1 and q4 can no longer be differentiated. The state values after abstraction are shown
under each node in Figure 3.5. As a result, a short-cut can be taken in the trace from state q0 to
state q4, as illustrated by the dashed line. Note that, as required by most trace reduction
techniques, the compacted traces must be tested to determine whether the error(s) are still
observable.

Figure 3.5: Reduced trace V′ due to abstraction
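The state-matching short-cut illustrated by Figure 3.5 can be sketched as follows, under the simplifying assumption that states are bit tuples and abstraction keeps a subset of bit positions. As noted above, a compacted trace must still be re-simulated to confirm that the failure is still exposed.

```python
# Hedged sketch of trace compaction via state matching: whenever the same
# abstract state re-appears later in the trace, jump directly to its last
# occurrence, skipping the repeated states and redundant transitions between.

def compact_trace(states, keep_bits):
    """states: sequence of concrete state tuples; keep_bits: surviving bit indices."""
    def abstract(s):
        return tuple(s[i] for i in keep_bits)

    compacted, i = [], 0
    while i < len(states):
        a = abstract(states[i])
        # index of the last occurrence of this abstract state
        last = max(j for j in range(i, len(states)) if abstract(states[j]) == a)
        compacted.append(states[last])
        i = last + 1
    return compacted

# abstracting the second bit makes (0,0) and (0,1) match, halving the trace
compact_trace([(0, 0), (0, 1), (1, 0), (1, 1)], keep_bits=[0])  # -> [(0, 1), (1, 1)]
```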
3.5 Function Abstraction
When a design contains high level or RTL information, function abstraction can provide a
natural and powerful way to partition the debugging problem. For instance, designers working
on RTL designs use modules to partition the design based on functionality and complexity.
These modules are also good candidates for abstraction. Furthermore, the hierarchical and
modular composition of HDL designs can be leveraged to apply abstraction and refinement in
a systematic manner.
3.5.1 Hierarchical Abstraction
The strength of function or modular abstraction can be amplified when used in a hierarchical
manner. More specifically, module-based debugging can be applied iteratively at each hierarchy
level thus allowing for a divide and conquer debugging approach.
At each level i of the hierarchy H, the functions {F^i_1, F^i_2, ..., F^i_p} can be considered by
the procedure selAbsComponents to select the components to abstract, Abs_i. The iterative
sequence of abstraction, debugging and refinement presented in the algorithm of Figure 3.4 can be
applied to the problem constructed at hierarchy level i. However, only functions at level i can
be refined and not their sub-functions. In order to locate the errors in the sub-functions, the
entire algorithm must be repeated at the hierarchy level i + 1.
Two properties of hierarchical abstraction and refinement are very important. After completing
an iteration of the algorithm in Figure 3.4 at hierarchy level i:

1. if a function f^i is still abstracted, then its sub-functions g_j can be abstracted at hierarchy
levels > i;

2. if a function f^i is refined, then its sub-functions g_j may still be abstracted at hierarchy
levels > i.
Figure 3.6: Function F^1_1 (with inputs X1, X2 and outputs Y1, Y2) is composed of functions
F^2_2 and F^2_3; the bug resides in F^2_2
The first observation is easy to confirm. When a function is still abstracted after debugging,
it signifies that equivalent error locations do not reside inside it. Similarly, the sub-functions
will not contain any equivalent error locations either, and they should be abstracted at deeper
hierarchy levels.
For the second observation, consider Figure 3.6 where an error resides in F^2_2. At level 1,
function F^1_1 cannot be abstracted since it contains the error. However, at level 2, sub-function
F^2_3 may be abstracted since it is independent of F^2_2 and its output. Thus, functions can
be partitioned into sub-functions such that some of the sub-functions will not contain any
equivalent error locations.
3.5.2 Overall Algorithm
To reduce the debugging problem size further, when operating at a given hierarchy level i,
all functions at a deeper hierarchy level > i should also be abstracted in C ′. However, it is
important to only refine modules at level i. This restriction reduces the complexity of the
debugging problem at level i and postpones the analysis of the sub-functions at level > i to
future hierarchy levels.
The process of finding all equivalent solutions through the management of the error cardinality
remains the same as in Section 3.3 when hierarchical abstraction is used, as the following example
illustrates.
Example 4 Consider Figure 3.7(a) where the modules Abs_1 = {F^1_2, F^1_4} are abstracted at level
1. The abstraction results in the removal of modules F^1_1 and F^1_3 as well, because they fan in
to Abs_1. The initial abstracted circuit is shown in Figure 3.7(b). Assuming that the error is
in module F^2_7, the error effect can propagate to the outputs Y1 and Y2. In this example, the
debugger will not identify a single error source, but will find the error pair {X_{F^1_2}, X_{F^1_4}} with
N = 2. Through refinement, these modules and their fanin circuitry are re-introduced into the
circuit, as shown in Figure 3.7(c). Next, the error cardinality N must be reset to 1. At hierarchy
level 2, the modules F^2_7 and F^2_6 can be abstracted as part of Abs_2. Refinement will re-introduce
module F^2_7 and debugging will find the error source inside it, as shown in Figure 3.7(d).

Figure 3.7: Hierarchical abstraction and refinement example. (a) model C before abstraction;
(b) initial model C′ at level 1; (c) final model C′ at level 1; (d) final model C′ at level 2
The proposed hierarchical abstraction and refinement algorithm is shown in Figure 3.8. Here
the debugging problem is solved iteratively by descending the hierarchy H. At each hierarchy
level i, the procedure absDesign first abstracts all functions at levels > i. This ensures that
sub-functions will not be refined. Next, function abstraction and refinement is performed by
Function debug according to the algorithm of Figure 3.4. The effectiveness of the proposed
technique is demonstrated in the experiments of Section 5.6.
1: Solutions = ∅, level = 0, N = 1, C′ = C
2: while (1) do
3:    level = level + 1
4:    C′ = absDesign(level + 1, C′)
5:    {New sols, C′} = Function debug(C′, level, N)
6:    if New sols = ∅ then
7:       return Solutions
8:    else
9:       Solutions = Solutions ∪ New sols
10:   end if
11: end while
Figure 3.8: Hierarchical debugging algorithm
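The loop of Figure 3.8 can be transcribed almost directly into Python. In this sketch, `abs_design` and `function_debug` are hypothetical stand-ins for the thesis procedures absDesign and Function debug; the stand-ins' signatures are assumptions for illustration.

```python
# Hedged sketch of the hierarchical debugging algorithm (Figure 3.8):
# descend the hierarchy, abstracting everything below the current level,
# and stop when a level yields no new solutions.

def hierarchical_debug(c, abs_design, function_debug, n=1):
    solutions, level = set(), 0
    c_prime = c
    while True:
        level += 1
        c_prime = abs_design(level + 1, c_prime)   # abstract all deeper levels
        new_sols, c_prime = function_debug(c_prime, level, n)
        if not new_sols:
            return solutions                        # no deeper errors remain
        solutions |= new_sols
```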
3.6 Experiments
This section evaluates the effectiveness of the proposed abstraction and refinement debugging
methodology. First, state abstraction is applied to gate-level diagnosis problems. The second
set of experiments is conducted on RTL designs that are developed in a hierarchical manner.
For those circuits, function and hierarchical abstraction/refinement are used.
3.6.1 State Abstraction
To evaluate the effectiveness of state abstraction, hand-made bugs are inserted in circuits
from the ISCAS’89 and ITC’99 benchmarks as well as industrial RTL circuits from OpenCores.org
[77]. The bugs are single gate changes or single RTL signal-assignment changes made at
random. For each erroneous circuit, 10 traces are obtained through pseudo-random simulation
that demonstrate the erroneous behaviour with respect to the reference models. These traces
are used by a sequential SAT-based debugger similar to [91] to locate the error sites. In the
proposed abstraction and refinement procedures of Section 3.4, the design and traces are modi-
fied from C and V to C ′ and V ′, respectively, before the debugging engine is called. Comparing
the performance of the debugger with and without the proposed techniques provides a fair
evaluation.
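For illustration, a single random gate change of the kind used in this setup can be injected as sketched below. The netlist encoding (a map from gate name to gate type) and the gate-type swap table are assumptions made for this example.

```python
# Hedged sketch of the error-injection step: one randomly chosen gate has its
# function changed (e.g. AND -> OR), producing an erroneous circuit whose
# failing traces are then handed to the debugger.

import random

def inject_gate_bug(netlist, rng):
    """Return (changed gate, buggy copy of `netlist`) with one gate type flipped."""
    swaps = {"AND": "OR", "OR": "AND", "NAND": "NOR", "NOR": "NAND"}
    buggy = dict(netlist)
    # pick uniformly among gates whose type has a defined swap
    target = rng.choice(sorted(g for g, t in netlist.items() if t in swaps))
    buggy[target] = swaps[netlist[target]]
    return target, buggy

rng = random.Random(0)                 # seeded for reproducible experiments
target, buggy = inject_gate_bug({"g1": "AND", "g2": "OR", "g3": "NOT"}, rng)
```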
The experiments are conducted on a 2.66GHz Intel Xeon processor with 2 GB of memory and
a timeout of 7200 seconds for each problem. For each problem, a trace compaction procedure
is performed before debugging. This process reduces the length of the counter-example when
possible. This procedure first builds a graph of the visited states; it then adds edges between
repeated states and applies Dijkstra’s shortest-path algorithm from the initial state to the final
circuits     # gates   # FF   # clk   # red. clk   # cls (K)   mem (MB)   time/err (s)   # err   total (s)
b04          711       66     516     335          2422        1132       740.0          9       6660.0
b08          200       21     21      20           274         82         3.8            4       15.2
b12          1140      121    40      19           1492        449        165.9          5       829.5
b14          6028      245    54      54           mem out     > 2000     -              -       -
s1488        693       6      104     5            214         42         1.6            9       14.4
s5378        3222      179    3       3            554         105        13.1           3       39.3
s13207       9442      669    2       2            1415        227        70.1           9       630.9
s35932       21147     1728   75      8            3563        696        431.1          16      6897.6
div su       1528      126    9       6            607         109        12.4           64      793.6
rsdecoder    10629     521    2       2            2043        301        120.1          9       1080.9
spi          2027      90     20      18           2763        582        391.3          3       1173.9
ac97         15166     1452   30      30           mem out     > 2000     -              -       -

Table 3.1: Statistics for problems and the stand-alone SAT-based debugging approach
state [26]. More powerful trace compaction schemes may provide better results [17, 20]. The
resulting traces that do not distinguish the reference and buggy circuits are discarded.
In Section 3.4 the effects of abstraction on logic size and trace length were discussed.
Figure 3.9 summarizes these effects empirically on two designs, b04 and b14. Figure 3.9(a)
demonstrates an apparently linear relationship between the logic size reduction and the number of
abstracted state elements. In Figure 3.9(b), experiments show that significant trace length
reductions are possible only after a certain threshold is reached. This threshold appears to be
over 50% for b04 and over 70% for b14. Thus for large problems where memory is a major
concern, a more aggressive approach, where over 70% of state elements are abstracted, may be
desirable.
Figure 3.9: Logic and trace reduction vs. flip-flops abstracted. (a) % logic size reduction and
(b) % trace length reduction, each plotted against the % of state elements abstracted for b04
and b14
circuits     red. logic(%)   red. FF(%)   red. trace(%)   red. mem(%)   time/err (s)   # err   maxN   prev (s)   refine (s)   total (s)   X impr.
b04          20.5            45.4         0               9.8           530.0          12      3      11.0       0            6371        1.04
b08          26.0            47.6         65.0            60.0          0.2            12      3      0.1        0            3.35        4.53
b12          26.4            41.3         15.7            24.9          85.0           20      3      4.2        0            1704.2      0.48
b14          15.3            40.8         0               > 46.0        3740.2         2       2      42.0       0            7522.4      -
s1488        20.4            50.0         0               11.9          1.1            9       1      0          0            9.9         1.45
s5378        9.7             44.6         0               37.1          11.8           1       1      0          3.4          15.2        2.58
s13207       29.6            44.8         0               31.7          40.3           9       1      0          0            362.7       1.73
s35932       31.9            46.2         0               34.9          251.3          16      2      7.3        0            4028.1      1.71
div su       34.0            39.6         0               9.5           5.9            32      3      2.2        396.8        587.8       1.35
rsdecoder    34.7            43.1         0               22.9          54.8           7       1      0          0            383.6       2.81
spi          37.6            44.4         22.2            46.0          101.2          1       1      0          303.6        404.9       2.89
ac97         41.2            48.2         0               > 37.0        365.6          2       1      0          0            731.2       -

Table 3.2: Performance statistics for abstraction and refinement debugging framework
Table 3.1 presents a summary of the debugging problems used as well as performance
statistics when debugging the concrete circuits. Later, these results are contrasted with those
of the proposed abstraction and refinement framework. Columns 1, 2 and 3 present the circuit
name, number of gates, and number of flip-flops (state elements) in each circuit. Columns #
clk and # red. clk show the average length of the traces before and after the trace compaction,
respectively.
The next five columns summarize the results of the debugger for each problem. In Columns
# cls and mem, the number of clauses (in thousands) generated for each problem and the
debugger’s memory usage is presented. The number of equivalent errors found by the debugger
for the given vectors as well as the average time required to find them are presented in columns #
err and time/err, respectively. Finally, the total time required to find all the errors is presented
in column total.
To cope with the size of the larger problems the CNFs are partitioned into bands and solved
sequentially as described in [91]. For b14 and ac97 where the average reduced traces are 54
time frames and 30 time frames long, the problems still run out of memory. The proposed
abstraction framework is most beneficial for such memory intensive problems.
Table 3.2 presents the results of the proposed abstraction and refinement debugging frame-
work. For each problem, a random abstraction function is used such that between 40% and 50%
of the state elements are abstracted, a conservative amount according to Figure 3.9. To allow a
comparison with the data in Table 3.1, the percentage of reduced logic, reduced flip-flops,
additional trace compaction, and overall reduced memory requirements are presented in columns
2-5, respectively. Looking across one row for the problem b08, by abstracting 47.6% of the
flip-flops, the logic is reduced by 26% and the trace length is reduced by an additional 65%
which leads to an overall memory reduction of 60% versus the stand-alone debugger.
The largest problems in Table 3.1 are for circuits b14 and ac97 and they ran out of memory.
With the new methodology, they both complete successfully. It can be calculated that the
proposed methodology results in up to 60% memory reduction, with average savings of 30%,
under a conservative abstraction approach.
The majority of problems in Table 3.2 do not benefit from additional trace compaction.
This can be attributed to the fact that trace reduction is most effective for long traces since the
probability of matching states is higher. In the experiments, the initial trace compaction process
is able to reduce the traces considerably. For instance, the initial trace of circuit s1488, which
is 104 clock cycles, is reduced to only 5 clock cycles after compaction, so further reductions
are unlikely. For industrial traces of thousands of clock cycles derived from functional
testbenches rather than random stimulus, simple state matching techniques are unlikely to
reduce traces drastically [20]. Therefore, trace reduction via abstraction may be more effective.
A summary of the run-time results of the proposed framework is presented in columns 6-12
of Table 3.2. In columns time/err and # err the average time required to find an error and the
number of errors found are presented, respectively. It should be noted that when the number
of errors is greater than those in Table 3.1, it means that abstracted state variables are found
as errors. In these experiments, if all equivalent error tuples are found (including the inserted
errors), then refinement is not performed. In practice, finding all equivalent errors is not necessary,
as only the actual error must be fixed. If the errors found by the proposed framework do not
include all equivalent error locations (i.e. # err is smaller in Table 3.2 than Table 3.1), then
all spurious solutions must be refined.
In Table 3.2, column maxN shows the maximum number of tuples searched until all equiv-
alent errors are found. The debugging time for all searches prior to maxN is shown in the
column prev. When refinements are necessary, the column refine presents the solve time for all
subsequent refinement searches.
For many problems in Table 3.2, the maximum error tuple found (maxN) is often greater
than 1 but always less than or equal to 3. The time required to determine that no solutions
exist prior to maxN (prev) is always considerably smaller than the average time required to find an
error (time/err). Taking b12 for instance, it takes on average 4.2 seconds to determine that
no errors occur when N < 3 and 85 seconds to find each solution at N = 3. Relating these
times to the algorithm in Figure 3.4, it means that the approach is quite effective since the
majority of the time is spent in the debug function on line 6 when N=maxN and not when
N < maxN.
The total debugging time for the proposed approach is found by adding prev and refine to the
product of time/err and # err. The resulting total run-time is shown in column
total and its improvement over Table 3.1 is shown in column X impr. When abstracting 40-
50% of the state elements, not many refinement steps are necessary as most equivalent error
locations are found in the abstract model. However, even for the cases where refinement is
necessary, substantial run-time improvement is observed. The only problem that demonstrates
a performance decrease is b12 where four times more solutions are found in the abstract model
versus the concrete design. Overall, performance improvements of up to 4.5X are observed with
an average value of 2X across all problems. This increased efficiency can be attributed to the
smaller size of the constraint problems which lead to easier CNFs for the SAT solver.
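The total-time computation can be checked directly against the table entries; the figures below are taken from Table 3.2 for b04 and b12.

```python
# Total debugging time = time/err * # err + prev + refine (Table 3.2).
def total_time(time_per_err, n_err, prev, refine):
    return time_per_err * n_err + prev + refine

# b04: 530.0 s/err * 12 errors + 11.0 s prior search -> 6371 s
assert abs(total_time(530.0, 12, 11.0, 0) - 6371.0) < 1e-6
# b12: 85.0 s/err * 20 errors + 4.2 s prior search -> 1704.2 s
assert abs(total_time(85.0, 20, 4.2, 0) - 1704.2) < 1e-6
```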
As observed in Figure 3.9 smaller problem sizes and shorter traces can be achieved with
more aggressive abstraction than those of Table 3.2. To demonstrate the effectiveness of the
framework under a more aggressive abstraction strategy, the two largest problems, b14 and ac97,
are shown in Tables 3.3 and 3.4 with 80% and 96% of the state elements abstracted, respectively. For
easy comparison, the first row of each table re-presents the problem properties of Table 3.2. The
following rows show the results after each abstraction and refinement step until the specific
injected error is found (not all equivalent errors as in Table 3.1). For each table, column
1 describes whether the data is derived from Table 3.2 (Tbl 3.2), from the initial abstraction
(abs), or from a refinement step (ref). The remaining columns are labeled similarly to Table 3.2.
As expected, when more state variables are abstracted, greater memory savings are attained
and more refinement steps are necessary. However, along with the memory savings, more
step      red. logic(%)   red. FF(%)   red. trace(%)   mem(MB)   time/err(s)   err
Tbl 3.2   15.4            40.8         0               1080      3740.0        2
abs       52.7            81.6         20.3            344       172.0         4
ref 1     50.2            80.8         20.3            378       225.1         3
ref 2     50.1            80.4         20.3            404       242.3         10

Table 3.3: Summary of b14 when abstracting over 80% of flip-flops
step      red. logic(%)   red. FF(%)   red. trace(%)   mem(MB)   time/err(s)   err
Tbl 3.2   41.2            48.2         0               1260      1567.8        2
abs       89.7            96.4         33.3            555       365.6         2
ref 1     89.5            96.3         33.3            765       665.8         10
ref 2     89.4            96.2         33.3            773       664.0         6
ref 3     89.1            96.1         33.3            776       721.8         9

Table 3.4: Summary of ac97 when abstracting over 96% of flip-flops
abstracted variables lead to much faster solve times per error. For instance, b14 requires 3740
seconds per error with 40% state abstraction, while it requires only 172 seconds per error with
82% state abstraction.
It is interesting to notice the relatively small number of iterations necessary to find the
injected error. More precisely, b14 and ac97 require only two and three refinement steps,
respectively, before finding the errors. This small number of steps indicates that the appropriate
variables are selected for refinement and that the debugger is guided efficiently towards the
errors after each step.
Overall, the proposed abstraction and refinement debugging framework demonstrates its
effectiveness for large problems where conventional approaches may fail due to excessive memory
and/or run-time requirements.
3.6.2 Function Abstraction
This section presents the experiments for function abstraction. All the circuits used are from the
OpenCores.org website [77] except for an industrial communication design (comm), with nearly
500,000 synthesized gates. Each circuit contains a functional level error such as an incorrect
statement, incorrect module instantiation, bad wiring between modules, etc. These RTL errors
typically represent tens or hundreds of gate-level errors. The debugger used in all experiments is
the module-aware SAT-based automated debugger of [3]. This set of experiments is conducted
design      size     # DFF   # clk (used)   # literals   time (s)   mem (M)
wb con1     80695    818     19 (19)        518580       58.74      619
wb con3     80695    818     1387 (40)      1273699      205.16     1250
fdct1       264221   5461    189 (40)       1705328      555.37     4400
mem ctrl1   38660    1145    1318 (40)      3887703      55.13      850
vga1        147457   17102   16100 (40)     8679788      1635.78    4700
vga2        147457   17102   141 (40)       212588       236.16     1350
comm1       449927   30339   19 (25)        1912087      1575.67    5080
comm2       453788   26852   88 (25)        Mem out      Mem out    8000
comm3       453576   26852   1387 (25)      277649       809.31     4831

Table 3.5: Summary of problems for function abstraction
design      maxN   # itr   mod refined/total   # literals   time (s)   peak mem (M)   lit reduced (×)   speed up (×)   mem reduced (×)
wb con1     1      3       3 / 8               115547       25.55      253            4.49              2.30           2.45
wb con2     1      4       4 / 8               140713       149.12     469            9.05              1.38           2.67
fdct1       1      6       5 / 5               1705328      638.78     4400           1.00              0.87           1.00
mem ctrl1   1      4       12 / 14             112581       12.02      200            34.53             4.59           4.25
vga1        1      2       5 / 14              13767        6.27       173            630.48            260.89         27.17
vga2        1      5       6 / 14              94066        436.38     1052           2.26              0.54           1.28
comm1       2      8       10 / 129            37960        108.32     772            50.37             13.11          6.58
comm2       1      9       10 / 129            25105        1403.47    640            -                 -              > 12.50
comm3       2      8       8 / 129             80103        63.94      317            3.47              12.66          15.24

Table 3.6: Results of proposed function abstraction and refinement technique
on a 2.66 GHz 64 bit Intel Core 2 Quad processor with 8GB of memory.
Table 3.5 presents a summary of the debugging problems and the corresponding automated
debugger statistics using the SAT-based debugging engine of [3] (called stand-alone debugger).
Columns one, two, and three show the name of the debugging problem based on the design,
and its size in terms of gates and state elements (DFFs), respectively. Column four presents the
length of the erroneous trace in terms of clock cycles required to observe the erroneous behaviour
from an initial state. When the trace is too long for the debugger, the trace is reduced to only
contain the last 25 or 40 transitions in order to make automated debugging feasible. The
number of clock cycle traces used to formulate the debugging problem are presented in the
parentheses in column four. For example, the problem wb con3 contains 1387 clock cycles, but
only the last 40 clock cycles are used. The column # literals presents the total number of
literals generated in the CNF of the debugging problem [3]. Finally, columns time and mem
show the total run-time, in seconds, required to solve the problem and the required memory,
in MB, respectively. Notice that problem comm2 requires more than 8000 MB to formulate the
problem and thus runs out of memory.
Table 3.6 presents the results of the proposed technique on the debugging problems. Column
one shows the name of the problems, while column two shows the maximum error cardinality
(maxN) required to solve the debugging problem. As discussed, the cardinality required to
locate the bug using an abstracted design can be larger than that required to solve the original
problem. Even though the problems shown here have a single functional-level (RTL) error, for
problems comm1 and comm3 a higher cardinality of 2 is used by the overall algorithm to find the
error site. It should be noted that due to the abstraction performed, when the cardinality is
increased, the number of potential solutions does not increase as sharply as in other debugging
techniques [3].
In Table 3.6, the column labelled # itr states the number of refinement and debugging
iterations required to find all the equivalent locations (number of times line 7 of Figure 3.8 is
run). The column mod refined / total presents the number of modules refined out of the total
number of modules in the concrete design. These modules are the only ones required to diagnose
the error. The smaller this number is, the more effective is the abstraction and refinement
technique. The next three columns, # literals, time (s), and peak mem (M) present the benefit
of the proposed technique in terms of the number of literals required in the problem formulation,
the total run-time in seconds and peak memory requirement by the entire algorithm.
The improvement provided by the proposed technique is shown in the last columns of
Table 3.6, where the reduction in the number of literals, the speed-up in run-time, and the
reduction in memory over the debugging technique of [3] without abstraction and refinement are
shown. The effectiveness of the abstraction technique is attributed to reducing the problem size
which is directly related to the number of literals reduced. For example, consider problem
vga1, where 5 of 14 modules are used, leading to a 630.48× reduction in literals, a 260.89×
improvement in run-time, and a 27.17× reduction in overall memory requirement. For
problem comm2 which resulted in memory out without the abstraction technique, only 640MB of
the available 8000MB are required. For all problems, the number of refinement and debugging
iterations performed is larger than one. Therefore, it is clear that each iteration is much easier
and faster when abstraction is used; thus it is more advantageous to run more iterations on
easier problems than fewer iterations on harder problems.

Figure 3.10: Solve time and # literals in problem vs. the # of refinement and debugging
iterations for (a) vga2, (b) fdct1 and (c) comm1
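As a sanity check, the vga1 improvement factors reported in Table 3.6 can be recomputed from the raw columns of Tables 3.5 and 3.6.

```python
# Improvement ratios for vga1, recomputed from Tables 3.5 and 3.6.
lit_concrete, lit_abs = 8679788, 13767   # literals without / with abstraction
t_concrete, t_abs = 1635.78, 6.27        # run-time (s)
m_concrete, m_abs = 4700, 173            # memory (MB)

assert round(lit_concrete / lit_abs, 2) == 630.48   # lit reduced (x)
assert round(t_concrete / t_abs, 2) == 260.89       # speed up (x)
assert round(m_concrete / m_abs, 2) == 27.17        # mem reduced (x)
```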
In Table 3.6 there are two problems that experience a slow-down. It is worthwhile to
analyze the reason for this behaviour. For problem fdct1, six iterations are required to solve
the problem, at which stage all 5 modules are used. Thus in this case, the extra iterations simply
add overhead as the entire circuit is needed in order to solve the problem. The problem vga2
also experiences a slow-down, although a 2.26× reduction in literals and a 1.28× reduction in
memory are still observed. In this case, unlike the overall trend, the simpler and faster debugging
problems cannot compensate for the extra iterations performed.
Figures 3.10(a), 3.10(b) and 3.10(c) provide detail on the numbers of Table 3.6 for vga2,
fdct1 and comm1, respectively. These figures illustrate the relationship between the run-time
shown in solid line and the number of literals shown in dashed line against the refinement and
debugging iterations. Notice the general trend where both run-time and number of literals
appear to increase exponentially with the increase in the number of iterations. For the ma-
jority of cases where the proposed technique is effective, abstraction allows the problem to be
solved with a fraction of its size thus leading to smaller memory requirements and run-times.
Considering problem vga2, notice that for iterations 3, 4 and 5 the solve time is quite high,
thus not providing any run-time benefit.
The proposed techniques allow for different degrees of abstraction to be applied. In general,
aggressive (high degree) abstraction leads to more debugging and refinement steps. However,
due to the simplicity of the design when abstracted aggressively, the initial debugging and
refinement iterations are relatively easy problems and thus quicker to solve. This behaviour
is observed in Figures 3.10(a), 3.10(b) and 3.10(c) where the initial iterations run faster than
later ones. It may be possible to find an abstraction heuristic that can balance the number
of iterations and the functions abstracted, but this is not a trivial task. In these experiments,
however, abstracting all functions and modules is found to be quite effective.
3.7 Summary
This chapter presents state and function abstraction and refinement techniques for design de-
bugging, allowing larger designs to be debugged faster and with less memory. Designs are first
abstracted resulting in smaller debugging problems. To ensure that all the equivalent error
locations are found in the original design, a refinement process is performed. Refinement is
applied in iterations, thus only re-introducing the components necessary for debugging. A
consequence of state abstraction is that the error trace can be further reduced, resulting in a
smaller problem formulation. Function abstraction, employed hierarchically, enables a powerful
debugging framework. The experiments demonstrate run-time improvements of
up to an order of magnitude for state abstraction and up to two orders of magnitude for
function abstraction. Furthermore, both abstraction techniques dramatically reduce the memory
requirements of design debuggers: in some cases as little as 10% of the memory limit is
required. The advantages of abstraction and refinement are clear: larger designs can be tackled by
current debuggers with the given memory resources and consistent performance improvements
can be expected.
Chapter 4
Bounded Model Debugging
4.1 Introduction
Contemporary automated debuggers model sequential problems by employing the Iterative
Logic Array (ILA) technique, also known as time frame expansion [1, 9]. When modeling
sequential behaviour with an ILA, the combinational circuitry is replicated in the computer
memory for as many cycles as the counter-example or error trace requires. The ILA repre-
sentation of a sequential circuit has the advantage of explicitly modeling the circuit such that
existing combinational techniques can be utilized. For example, the ILA is a popular technique
in test (Automated Test Pattern Generation (ATPG)) and in verification (equivalence checking,
bounded model checking, debugging). Nevertheless, replicating the combinational part of a
sequential circuit can lead to overwhelming memory requirements, which in turn can degrade the
performance of the underlying algorithms. In debugging, the length of error traces can easily
exceed thousands of clock cycles in practice. Replicating the transition function for designs
with hundreds of thousands of primitive gate elements and for thousands of clock cycles may
not provide a viable solution.
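The ILA construction can be illustrated with a toy transition function. The mod-4 counter below is a made-up example showing how one combinational copy of the transition function is produced per clock cycle, which is why memory grows linearly with trace length.

```python
# Hedged sketch of time-frame expansion (ILA): replicate the combinational
# next-state/output function once per clock cycle of the input trace, so a
# sequential check becomes a purely combinational one over all frames.

def unroll(step, s0, inputs):
    """Replicate the transition function over len(inputs) time frames."""
    frames, s = [], s0
    for x in inputs:
        s, y = step(s, x)            # one combinational copy per cycle
        frames.append((s, y))
    return frames

def counter_step(state, enable):
    """Toy mod-4 counter; output flags the wrap-around event."""
    nxt = (state + enable) % 4
    return nxt, nxt == 0 and enable == 1

frames = unroll(counter_step, 0, [1, 1, 1, 1, 0])   # five replicated frames
```

A 100,000-gate design unrolled over thousands of cycles multiplies this per-frame cost accordingly, which is the memory pressure BMD is designed to avoid.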
This chapter presents a novel debugging methodology, namely Bounded Model Debugging
(BMD), that is suited for problems with long error traces. The central idea behind BMD is
inspired by Bounded Model Checking (BMC) from the verification domain [9, 21]. Provided
enough resources, BMC and BMD establish a systematic way to cope with intractable problems
in an iterative manner. Hard and large problems are broken into many smaller sub-problems of
incrementally larger size and complexity, which are solved in succession. When completeness
cannot be guaranteed due to the approximate nature of these methodologies, both BMC and
BMD still provide valuable information to the user based on the solved sub-problems.
The key observation in BMD is based on the notion that errors are often excited and
observed within close temporal proximity of each other. For instance, if an error is observed in
clock cycle 100, then it is more likely that the error is excited between clock cycles 51 to 100
than between clock cycles 1 to 50. In practice, test and verification engineers have used the
above observation for decades when they manually “back-trace” a design using the last events
of very long error traces.
Based on the temporal proximity of error excitation and error observation, a BMD debugging
algorithm starts by considering a subset of the error trace. Initially, the subset of the error
trace is from clock cycle k1 to kf , where k1 is a clock cycle greater than one and kf is the
clock cycle where the failure is observed (usually the last cycle of the trace). In this thesis, we
call this interval the suffix of the error trace and it is used to formulate the initial debugging
problem. Intuitively, this portion of the problem is examined first with the expectation that
the error excitation point is within the set of cycles selected in the k1 to kf bound of the trace
suffix. A debugger that operates on the suffix trace will build a much smaller ILA than that of
the original trace, thus it will tackle a much smaller and easier debugging problem.
Clearly, when debugging with a trace suffix, errors excited prior to clock cycle k1 may not
be detected by the debugger. In this case, and to ensure completeness of the methodology,
BMD adds a special type of error suspects for consideration to the debugging problem called
initial state suspects. If these suspects are found as solutions when examining a suffix, it
indicates that errors may be active in clock cycles prior to k1. Such a situation requires a
second BMD iteration where a larger trace suffix is considered. As such, the second iteration
results in a debugging problem with a larger ILA representation, k2 to kf , where k2 < k1, but
still smaller than the original problem (i.e. 1 < k2). This iterative process continues until all
the equivalent error locations are found or resource limitations are reached. Notice that given
enough computational resources, similar to BMC, BMD also degenerates to a conventional
debugging problem formulation when ki reaches the first clock cycle of the trace.
This chapter is organized as follows. In the next section, background information is provided
on the ILA and BMC. Section 4.3 introduces BMD by presenting an analysis of the sequential
debugging problem, the basic problem formulation, its impact on error cardinality and different
performance improvement techniques. Section 4.4 presents the experimental results, while
Section 4.5 summarizes the chapter. In this chapter, the terms clock cycle and time frame are
used interchangeably.
4.2 Preliminaries
This section presents background material pertaining to the ILA representation of sequential
circuits and to BMC as a partial motivation for this work.
The ILA representation models the sequential behaviour of a circuit over k clock cycles
by replicating its transition function for k time frames. A transition function refers to the
combinational logic cones that generate the next state and primary output values of a sequential
circuit given a set of current state and primary input logic values. Alternatively, a transition
function can be viewed as a time frame in which the inputs/outputs of state elements are treated
as pseudo-outputs/inputs of the remaining circuitry. An ILA is built by first replicating the
transition function into k disjoint, but ordered, time frames. Then, for every two consecutive
time frames, the next state variables of the earlier frame are connected to the current state
variables of the later frame. For example, the circuit in Figure 4.1 is modelled as an ILA for
five clock cycles or time frames as shown in Figure 4.2.
When dealing with designs with hundreds of thousands of gates that need to be examined
(for verification, debugging, etc.) over thousands of clock cycles, the ILA model can be limited
by the memory available on the system. As a consequence, many problems in verification, testing
or debugging cannot be formulated using the ILA model, let alone be solved. When a
SAT solver is used to examine a problem, the resulting CNF that corresponds to the combinational
circuitry of the ILA may contain too many variables and clauses to fit in memory and be
examined by the solver [23, 66]. In the remainder of this chapter, we do not distinguish between
an ILA composed of clauses and one composed of gates, since the two representations are
essentially equivalent.
Apart from debugging, the ILA model is popular in many CAD applications such as ATPG,
BMC, and sequential equivalence checking [1, 9, 94]. Similar to debugging, in Bounded Model
Checking (BMC), the memory limitations and run-time performance of the tools are highly
dependent on the length of the ILA. Contemporary BMC formulations typically start by
attempting to disprove properties using a small bound k1 < kdia, where kdia is the circuit diameter.
The circuit diameter, which is the longest of all shortest paths between any two states of the
circuit, is the minimum bound required to completely prove a safety property [9]. An ILA with
length k1 can model the sequential behaviour from the initial state to any state after k1 clock
cycles. If the property is disproved within this bound, then the BMC problem is solved and a
counter-example is returned. Otherwise, if this bound is not enough to disprove a property, the
bound is incremented and k2 > k1 is used in the next iteration of BMC. This process is repeated
with the bound being incremented until a counter-example is found, the complete proof bound
kdia is reached or resource limits are exhausted.
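The iterative BMC flow described above can be summarized in a short sketch. The function `disprove_at_bound` is a hypothetical placeholder for a SAT query on a k-frame unrolling; everything else is illustrative.

```python
# Sketch of the iterative BMC flow: increase the bound until a counter-example
# is found or the completeness bound k_dia (the circuit diameter) is reached.
def bounded_model_check(disprove_at_bound, k_dia, k_start=1, step=1):
    k = k_start
    while k <= k_dia:
        cex = disprove_at_bound(k)   # returns a counter-example or None
        if cex is not None:
            return ("FAIL", k, cex)  # property disproved within bound k
        k += step
    return ("PROVED", k_dia, None)   # diameter reached: complete proof

# Toy property that is first violated at bound 4
result = bounded_model_check(lambda k: "cex" if k >= 4 else None, k_dia=10)
print(result)  # ('FAIL', 4, 'cex')
```

In industrial practice, as noted above, the loop is typically stopped by resource limits well before `k_dia`, which is why BMC is used as a "bug hunting" rather than a full-proof technique.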
The BMC problem has been intensively studied over the last decade and many improvements
have been made [8, 9, 29, 95]. Even though BMC is not a complete model checking technique
unless the bound equals the circuit diameter, it is deemed an effective “bug hunting” tool in the
industry. Today, many commercial organizations successfully use BMC tools in their verification
flows with bounds much smaller than the circuit diameter. The BMD technique
presented here is based on intuition similar to that of BMC. However, as we will
describe, it entails fundamental theoretical and implementation differences.
4.3 Bounded Model Debugging Formulation
4.3.1 Probability Analysis of Error Behaviour
Bounded model debugging is motivated by the empirical observation that functional errors are
usually excited in temporal proximity to observation points such as primary outputs. In this
section, this observation is re-affirmed with a discussion and the respective probability analysis.
Figure 4.1: Sample pipeline circuit with single output
Figure 4.2: Five time frame ILA for circuit in Figure 4.1
In combinational circuits, because there are no memory elements, errors are excited in the
same clock cycles that failing behaviours are observed. In sequential circuits, the situation can
be much more complex. This is because the erroneous behaviour may propagate across many
consecutive clock cycles as values get latched in memory elements. The observation point,
usually a primary output, may not exhibit an erroneous behaviour until many clock cycles
after the excitation point. Furthermore, expected values are typically not available for internal
signals or memory elements. Thus, when debugging simulation traces or counter-examples,
many clock cycles must be considered prior to the observation of the failure.
Consider the sequential circuit in Figure 4.1 and its ILA representation of five cycles shown
in Figure 4.2. Also assume that the first time a functional error is observed is at the primary
outputs of the fifth simulation cycle. We now analyze how errors can be excited in different
time frames to cause the observed failure without any knowledge of the input stimulus.
gate   likelihood of errors excited in different clock cycles
name   clock cycle 1   clock cycle 2   clock cycle 3   clock cycle 4   clock cycle 5
A      Zero            Zero            Low             Low             High
B      Zero            Zero            Zero            Low             High
C      Zero            Zero            Zero            Low             High
D      Zero            Zero            Zero            Zero            High
E      Zero            Zero            Zero            Zero            High

Table 4.1: Likelihood of errors on gates of Figure 4.1 being excited
Notice that if the error is excited in the first two cycles, gate A cannot be the error source
because there is no propagation path from A in cycle one or two to any primary output in cycle
five. If it is the case that an error on gate A is excited in cycle three, this failure is not observed
in time frames three or four since the failure is first observed in time frame five. Similarly,
the error may be excited in time frame four, but a failure is not observed in that time frame.
Finally, the error can be both excited and observed in time frame five. This type of analysis,
performed without knowledge of the input stimulus, can give us a degree of confidence about
where errors may be excited.
Table 4.1 presents the degree of confidence of an error being excited on the various gates of the
circuit from Figure 4.1 in different clock cycles. Although informal, this analysis confirms a
high likelihood that the error is excited in time frame five for all gates. As earlier clock cycles
are considered, the likelihood decreases and eventually goes to zero in clock cycles one and
two. For the purpose of debugging, the above discussion suggests that it is more important for
an algorithm to spend its resources on later clock cycles rather than on earlier ones.
Theorem 3 formally presents the probability of observing the first failure in clock cycle d,
given that the error is excited in clock cycle 1. Figure 4.3 illustrates the setup for the theorem
using a symbolic clock and an ILA representation.
Theorem 3 Assuming the following:

• a single error is excited in clock cycle 1

• no other errors are excited in any other clock cycles

• propi is the probability of the error propagating from cycle i to i + 1

• obsi is the probability of observing a failure in clock cycle i given that an error has propagated to cycle i

• the input vector sequences are temporally independent and stationary random sequences

Given a sequential circuit, the probability of observing the first failure in clock cycle d is

pd = ∏_{i=1}^{d−1} propi × ∏_{i=1}^{d−1} (1 − obsi) × obsd.

Proof:
Let

Wi = {an error propagates from cycle i to cycle i + 1 if it has propagated to cycle i},
Oi = {a failure is observable in cycle i if an error has propagated to cycle i}, and
E1 = {an error is excited in clock cycle 1},

and let ¬Oi denote the complement of Oi (no failure is observed in cycle i). The probability pd can be stated in terms of the events Wi, Oi, and E1:

pd = P( ⋂_{i=1}^{d−1} Wi ∩ ⋂_{i=1}^{d−1} ¬Oi ∩ Od | E1 ).

By applying the identity P(A ∩ B | C) = P(A | C) × P(B | A ∩ C), we get

pd = P( ⋂_{i=1}^{d−1} Wi | E1 ) × P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} ¬Oi ∩ ⋂_{i=1}^{d−1} Wi ∩ E1 ).

Here, the events Od and ⋂_{i=1}^{d−1} ¬Oi are conditionally independent given E1 ∩ ⋂_{i=1}^{d−1} Wi:

P( Od ∩ ⋂_{i=1}^{d−1} ¬Oi | E1 ∩ ⋂_{i=1}^{d−1} Wi ) = P( Od | E1 ∩ ⋂_{i=1}^{d−1} Wi ) × P( ⋂_{i=1}^{d−1} ¬Oi | E1 ∩ ⋂_{i=1}^{d−1} Wi ),

thus

P( Od | ⋂_{i=1}^{d−1} ¬Oi ∩ ⋂_{i=1}^{d−1} Wi ∩ E1 ) = P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

As a result, pd can be simplified:

pd = P( ⋂_{i=1}^{d−1} Wi | E1 ) × P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

One of the assumptions made is that input vectors in successive cycles are all (temporally)
independent. Thus, any Wi is independent of Wj for all cycles i and j:

P( Wi ∩ Wj | E1 ) = P( Wi | E1 ) × P( Wj | E1 ).

As a result,

P( ⋂_{i=1}^{d−1} Wi | E1 ) = ∏_{i=1}^{d−1} P( Wi | E1 ).

Similarly, by the assumption, any ¬Oi is independent of ¬Oj for all cycles i and j:

P( ¬Oi ∩ ¬Oj | ⋂_{k=1}^{d−1} Wk ∩ E1 ) = P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) × P( ¬Oj | ⋂_{k=1}^{d−1} Wk ∩ E1 ).

As a result,

P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) = ∏_{i=1}^{d−1} P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ).

Using the above, pd can be simplified to:

pd = ∏_{i=1}^{d−1} P( Wi | E1 ) × ∏_{i=1}^{d−1} P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

In the assumptions, propj and obsj are defined for some cycle j as

propj = P( Wj | E1 ), and
obsj = P( Oj | ⋂_{i=1}^{j−1} Wi ∩ E1 ),

so that P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) = 1 − obsi. Using these definitions, pd can be presented as

pd = ∏_{i=1}^{d−1} propi × ∏_{i=1}^{d−1} (1 − obsi) × obsd. □
Theorem 3 formally confirms the intuition that errors are more likely to be observed temporally
closer to the excitation point. More specifically, pd is found to be a negative exponential
function with respect to the distance d. We can simplify pd by assuming that
Figure 4.3: Illustration of example where error is excited in cycle 1 and observed in cycle d
propi equals the constant prop and obsi equals the constant obs for all cycles i, resulting in
pd = prop^(d−1) × (1 − obs)^(d−1) × obs. This simplified relationship is plotted in the three curves
of Figure 4.4 with values of prop = obs = {0.1, 0.5, 0.9}. At d = 1 we have the special case
where pd = P(O1 | E1) = obs. The negative exponential relationship is clear from these plots,
as the three curves are no longer visible when d is greater than 6.
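The decay of pd can be checked numerically with the simplified constant-probability formula; this small sketch reproduces the behaviour plotted in Figure 4.4.

```python
# Simplified relationship p_d = prop**(d-1) * (1-obs)**(d-1) * obs,
# with prop and obs held constant across cycles as assumed above.
def p_d(d, prop, obs):
    return prop ** (d - 1) * (1 - obs) ** (d - 1) * obs

for prop_obs in (0.1, 0.5, 0.9):
    row = [round(p_d(d, prop_obs, prop_obs), 4) for d in (1, 2, 4, 6)]
    print(prop_obs, row)  # values shrink rapidly as d grows
```

For prop = obs = 0.5, for instance, pd halves by a factor of four per cycle (0.5, 0.125, 0.0078, ...), which is the steep negative slope the analysis relies on.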
Figure 4.4: Plotting pd as a function of d with prop = obs = {0.1, 0.5, 0.9}
We can further analyze pd as a function of prop and obs. Figures 4.5(a), 4.5(b), 4.5(c)
and 4.5(d) present this relationship in three-dimensional graphs with values of d = 1, 2, 3 and 4,
respectively. First, we notice the special case of d = 1, where pd depends only on obs. In the next
three figures, each surface reaches its maximum pd when prop = 1 and obs is between 0.2 and 0.5.
Figure 4.5: Plotting pd as a function of prop and obs with different d = {1, 2, 3, 4} (panels: (a) d = 1, (b) d = 2, (c) d = 3, (d) d = 4)
Intuitively, the higher the probability prop, the more likely the error is to propagate across
clock cycles and be observed. Thus, for any given d, the erroneous behaviour is more likely to
reach clock cycle d. The effect of obs is not as straightforward. A high probability of obs
means that the error is likely to be observed in early clock cycles and may not reach clock cycle d,
leading to a small pd. However, a very small obs reduces the likelihood of observing the
error even in clock cycle d, also reducing pd. In general, as d increases, the value of obs that
maximizes pd decreases.
The above analysis allows us to better understand the general relationship between pd and
d. The analysis with respect to prop and obs also provides insight into how pd can vary. However,
we should emphasize that many of the assumptions made here, including the constant probabilities
prop and obs, are not realistic in industrial settings. In other words, prop and obs may not be
independent of each other, and pd will also depend on the stimulus vectors of the circuit,
the circuit structure and checkers, and other factors not included in the analysis. Furthermore,
errors can be excited multiple times in an error trace, leading to more complex scenarios. As
a result, the analysis should be used primarily to confirm the steep negative slope of pd with
respect to d. This general relationship is validated by the experimental results of Section 4.4.
4.3.2 Problem Formulation
Since sequential debugging problem formulations are modelled using the ILA representation,
the problem size depends linearly on the number of clock cycles in the error trace. As with most
SAT-based CAD techniques, the performance of SAT-based automated debuggers degrades at
a rate faster than linear with the problem size [26]. This is because debugging is
an NP-complete problem, which may require exponential run-time in the worst case [91, 100].
Larger CNF problems not only demand more memory, but they may also have a dramatic
impact on the overall run-time performance of the solver.
Given the analysis presented in Section 4.3.1 and the considerable impact of the trace length
on the overall performance, a practical BMD methodology is devised. BMD is a complete
and systematic debugging methodology that focuses on finding error suspects by considering
suffixes of the error trace. Shorter traces provide the benefit of generating a smaller problem
instance that can fit into the available memory and also have the potential to improve run-time
performance.
Given an error trace of kf clock cycles, the BMD methodology starts by considering a short
suffix of the error trace from clock cycle k1 to kf . Note that kf is the first clock cycle where a
failure is observed. In the remainder of this chapter, we use the notation vBMD to refer to the
diagnosis vector obtained from the suffix of the error trace. As stated in Definition 2 of Section 2.2, a
diagnosis vector contains initial state values as well as stimulus and expected response sequences.
The suffix of the error trace directly provides the input stimulus and response sequences. The
initial state values for vBMD are captured in the state elements of the design when simulated
using the input stimulus sequence from clock cycle 1 to k1−1. Using the diagnosis vector vBMD
and the erroneous design C, an automated debugging problem can be formulated as presented
in Section 2.5.
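The construction of vBMD can be sketched as follows. Here `simulate` is a hypothetical cycle-accurate simulator used to run the discarded prefix and capture the suffix's initial state; the data layout and all names are illustrative, not the thesis tool's API.

```python
# Sketch of building the diagnosis vector v_BMD for a suffix [k1, kf].
def build_v_bmd(circuit, simulate, reset_state, stimulus, response, k1):
    """stimulus/response are per-cycle lists for cycles 1..kf (0-indexed).
    The initial state of the suffix is obtained by simulating the input
    stimulus from clock cycle 1 to k1 - 1."""
    state = reset_state
    for cycle in range(k1 - 1):               # simulate the discarded prefix
        state = simulate(circuit, state, stimulus[cycle])
    return {"initial_state": state,
            "stimulus": stimulus[k1 - 1:],    # cycles k1..kf
            "response": response[k1 - 1:]}

# Toy one-flip-flop circuit: next state = current input
sim = lambda circuit, state, inp: inp
v = build_v_bmd(None, sim, 0, [1, 0, 1, 1], [0, 1, 0, 1], k1=3)
print(v["initial_state"], v["stimulus"])  # 0 [1, 1]
```

The simulated prefix replaces the omitted time frames, so the debugging problem sees a consistent initial state at cycle k1.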
Figure 4.6: ILA of length three with error excited on gate A
As an example, consider a circuit represented as a three time frame ILA in Figure 4.6. Here
the circuit is shown with the correct and erroneous values annotated onto the ILA,
separated by a “slash”. Signal values that are not relevant to the example are
omitted for clarity. Signals that do not have an erroneous value are labelled with only a single
Boolean value. This convention is used for the remaining examples in this chapter.
When debugging the three cycle ILA, a debugger will return the equivalent error gates A
and C and memory elements B and D. When only the last two clock cycles k1 = 2 and kf = 3
are considered, the ILA shown in Figure 4.7 is used to formulate the debugging problem. The
initial state of B = 0 is obtained by simulating the erroneous circuit for one clock cycle. When
the problem is provided to an automated debugger, the suspects corresponding to gates B, C
and D are returned. Since the error on gate A is excited in the omitted first time frame, this
potential error source is missing from the set of solutions.
As illustrated by the above example, considering only a suffix of an error trace can result in an
incomplete set of suspects. To avoid this behaviour, a mechanism is required to determine when
solutions are not complete and a longer ILA must be considered for debugging. Returning to
Figure 4.7, notice that the erroneous behaviour of gate A is captured in the initial state variable
B (i.e. 1/0). Even though gate A is not available in the two-cycle ILA, the fact that the initial
state B is found as a suspect means that its transitive fanin logic may also be erroneous. In
other words, if the method finds some initial states as suspects, this can be seen as a signal
that the returned solution set may be incomplete.
Figure 4.7: Suffix of size two of the ILA shown in Figure 4.6
Theorem 4 Assume that errors on gates G1, G2, ..., GN are excited between clock cycles 1 to
k1 − 1 of a kf cycle long trace. If the first failure observed from these errors is in clock cycle
kf , then the erroneous behaviour of gates G1, G2, ..., GN in clock cycles 1 to k1 − 1 is observed
in state elements of clock cycle k1.
Proof: Since a failure is first observed in clock cycle kf , the error sites excited in clock cycles
1 to k1 − 1 must propagate to observation points in clock cycle kf . Since state elements are
the only components that can propagate signal values across time frames, the erroneous values
must propagate through state elements in clock cycle k1. □
The above theorem can be extended to equivalent error locations, since these locations
cannot be distinguished from one another under a given vector sequence set (see Section 2.5).
According to Theorem 4, initial state elements can capture the erroneous behaviour of their
erroneous transitive fanins. Thus, initial state elements that exhibit erroneous behaviour can
indicate that the solution set is incomplete. More specifically, an automated debugger
can be used to find initial state suspects in addition to error suspects. Initial state suspects
represent possible corrections in the state element functions in the first time frame of the suffix
(i.e. at clock cycle k1). If an initial state suspect is found, an erroneous gate may be excited
in a time frame prior to k1 and its behaviour may have propagated to k1 thus leading to the
failure in kf . As a result, a longer suffix with more clock cycles must be considered so that the
engine can localize all errors.
Recall that in SAT-based debugging, the circuit is enhanced with correction models to
consider specific locations as sources of errors. In Section 2.5, multiplexers are added to the
output of gates to implement correction models at the gate-level. For sequential debugging
problems, since a correction model must be consistent across time frames, the select lines of
corresponding multiplexers in different time frames are tied together. In other words, the
correction models in different time frames are grouped together. This allows an automated
debugger to treat all the different correction models for a single gate across all time frames as
a single suspect during analysis.
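The grouping of correction models across time frames can be sketched as follows. This is an illustrative model only: suspects are identified by name and a simple counter stands in for a CNF variable allocator; the actual debugger's data structures may differ.

```python
# Sketch of gate-level correction models: a mux at each suspect gate's output,
# with the select line shared across all time frames so the per-frame muxes
# are grouped into a single suspect.
class CorrectionModels:
    def __init__(self):
        self._next_var = 1
        self.select_of = {}          # suspect name -> shared select variable

    def new_var(self):
        v = self._next_var
        self._next_var += 1
        return v

    def add_mux(self, suspect, time_frame):
        """Return (select, free_value) for this gate in this time frame.
        The select variable is created once and reused in every frame,
        i.e. the select lines are tied together."""
        if suspect not in self.select_of:
            self.select_of[suspect] = self.new_var()
        return self.select_of[suspect], self.new_var()

cm = CorrectionModels()
sel1, _ = cm.add_mux("A", time_frame=1)
sel2, _ = cm.add_mux("A", time_frame=2)
print(sel1 == sel2)  # True: both frames share one select line
```

Because a single select variable controls every replica of a gate's mux, activating a suspect corrects that gate consistently in all time frames, as required by the formulation above.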
Returning to the previous example, for automated debugging the erroneous circuit must be
enhanced with correction models (see Section 2.5) to represent initial state suspects as sources of
errors. In this case, these correction models are only needed in the first time frame. Note that
they are independent of one another and should not be grouped together. Figure 4.8 illustrates
the debugging problem of Figure 4.7, where correction models are represented with circles of
different patterns. Circles with the same pattern represent grouped correction models.
As an illustration, gate A in time frames one and two has black dots because they represent a
correction for gate A in both time frames. Notice that there are effectively two correction models
on variables corresponding to signals B and D in the first time frame. For gate B, a “\” pattern
represents the gate error suspect while a “#” pattern represents the initial state suspect. In
this case, a debugger will find both suspects of B as solutions. Since one of the solutions is an
initial state suspect, the results may not be complete and a longer suffix must be considered.
In the next iteration, a suffix comprising the entire trace is used and the missed solution (gate
A) is found.
In summary, the BMD methodology starts by formulating the debugging problem with a
suffix smaller than the original trace. If an automated debugger finds any initial state suspects
as part of the solution, the suffix start is moved back to k2, where k2 < k1. This longer suffix
results in a new ILA with which the debugging problem is formulated in the next iteration of
BMD. This process of debugging, followed by an increase in suffix length whenever initial state
suspects are found, is repeated until no initial state suspects are found or the resource limits
are reached. In the worst case, this process degenerates to a conventional debugging technique
when the complete trace is examined. As shown through experiments, this scenario seldom occurs.

Figure 4.8: ILA of Figure 4.7 annotated with error suspects and initial state suspects

Figure 4.9: Simple circuit with single error source on gate A
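The iterative flow just summarized can be condensed into a small sketch (the complete algorithm, including cardinality handling, is given in Figure 4.14). `debug_suffix` is a hypothetical debugger call returning suspect sets for a suffix starting at clock cycle k1, and the string tags are illustrative.

```python
# Simplified BMD outer loop: lengthen the suffix (shrink k1) while initial
# state suspects keep appearing; stop when none are found or the whole
# trace (k1 == 1) has been examined.
def bmd(debug_suffix, kf, incr):
    k1 = max(kf - incr + 1, 1)
    while True:
        solutions = debug_suffix(k1)
        has_initial = any(s.startswith("init:")
                          for sol in solutions for s in sol)
        if not has_initial or k1 == 1:
            return k1, solutions
        k1 = max(k1 - incr, 1)

# Toy debugger: initial state suspects disappear once the suffix reaches cycle 2
toy = lambda k1: [{"init:B"}] if k1 > 2 else [{"gate:A"}]
print(bmd(toy, kf=10, incr=3))  # (2, [{'gate:A'}])
```

In the worst case `k1` reaches 1 and the loop degenerates to conventional debugging over the full trace, mirroring the discussion above.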
4.3.3 Impact on Error Cardinality
The BMD methodology introduced in the previous section can impact the error cardinality
N used by automated debuggers (see Definition 6 of Section 2.3.1). Consider the example
in Figure 4.9, where the error on gate A propagates to state element C and gate B. Figure 4.10
illustrates the resulting ILA for two time frames. In this case, when employing BMD with an
initial suffix of length one and looking for N = 1 errors, only the suspect gate B is found. The
erroneous gate A and the state element C are not returned as solutions since neither one can
fix the failure on its own. As a result, because the state element suspect C is not contained in the
solution set, the BMD length will not be increased and the method will terminate erroneously.
The missed solution (i.e., set of suspects) in the above example is due to the fact that
the error from gate A propagates to two separate locations (state element C and gate B) whose
combined effect results in the observed error. Thus the debugging problem requires a cardinality
of N = 2 with a suffix length of one. For example, if N = 2 with k1 = 2, then the solution {B, C} is
returned. Since C is also an initial state suspect, the suffix length will be increased and the
algorithm will iterate successfully.

Figure 4.10: Example of single error source excited in two clock cycles
For certain problems, the error cardinality used by the debugger must be increased in order
to identify initial state suspects as solutions and increase the suffix length. Given that the
maximum user-defined error cardinality is maxN, we need to find the maximum error cardinality
to use with BMD. The following theorem presents an upper bound on the error cardinality that
will find all initial state suspects, thus guaranteeing completeness of BMD. This upper bound
is later refined in the improvements of Section 4.3.4.
Theorem 5 Given an erroneous circuit with maxN errors, a diagnosis vector vBMD with a
suffix trace from clock cycle k1 to kf , where k1 > 1, the BMD debugging methodology will find
all initial state suspects as solutions if the maximum error cardinality used is
maxNBMD = NDFF + maxN
where NDFF is the total number of state elements in the circuit.
Proof: When debugging with BMD using the diagnosis vector vBMD, the problem formulation
will contain correction models on each erroneous gate G1, G2, ..., GmaxN in time frames k1 to kf
plus all the state elements of time frame k1. The correction models for circuit gates are grouped
together across time frames. In the worst case, maxN errors are excited both before and after
clock cycle k1, thus erroneous behaviour can be latched in the state elements in clock cycle
k1. In this case, a debugger requires all the correction models to be active (i.e. select line of
multiplexers assigned to 1) to find a solution set corresponding to this erroneous behaviour. The
maximum number of active lines is one for each erroneous gate and one for each state element
or maxNBMD = NDFF + maxN. With maxNBMD, all initial state suspects that contain an
erroneous behaviour will be found in the solution set. □

Figure 4.11: Example of pipelined circuit

Figure 4.12: Three time frame ILA of circuit in Figure 4.11
Theorem 5 presents the maximum cardinality required by an automated debugger using
BMD to guarantee completeness of the solutions. Under the suffixes of different BMD iterations,
the error cardinality can increase, as shown above, or decrease. The following example
illustrates a case where the maximum error cardinality is reduced under a given suffix. Consider
Figure 4.11, where two error gates A and B combine to result in the observed failure in time
frame three. The original ILA of size three is shown in Figure 4.12. With a BMD suffix of size
two, the suspects D, E and F are found with N = 1. Since D is also an initial state suspect, the
suffix length must be increased. In summary, the initial BMD iteration is solved with N = 1
while an error cardinality N = 2 is required to solve the problem with the complete trace.
Every time an initial state suspect is found as a solution, the suffix length is increased
and a subsequent debugging problem is formulated. The error cardinality for the subsequent
problems must be reset to N = 1 regardless of its value in previous iterations. Resetting N
ensures that the smallest-cardinality solutions are found in every iteration. Revisiting the
example in Figure 4.9, when the ILA length is increased from one to two clock cycles, the
cardinality must be reset from N = 2 to N = 1 in order to find the single error site A.

Figure 4.13: Example with error source A propagating through three DFFs
4.3.4 Improvements to Basic Methodology
The previous sections introduced the basic BMD methodology and presented a flow to guarantee
solution completeness. This section presents several performance-enhancing techniques.
4.3.4.1 Reducing the Number of Initial Error Suspects
One improvement relates to the set of initial state suspects. As stated by Theorem 5, the
maximum error cardinality for a BMD problem can grow according to the number of state
elements in the circuit. For example, in Figure 4.13 a single erroneous gate propagates through
three state elements before propagating to a primary output. To identify the error
source, the cardinality must be incremented to N = 4 in order to consider a longer suffix. Since
the complexity of the debugging problem grows exponentially with the error cardinality, as
explained in Section 2.3.1, it becomes important to develop techniques to aid BMD so that it
does not require a large error cardinality while remaining complete.
One way to avoid a large increase in the error cardinality is to group all correction models
for initial state suspects together as a single suspect. Recall from Section 4.3.2 that grouping
correction models can be as simple as connecting the select lines of multiplexers together. Since any
solution set with initial state suspects requires increasing the length of the suffix for future
iterations of BMD, there is no need to distinguish which initial state suspects are found. Returning
to the example of Figure 4.13, when grouping all initial state suspects together, the debugger does
not distinguish between solutions containing initial state suspects for DFF B, C, or D, as any one
requires increasing the suffix length. As a result, the suffix can be increased when using N = 2
instead of N = 4.
Theorem 6 Given an erroneous circuit with maxN errors, a diagnosis vector vBMD with a
suffix trace from clock cycle k1 to kf , where k1 > 1, the BMD debugging methodology will find
all initial state suspects as solutions if the maximum error cardinality used is maxNBMD =
maxN + 1 and all initial state suspects are grouped as one suspect.
Proof: Since all logic value propagations across consecutive clock cycles occur through state
elements, by grouping all initial state elements together, all propagation must occur through this
single group. As a result, all erroneous behaviour from clock cycles prior to k1 must propagate
through the initial suspect group to reach the failure in clock cycle kf . When modeled by a
single suspect, the initial suspect group can be selected in combination with other suspects by
increasing the cardinality maxN by one. □
4.3.4.2 Reusing Solutions
Another improvement relates to the iterative nature of the BMD methodology. At every iteration,
the debugging problem with a longer suffix may contain solutions that were already found
in previous iterations with smaller suffixes. For instance, in the example of Figure 4.6,
solutions B, C, and D with k1 = 2 are also found at k2 = 1. This leads us to the conclusion that
solutions found with smaller suffixes can be excluded from the search space of future iterations
when a larger suffix is used for a given error cardinality N .
Theorem 7 Consider two debugging problems formulated using the erroneous circuit C, the
maximum error cardinality N , and two different trace suffixes. One suffix is from clock cycle
ki to kf , while the other is from clock cycle kj to kf , and ki ≥ kj. Every solution s ∈ Si from
1:  exit_condition = 0
2:  Final_Solutions = ∅
3:  k = kf − incr
4:  while (!exit_condition) do
5:      initial_states = get_current_states(C, k − 1)
6:      v = {initial_states, stimulus_{k→kf}, response_{k→kf}}
7:      S = Suspect_locations ∪ initial_state_suspect
8:      Solutions = debug(C, v, N, S)
9:      for all Solution ∈ Solutions do
10:         valid_solution = 1
11:         for all Suspect ∈ Solution do
12:             if (is_initial_state(Suspect)) then
13:                 k = k − incr
14:                 N = 0
15:                 valid_solution = 0
16:             end if
17:         end for
18:         if (valid_solution == 1) then
19:             Final_Solutions = Final_Solutions ∪ Solution
20:         end if
21:     end for
22:     if (N == maxN) then
23:         exit_condition = 1
24:     else
25:         N = N + 1
26:     end if
27: end while
28: return Final_Solutions

Figure 4.14: Complete BMD algorithm
the first debugging problem is also a solution to the second debugging problem, s ∈ Sj, if s does
not contain any initial state suspects.
Proof: Since the solution s with cardinality N does not contain an initial state suspect, the
error is active within the clock cycles ki to kf. Since the interval kj to kf contains the interval
ki to kf , solution s will also be a solution in the larger interval provided the cardinality N is
the same. �
The observation in Theorem 7 allows the BMD framework to skip solutions found in pre-
vious iterations to achieve performance improvements. Practically, this is done by adding
blocking clauses to the CNF of subsequent iterations to prevent rediscovering solutions that
have already been found.
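As an illustration, with each solution represented as a set of positive suspect-select variables (an assumed encoding for this sketch, not the exact CNF of the debugger), the blocking step amounts to:

```python
def blocking_clause(solution):
    """Build a clause forbidding a previously found solution.

    A solution is a set of suspect-select variables (positive integers)
    that were all asserted; the blocking clause demands that at least
    one of them be de-asserted in any future satisfying assignment.
    """
    return [-v for v in solution]

def block_all(cnf, solutions):
    """Append one blocking clause per previously found solution."""
    return cnf + [blocking_clause(s) for s in solutions]
```

Each subsequent BMD iteration then starts from the blocked CNF, so the solver can only return solutions that are new with respect to shorter suffixes.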
4.3.5 Overall Algorithm
The BMD methodology described in this chapter including the improvements of Section 4.3.4
is presented in the algorithm of Figure 4.14.
Initially, BMD uses the suffix from clock cycle kf − incr to clock cycle kf as shown on line
3. The while loop shown from line 4 to line 27 comprises the BMD iterations where successive
debugging problems are constructed with longer suffixes. On line 5 the initial state constraints
are captured by simulating C for k − 1 cycles, while on line 6 the stimulus, response and
initial state values are combined to construct the diagnosis vector v. Grouping the initial state
suspects as presented in Section 4.3.4 and adding all the potential suspects to S is performed
on line 7. On line 8, an automated debugger is called to solve the constructed problem with
error cardinality N .
Once solutions are found by the debugger, the decision to extend the length of the suffix
is made on line 12 based on whether the grouped initial state suspect is found. Lines 13–14
increase the length of the suffix and reset the error cardinality. When a solution does not
contain the initial state suspect, the solutions are added to the final set as shown on line 19.
Finally, the BMD process terminates when the maximum user-defined cardinality maxN is
reached on line 22. Not shown here are termination conditions based on resource limits such
as time-out and memory-out.
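The control flow of Figure 4.14 can be sketched in Python. This is a simplified illustration, not the thesis implementation: debug is a stand-in for the SAT-based debugger call of line 8, solutions are plain sets of suspect names, and all initial-state suspects appear as the single grouped suspect of Section 4.3.4.

```python
# Simplified sketch of the BMD loop of Figure 4.14. "debug" takes the
# suffix start cycle k and error cardinality N, and returns a list of
# solutions (sets of suspect names).
INITIAL = "initial_state_suspect"   # all initial-state suspects, grouped

def bmd(kf, incr, maxN, debug):
    final_solutions = []
    k = kf - incr                       # line 3: shortest suffix first
    N = 1
    while True:                         # lines 4-27
        extend = False
        for solution in debug(k, N):    # lines 8-9
            if INITIAL in solution:
                extend = True           # lines 12-15: suffix too short
            elif solution not in final_solutions:
                final_solutions.append(solution)   # line 19
        if extend:
            k -= incr                   # line 13: lengthen the suffix
            N = 1                       # lines 14 and 25: restart cardinality
        elif N == maxN:                 # line 22: exit condition
            return final_solutions      # line 28
        else:
            N += 1                      # line 25
```

A canned debug function suffices to exercise the loop: any solution containing the grouped initial-state suspect triggers a suffix extension, while the remaining solutions are accumulated across iterations.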
4.4 Experiments
In this section, we present experimental results of the BMD methodology presented in this
chapter. All experiments are conducted on a single core of a Core 2 Quad 2.66GHz machine
with 8GB of memory. The debugger used is a hierarchical sequential debugger developed in
C++ based on the concepts of [3] with a Verilog frontend to allow for RTL-based debugging.
The SAT solver used by the debugger is MiniSAT [74]. In the following this tool is referred to
as the stand-alone debugger.
The circuits selected for experiments are Verilog RTL designs from OpenCores [77] as well as
three industrial designs (fxu, rx comm, s comm) provided to the research group by semiconduc-
Problem        # gates   # DFFs   # cycles (kf)   run-time (s)   # solutions   error found
ac97_ctrl-1     25310     2346        978           2613.62          49           yes
ac97_ctrl-2     25288     2345        670           1245.19          34           yes
div64bits-1     74846     5512        108            713.01          21           yes
fdct-1         377801     5717        182                MO         N/A            no
fdct-2         377801     5717        186                MO         N/A            no
fpu-1           82371     1083        316           2108.97           6           yes
fpu-2           22953      515        640                TO          10            no
fxu-1          602673    29080         28           1958.15          32           yes
fxu-2          267423    12016        154                TO           3            no
mem_ctrl-1      46168     1145        681           2190.29           5           yes
mem_ctrl-2      46168     1145        757                TO           5            no
rx_comm-1      585641    30339        675                MO         N/A            no
rx_comm-2      585641    30339        253                MO         N/A            no
rx_comm-3      585632    30339        573                MO         N/A            no
rx_comm-4      220456    18333        180           2240.73          85           yes
rx_comm-5      585265    30339         99                TO          54            no
rx_comm-6      585641    30339        560                MO         N/A            no
s_comm-1       779607    29967        212                MO         N/A            no
s_comm-2       779607    29967        212                MO         N/A            no
s_comm-3       779575    29967        212                MO         N/A            no
s_comm-4       779607    29967        132                MO         N/A            no
s_comm-5       790407    29967        132                MO         N/A            no
spi-1            2942      185        251            973.18          65           yes
spi-2            2954      185        648                MO         N/A            no
vga-1          153837    17102        863                MO         N/A            no
vga-2          153837    17102        902                MO         N/A            no
vga-3          155370    17206        175           1626.64          63           yes
vga-4          154137    17138        209           1531.70          33           yes
vga-5          154609    17146        381                MO         N/A            no
vga-6          153837    17102        849                MO         N/A            no
wb-1             4479      251        269            466.03          14           yes
wb_conmax-1     85049      818        651                MO         N/A            no

Table 4.2: Circuit and performance statistics without BMD
tor firms. In each of these designs, one or more errors are added at the RTL level. For example,
these errors may be wrong state transitions, incorrect RTL operations, or even wrong module
instantiations. It is important to emphasize that these errors at the RTL often translate into
dozens of error locations at the gate-level. Every instance of the designs with an inserted error
is a debugging problem used in the experiments. Each debugging problem has a corresponding
diagnosis trace which includes stimulus vectors and expected response vectors provided by the
testbench.
Table 4.2 provides a summary of the debugging problems as well as the performance of
circuit        run-time (s)   BMD iters   # solns found   found in iter   improv (×)
ac97_ctrl-1       204.57          10            7                0            6.09
ac97_ctrl-2       747.24          10           13                1            1.67
div64bits-1      1264.49          10           20                2            0.56
fdct-1                TO           5           38                0             N/A
fdct-2                TO           4           48                2             N/A
fpu-1             201.01           4            6                1           10.49
fpu-2             333.00          10           24                1           10.81
fxu-1             479.14           1           24                1            7.51
fxu-2             174.36           1           28                1            4.09
mem_ctrl-1         22.43           1            5                1           97.65
mem_ctrl-2         28.35           1           11                1          126.98
rx_comm-1         452.97           1           30                1            7.95
rx_comm-2         331.19           1           18                1           10.87
rx_comm-3         369.09           1            5                1            9.75
rx_comm-4             TO           3           81                7            0.62
rx_comm-5         275.79           1           15                1           13.05
rx_comm-6         393.01           1           17                1            9.16
s_comm-1              TO           4           21                1             N/A
s_comm-2              TO           4           20                3             N/A
s_comm-3              TO           4           14                1             N/A
s_comm-4              TO           3           71                1             N/A
s_comm-5              TO           3           39                2             N/A
spi-1             151.07          10           63                1            3.53
spi-2             106.47          10           57                1           33.81
vga-1             553.35           3           63                1            6.51
vga-2            1336.67           3           33                1            2.69
vga-3             685.95           3           83                1            2.37
vga-4             163.03           1            6                1            9.40
vga-5            2982.43           5           29                3            1.21
vga-6             166.52           1            8                1           21.62
wb-1              553.35           3           63                1            0.84
wb_conmax-1        41.56           1           12                1           86.62

Table 4.3: Performance with BMD on increment size of 10 clock cycles
the stand-alone debugger on each instance. Columns one, two and three label the debugging
problem and show its gate and DFF counts, respectively. Column four shows the number of clock
cycles in the entire stimulus trace provided by the testbench. This number also corresponds to
the first clock cycle kf where a failure is observed. The problems used are specifically chosen
because of their large circuit size (over 100K gates), long error trace (hundreds of clock cycles)
or both. This combination results in hard problems that push the capabilities of state-of-the-art
debuggers.
The next three columns of Table 4.2 present debugging statistics when using the stand-alone
debugger. Column five shows the run-time in seconds required to solve each problem. Column
six enumerates the number of solutions found, or the total number of equivalent error locations
found with maxN = 1. Column seven states whether the actual inserted RTL error is found
as one of the solutions. In cases where more than one hour of CPU time is used, a time-out (TO)
is declared, and where more than 8GB of memory is required, a memory-out (MO) is declared.
In summary, of the 32 debugging problems, three time out, 17 memory out, and the inserted
error is found in only 11, or 34%, of all cases.
The BMD methodology introduced in this chapter is implemented according to the algorithm
of Figure 4.14. Here an initial suffix length of 10 clock cycles is used as well as an increment
of 10 clock cycles each time the suffix is increased. A hard limit of 100 clock cycles is
set, at which point the BMD methodology terminates. The performance of BMD is
presented in Table 4.3, with the problem instance shown in column one. Column two presents the
run-time in seconds required by BMD to solve each problem. Column three shows the number
of BMD iterations performed until the process terminates, or the number of debugging problems
solved with different suffixes. The corresponding total number of solutions found by all BMD
iterations are shown in column four. When the inserted error is found, the iteration in which
the error is found is listed in column five. If the inserted error is not found, a zero (0) is
listed in the column. The final column presents the performance improvement achieved by the
BMD methodology over the stand-alone debugger.
The benefit of the BMD methodology is apparent based on multiple criteria. First notice
that none of the problems solved with BMD exceed the 8GB memory limit, while 17 instances
result in a memory-out with the stand-alone debugger. Instead, with BMD, eight problems
run over the one hour time limit. It is clear that BMD provides a trade-off between time and
memory resources. This trade-off is seen favourably because the overall number of problems
where the inserted error is found increases from 11 to 30 when using BMD. In practice, the
complete problem need not be solved in order to find the error source or to provide vital
debugging information to the user.
When using BMD, as shown in column five, the inserted RTL error is not found for only two
problems. These are ac97_ctrl-1, where the maximum suffix length of 100 clock cycles is
[Figure: four plots of memory usage (KB) versus CPU run time (s)]

Figure 4.15: Memory usage versus CPU run-time for four selected problems: (a) ac97_ctrl-2, (b) fpu-2, (c) rx_comm-4, (d) vga-5
reached, and fdct-1, where the time-out limit of one hour is reached. Similarly, in column four
of Table 4.3 at least some solutions are found for all 32 problems. In contrast, in Table 4.2,
for 17 of 32 cases no solutions are found at all due to exceeding memory resources. Again, this
data favours the memory versus time trade-off provided by BMD.
The data in Table 4.3 also reaffirm the probabilistic analysis performed in Section 4.3.1
that errors are excited in temporal proximity to the failure point. In column three, 12 of 32
problems only require one BMD iteration or a suffix of length 10 clock cycles in order to debug
the problem completely. On average, less than 15% of the original trace length is used. Without
considering cases that time-out, only 6 of 24 problems or 25% of cases require more than 100
clock cycles to provide complete solutions.
Finally, notice the run-time improvement of the BMD methodology over the stand-alone
debugger shown in column six of Table 4.3. Here improvements are achieved from 1.21× to
[Figure: number of debugger solutions found (0–90) versus BMD iteration (1–10) for ac97_ctrl-2, fpu-2, rx_comm-4, vga-5 and spi-1]

Figure 4.16: Debugger solutions found versus BMD iterations
126.98×, or two orders of magnitude. In only three cases, div64bits-1, rx_comm-4 and wb-1,
is a performance hit observed, because the multiple iterations of BMD result in a longer run-time
than running the stand-alone debugger. However, it is clear that BMD is very effective for the vast
majority of problems.
Figures 4.15(a), 4.15(b), 4.15(c) and 4.15(d), plot the memory requirement as a function of
CPU time for problems ac97 ctrl-2, fpu-2, rx comm-4, and vga-5. The memory requirement
graph follows a rising step pattern each time the suffix length is increased. For example, in
Figure 4.15(d), there are five distinct plateaus corresponding to the debugger solving problems
with suffixes of length 10, 20, 30, 40 and 50. As the suffix length increases, the incremental
memory required appears constant. However, notice that the solve time increases at a faster
rate than the suffix length. For example, the first iteration, which requires approximately 1.5
GB, takes under 100 seconds to solve, while the last iteration, which requires approximately 6.5
GB, takes approximately 600 seconds to solve. These graphs confirm the fact that with larger
suffixes, debugging problems also become considerably harder to solve.
The final analysis of the BMD methodology is with respect to the number of solutions found
as a function of BMD iterations. As shown in Figure 4.16, for the sample problems selected, the
number of solutions found by BMD increases initially and plateaus in later iterations. Notice
that the number of solutions does not always increase, since solutions that contain initial state
suspects in earlier iterations may be removed as solutions in later iterations. This
graph portrays the BMD methodology favourably as it indicates that increasing the suffix length
after a certain point does not result in any more new solutions. As a result, the BMD approach
of starting with a small suffix and systematically increasing the suffix length appears to be
effective for debugging.
4.5 Summary
This chapter introduces the bounded model debugging methodology to efficiently and system-
atically tackle problems with long error traces. The contribution is based on the empirical
observation that errors are excited and failures are observed in temporal proximity. This ob-
servation is reaffirmed through probability analysis as well as through empirical evidence. The
BMD methodology proposed, inspired by bounded model checking, is found to be faster than a
state-of-the-art debugger in 90% of cases. Furthermore, it is more robust, as the error is found
in over 93% of problems, compared to 34% without BMD. Overall, BMD allows large problems
with very long traces to be handled in an efficient manner by existing debuggers.
Chapter 5
Debugging using Max-SAT
5.1 Introduction
The contributions presented in Chapters 3 and 4 build on the SAT-based debugging formulation
of Smith et al. as presented in Section 2.5. In this formulation, the SAT problem is constructed
using an erroneous circuit and corresponding diagnosis vector. Since the resulting union of
the CNF for these two elements is inherently unsatisfiable, mechanisms are added to the CNF
to formulate a satisfiable problem whose satisfying assignments correspond to error suspects.
These mechanisms are the correction models and the error cardinality constraints described in
Section 2.5. In part, the debugging problem is formulated as a satisfiability problem to take
advantage of the great improvements achieved in SAT solvers over the past decade.
There are other types of analysis tools and solvers that fit the unsatisfiable nature of the
debugging problem well. For instance, unsat cores, which are derived from the proof of
unsatisfiability of a SAT instance can provide insight about the conflicting behaviour of circuit
components [68, 96]. Maximum satisfiability (max-sat) solvers are engines that reason about an
unsatisfiable CNF to find the largest subset of the CNF clauses whose union is satisfiable. The
complement of the max-sat solution, i.e., the CNF clauses not included in the subset, can help identify
error sites. Thus unsat core analysis and max-sat solvers can help solve debugging problems
with little or no additional mechanisms.
One of the contributions presented in this chapter is the first automated debugging frame-
work using maximum satisfiability. The formulation is constructed from the union of the con-
straints corresponding to the erroneous design and the diagnosis vector. Since the incorrect
design cannot produce the correct response under the given stimulus, the CNF can only be
satisfied if some of the CNF clauses are removed. A max-sat solver identifies these constraints
by finding the largest CNF clause subset that is satisfiable. The remaining constraints in turn
correspond to circuit components whose rectification will remove the observed functional failure.
In order to guarantee completeness, max-sat solvers can be called iteratively until all maximum
satisfiable subsets of a given cardinality are found.
The proposed technique is an alternative to gate-level SAT-based debugging which can be
easily enhanced to over-approximate solutions. Over-approximation using max-sat is a second
major contribution of this work, as it allows the debugging problem to be tackled in a divide
and conquer approach by trading off the tool’s performance against solution resolution. More
specifically, approximation can reduce the problem complexity and thus require less run-time
at the cost of finding larger, less precise solutions. Although not exact, this approach is proposed
as a pre-processing step that filters solutions for a second stage exact debugger. The second
stage conventional debugger will benefit from having fewer potential suspects which translates
to faster run-times. The combined two-step debugging reduces the complexity of both stages,
resulting in a more computationally efficient overall solution.
A suite of experiments on combinational and sequential circuits, for single and multiple
vectors, is conducted to demonstrate the benefit of the proposed framework. On average,
the over-approximation technique quickly eliminates 92% of the suspects. The second stage
debugger uses the filtered suspects to find the exact error sources in a fraction of the time
it would take otherwise. Overall, performance improvements of 200 times or two orders of
magnitude over a state-of-the-art debugger are observed consistently.
In the next section, background is provided on max-sat solving. Section 5.3 presents the
proposed max-sat approach for combinational circuits and Section 5.4 extends this for sequential
circuits and for multiple vectors. Section 5.5 presents the over-approximation technique and the
overall framework developed for optimal performance. Experiments are presented
in Section 5.6, followed by the chapter summary in Section 5.7.
5.2 Background
5.2.1 Maximum Satisfiability
While max-sat is concerned with finding a satisfiable set of clauses with maximum cardinality,
this can be generalized to find Maximal Satisfiable Subsets (MSSes). An MSS is a satisfiable
subset of a formula’s clauses that is maximal in the sense that adding any one of the remaining
clauses would make it UNSAT. Any max-sat solution is of course an MSS, but MSSes can
have different (smaller) sizes as well. In this work, the complements of MSSes, sets of clauses
whose removal makes the instance satisfiable, are of interest. Just as an MSS is maximal, its
complement is minimal, and we refer to such a set as a Minimal Correction Set (MCS). This
work makes use of two following techniques developed as extensions to the algorithm from [60]:
• Finding all MCSes up to size k
• Grouping clauses to produce “approximate” MCSes
Finding all MCSes up to size k is performed by the algorithm AllMCSes from [60], which
was developed as the first phase of an approach for finding all Minimal Unsatisfiable Subsets
(MUSes). This procedure solves consecutive optimization problems, finding MCSes in order of
increasing size (equivalent to finding their complementary MSSes in order of decreasing size).
MCSes are returned as they are found, and execution can be stopped when a size limit is
reached.
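The behaviour of AllMCSes can be illustrated with a small, self-contained brute-force sketch. This is for intuition only and is exponential in the number of clauses and variables; the actual algorithm of [60] performs the search inside a SAT solver using clause-selector variables and cardinality constraints.

```python
from itertools import combinations, product

def satisfiable(clauses, nvars):
    """Naive SAT check over DIMACS-style clauses (non-zero integers,
    negative literal = complemented variable)."""
    for assign in product([False, True], repeat=nvars):
        if all(any(assign[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses):
            return True
    return False

def all_mcses(clauses, nvars, k):
    """Return all MCSes of size <= k, in order of increasing size.

    An MCS is reported as a set of clause indices whose removal makes
    the formula satisfiable; supersets of already-found MCSes are
    skipped, which guarantees minimality.
    """
    mcses = []
    for size in range(1, k + 1):
        for cand in combinations(range(len(clauses)), size):
            cand = set(cand)
            if any(m <= cand for m in mcses):
                continue  # contains a known MCS, hence not minimal
            rest = [c for i, c in enumerate(clauses) if i not in cand]
            if satisfiable(rest, nvars):
                mcses.append(cand)
    return mcses
```

For the tiny instance (x1)·(¬x1)·(x2), the two singleton MCSes are the conflicting unit clauses; x2 never participates in a correction set.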
The second ability, grouping clauses, depends on the way the algorithm uses clause-selector
variables. Every clause Ci is augmented with a new variable yi, producing C′i = (ȳi + Ci) =
(yi → Ci). When yi is assigned TRUE, the original clause Ci must be satisfied, while when
yi is FALSE, C′i is satisfied, essentially disabling the original clause. This gives a
standard SAT solver the ability to enable and disable constraints implicitly within the normal
backtracking search. By assigning the same y variable to multiple clauses, a set of clauses can
be treated as a single higher-level constraint (the conjunction of all clauses given the same
y variable) that can be enabled and disabled at once. Using this approach, each MCS is a
minimal set of groups of constraints whose removal makes the instance satisfiable. This leads to
an over-approximation of an MCS of the original clauses, because extra clauses will be included
in groups even though they may not be necessary. The benefit of the over-approximation is
that it can greatly increase the performance of the algorithm as the search space is reduced
exponentially.
This work uses the MCS techniques outlined above for debugging. Although not precise in
the general case, the term max-sat is used throughout to refer collectively to the techniques
above for simplicity of the presentation.
5.3 Debugging Combinational Circuits with Max-sat
Given an erroneous circuit C, an input stimulus I, and the corresponding correct output
response O, a CNF formula can be produced as follows.
Φ = I · O · CNF(C)
This CNF problem is naturally unsatisfiable because the erroneous circuit cannot produce the
correct output response under the given input vector. Since the inconsistency between a circuit’s
actual and correct response is due to some gate-level error sources, the unsatisfiability of the
problem is due to the clauses derived from these error sources. In other words, the clauses
that are at conflict in the CNF correspond to the circuit-level error sources from which they
are derived. Therefore, the circuit-level errors can be identified by finding the CNF-level error
clauses.
The max-sat approach in Section 5.2.1 can identify Maximal Satisfiable Subsets (MSSes)
whose complements are Minimal Correction Sets (MCSes). These MCSes represent sets of
clauses whose removal from the CNF makes the problem satisfiable. In the formula Φ constructed
using the constraints I, O, and CNF(C), the MCSes map directly to error clauses. Once the
error clauses are identified through MCSes, the gate-level suspects are found by mapping each
clause to the gate it is originally derived from as described in Section 5.2.
For example, consider the correct and erroneous circuit in Figure 5.1 (a) and (b) where gate
A is mistakenly implemented as an AND gate instead of an OR gate. Under the input stimulus
{a = 0, b = 1, d = 1} the circuit has a response of {e = 0} instead of the correct response of
[Figure: (a) correct and (b) erroneous circuit; gate A (inputs a, b; output c) feeds gate B (inputs c, d; output e), with A mistakenly implemented as an AND instead of an OR gate]

Figure 5.1: Correct and erroneous circuit
{e = 1}. The corresponding erroneous CNF for the circuit and the input/output vectors are
shown below.
(ā) · (b) · (d) · (e)
(a + c̄) · (b + c̄) · (ā + b̄ + c)
(c + ē) · (d + ē) · (c̄ + d̄ + e)
Here, the max-sat approach described in Section 5.2.1 can return the MCS (a + c̄) as a solution
because removing this clause from the CNF makes the formula satisfiable. Notice that this
clause is derived from the erroneous gate A.
The above example illustrates how the removal of an error clause can help identify the error
source. Further analysis of the example demonstrates that there are other clauses, such as (c + ē),
whose removal can satisfy the problem. Indeed, more than one error clause may exist in a given
problem, corresponding to the many potential error sources at the gate-level. These are more
commonly known as equivalent errors or faults in the diagnosis literature [1]. Note that the
removal of the clause (ā) also satisfies the problem; however, since this constraint is not part of
the circuit component of the CNF (i.e., C), it is not considered an error clause.
For the debugging technique to be complete, all equivalent errors must be found. Each of
these is known as a suspect error source because fixing it may make the erroneous
circuit produce the correct response for the given input vector. As a result, the AllMCSes
algorithm of Section 5.2.1 is used to find all error clauses and consequently all gate-level error
suspects.
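The example can be checked mechanically. The sketch below encodes the constraints of Figure 5.1 as DIMACS-style integers (a..e mapped to 1..5, a negative literal denoting a complemented variable; the clause polarities follow the standard AND-gate CNF encoding, with gate B inferred to be an AND gate). A brute-force check confirms both the unsatisfiability of the instance and the equivalent-error clauses discussed above:

```python
from itertools import product

# Figure 5.1(b) as DIMACS-style clauses: a..e -> 1..5.
CNF = [
    [-1], [2], [4], [5],             # stimulus a=0, b=1, d=1; response e=1
    [1, -3], [2, -3], [-1, -2, 3],   # gate A, erroneously an AND gate
    [3, -5], [4, -5], [-3, -4, 5],   # gate B (an AND gate, inferred)
]

def satisfiable(clauses, nvars=5):
    """Brute-force satisfiability check, adequate for this toy instance."""
    for assign in product([False, True], repeat=nvars):
        if all(any(assign[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses):
            return True
    return False
```

Dropping the gate-A clause (a + c̄), or the equivalent-error clause (c + ē) of gate B, restores satisfiability, while dropping an unrelated gate clause does not.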
5.3.1 Error Clause Cardinality
Since the solution space for the AllMCSes algorithm is exponential, an explicit limit for the
maximum cardinality of the MCSes is advised to prevent memory explosion. In practice, this
limit, called the error clause cardinality, must be relatively small due to memory and perfor-
mance considerations. The error clause cardinality determines the completeness and efficiency
of the proposed technique.
Since this work is primarily concerned with gate-level debugging, the limit used must cor-
respond to the gate-level cardinality of conventional debuggers. In Section 5.2 the error
cardinality Ng is defined as the maximum size of the gate tuples that may be responsible for the erro-
neous behaviour. At the level of the CNF encoding, the error clause cardinality Nc must be set
to a value such that all the gate-level errors at Ng can be found using the proposed max-sat
approach. Thus completeness in this context is with respect to the gate-level debuggers such
as [33]. The following theorem proves that the proposed approach is complete for a given value
of Ng.
Theorem: The algorithm AllMCSes called on the problem Φ = I · O · CNF(C) with a limit
of Nc is complete if Nc is equal to the maximum number of clauses derived for any single gate
in the CNF, multiplied by Ng.
Proof: Proof by contradiction. Suppose there is a gate-level error not identified by the
proposed approach using the error cardinality limit Nc. Since AllMCSes iteratively finds sets
of clauses with cardinality 1 up to Nc, the gate-level error must be caused by more than Nc
clauses. However, Nc is equal to the maximum number of clauses derived from any one gate
times Ng, so the error must be caused by more than Ng gate-level sources. Therefore the error
is not found using conventional debuggers with Ng either. �
In many circuit-based SAT problems, the circuit is first converted to a 2-input AND-
INVERTER graph and then translated into CNF [10, 54]. In such a CNF formula, the maximum
number of clauses from any gate is 3, thus Nc = 3 × Ng. Using this value for Nc results in
finding all the solutions found using conventional debuggers with Ng. In CNF formulas derived
from arbitrary circuits where the number of clauses generated can greatly vary from one gate
to another, the proposed max-sat debugging technique may return more solutions than the
gate-level debugger for a given Ng. As discussed further in Section 5.5 this scenario does not
pose a problem under the proposed framework.
5.3.2 Error Group Cardinality
The previous section presented a limit for the error clause cardinality to guarantee completeness
for the proposed approach. Although complete, increasing the error clause cardinality is not
always desired as the complexity of the debugging problem is exponentially related to the error
cardinality [91]. Here, the grouping ability described in Section 5.2.1 is used to reduce the
complexity of the problem while maintaining completeness.
Grouping all clauses derived from the same gate together allows the max-sat solver to
“enable” or “disable” all of those clauses simultaneously. In effect, this gives the solver the
ability to treat each gate as a single high-level constraint, leading to solutions (MCSes) found
directly in terms of the gates. Under this problem restriction, the error clause-group cardinality
Ncg required to find gate-level errors can effectively be set to Ng.
Theorem: By grouping all clauses derived from the same gate together, the proposed
technique is complete if the error clause-group cardinality Ncg = Ng.
Proof: Since each group has a one-to-one correspondence with a circuit gate, when a group
is found as part of an MCS, all clauses corresponding to the original gate are “disabled” by
the AllMCSes algorithm. Thus every solution found by AllMCSes maps to a set of the original
gates. Hence, limiting the group cardinality is equivalent to limiting the gate cardinality. �
Revisiting the example of Figure 5.1, grouping the clauses of gate A together with the
clause-selector variable yA and the clauses of gate B together with the clause-selector variable
yB results in the following CNF.
(ā) · (b) · (d) · (e)
(a + c̄ + ȳA) · (b + c̄ + ȳA) · (ā + b̄ + c + ȳA)
(c + ē + ȳB) · (d + ē + ȳB) · (c̄ + d̄ + e + ȳB)
5.4 Extension to Sequential Circuits and
Multiple Vectors
Debugging sequential circuits is similar to debugging combinational circuits, except that the
circuit behaviour must be modeled for a finite number of clock cycles. These clock cycles are necessary
to excite and observe the errors. A popular approach for modeling sequential circuits is to
use the time frame expansion technique or the Iterative Logic Array (ILA) representation.
This technique replicates a circuit’s transition relation, called a time frame, and connects the
current-state and the next-state of adjacent time frames together. In effect, the sequential
circuit is transformed into an “unfolded” combinational circuit that can be debugged like any
other combinational circuit. Although not required for this section, further detail on the ILA
technique can be found in Section 4.2.
Since the complexity of debugging increases exponentially with the number of error sources,
debuggers must be careful not to consider the “replicated” gates across time frames as unique
error sources. For example, a single gate-level error in an ILA with 3 time frames may appear
to have 3 distinct error locations, however, replacing the functionality of a single gate in the
original sequential circuit will fix the problem in all time frames.
The proposed max-sat debugging technique can be extended to handle sequential designs
efficiently. First, the sequential circuit is converted to an ILA and then translated into CNF.
Similar to the previous formulation the CNF is then constrained with input stimulus and output
response, I and O resulting in
Φ = I · O · CNF(ILA(C)).
Here, we assume that the initial state constraints are also contained in I. The second step
is to account for the replication due to the ILA by grouping all clauses derived from the same
gate but from any time frame. As a result, clauses from a particular gate will be “enabled” and
“disabled” at once irrespective of the time frames they represent.
For example, consider the erroneous sequential circuit shown in Figure 5.2(a) and its ILA in
Figure 5.2(b). Here, the gate A has been erroneously implemented as an AND gate instead of an
OR gate. As a result, the output of A in the first and second time frames should be 1 instead
[Figure: (a) erroneous sequential circuit: gate A (inputs a, b; output c), a flip-flop feeding c back to a, and gate B (inputs a, d; output e); (b) its three-time-frame ILA, annotated with the stimulus and correct response values]

Figure 5.2: Erroneous sequential circuit and its ILA representation
of 0. Note that the input stimulus and correct response are also shown in Figure 5.2(b). The
corresponding CNF for the constrained ILA is shown below.
(a1) · (b̄1) · (d1) · (e1)
(a1 + c̄1) · (b1 + c̄1) · (ā1 + b̄1 + c1)
(a1 + ē1) · (d1 + ē1) · (ā1 + d̄1 + e1)
(c1 + ā2) · (c̄1 + a2)
(b̄2) · (d2) · (e2)
(a2 + c̄2) · (b2 + c̄2) · (ā2 + b̄2 + c2)
(a2 + ē2) · (d2 + ē2) · (ā2 + d̄2 + e2)
(c2 + ā3) · (c̄2 + a3)
(b3) · (d3) · (e3)
(a3 + c̄3) · (b3 + c̄3) · (ā3 + b̄3 + c3)
(a3 + ē3) · (d3 + ē3) · (ā3 + d̄3 + e3)
In the above example, the clauses corresponding to gate A in time frames 1 and 2 are responsible for the discrepancy between the actual and correct responses. Specifically, these are (b1 + ¬c1) and (b2 + ¬c2). However, by grouping all clauses derived from gate A together and those from gate B together, irrespective of the time frames, the single group solution is returned. Below is the modified CNF based on grouping clauses from gate A (B) together with the clause-selector variable yA (yB).
(a1) · (¬b1) · (d1) · (e1)
(a1 + ¬c1 + yA) · (b1 + ¬c1 + yA) · (¬a1 + ¬b1 + c1 + yA)
(a1 + ¬e1 + yB) · (d1 + ¬e1 + yB) · (¬a1 + ¬d1 + e1 + yB)
(¬c1 + a2) · (c1 + ¬a2)
(¬b2) · (d2) · (e2)
(a2 + ¬c2 + yA) · (b2 + ¬c2 + yA) · (¬a2 + ¬b2 + c2 + yA)
(a2 + ¬e2 + yB) · (d2 + ¬e2 + yB) · (¬a2 + ¬d2 + e2 + yB)
(¬c2 + a3) · (c2 + ¬a3)
(b3) · (d3) · (e3)
(a3 + ¬c3 + yA) · (b3 + ¬c3 + yA) · (¬a3 + ¬b3 + c3 + yA)
(a3 + ¬e3 + yB) · (d3 + ¬e3 + yB) · (¬a3 + ¬d3 + e3 + yB)
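To make the grouping concrete, the following Python sketch rebuilds the example formula and brute-forces satisfiability. It is purely illustrative: a real implementation would call a max-sat/AllMCSes engine rather than enumerate assignments, and the unit-clause polarities below are assumptions inferred from the surrounding discussion.

```python
from itertools import product

def unit(v, val):
    # hard unit clause fixing variable v to val
    return [(v, val)]

def and_gate(a, b, c):
    # Tseitin CNF for c = a AND b
    return [[(a, True), (c, False)],
            [(b, True), (c, False)],
            [(a, False), (b, False), (c, True)]]

def flip_flop(c, a_next):
    # next-frame state variable equals current-frame flip-flop input
    return [[(c, False), (a_next, True)], [(c, True), (a_next, False)]]

# Stimulus/response units (an assumed vector) and transitions are hard clauses.
stim = {"a1": True, "b1": False, "b2": False, "b3": True,
        "d1": True, "d2": True, "d3": True,
        "e1": True, "e2": True, "e3": True}
hard = [unit(v, val) for v, val in stim.items()]
groups = {"A": [], "B": []}              # one soft-clause group per gate
for t in (1, 2, 3):
    groups["A"] += and_gate(f"a{t}", f"b{t}", f"c{t}")  # the buggy AND gate
    groups["B"] += and_gate(f"a{t}", f"d{t}", f"e{t}")
    if t < 3:
        hard += flip_flop(f"c{t}", f"a{t + 1}")

def satisfiable(clauses):
    # brute-force SAT check; fine for a 15-variable toy instance
    vs = sorted({v for cl in clauses for v, _ in cl})
    for bits in product([False, True], repeat=len(vs)):
        asg = dict(zip(vs, bits))
        if all(any(asg[v] == want for v, want in cl) for cl in clauses):
            return True
    return False

assert not satisfiable(hard + groups["A"] + groups["B"])  # constrained ILA is UNSAT
for g in groups:
    rest = [cl for n, cls in groups.items() if n != g for cl in cls]
    print(g, "is a suspect:", satisfiable(hard + rest))
```

Dropping all clauses of gate A, across every time frame at once, restores satisfiability, so group A is reported as one suspect rather than three per-frame suspects; in this toy encoding group B also qualifies, since re-implementing the output gate could likewise mask the failure.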
For debugging problems with multiple vectors, ~I = {I1, I2, ...}, ~O = {O1, O2, ...}, the union of the CNF problems for each vector results in a single constraint system. In other words, the CNF corresponding to the circuit C is replicated once per vector. As with sequential circuits, all clauses derived from the same gate, regardless of which replica of C they occur in, must be grouped together and treated as a single error source. It should be noted that the groupings for multiple vectors and sequential circuits are in addition to the gate groupings discussed in Section 5.3.
5.5 Debugging with Approximate Max-sat
In practice, debugging via an exact max-sat formulation may not be feasible, as the number of groups and clauses under consideration can be quite high, resulting in a “hard” max-sat problem. The proposed max-sat strategy can be easily modified to perform an over-approximation instead of finding exact solutions. The benefit of the over-approximation is that the speed and resolution trade-off can be adjusted per problem: reducing the resolution or granularity of the solutions found yields improved run-time performance.
The over-approximation is achieved by grouping clauses together as described in Section 5.2.1
and finding the MCSes in terms of the groups. Note that the groupings discussed here are in
addition to those presented in Sections 5.3 and 5.4. Different grouping strategies can be easily
formulated ranging from random groupings to those based on a circuit’s topology or structure.
Similarly, groups can differ in cardinality from a single clause to thousands of clauses. For instance, a set of clauses can be grouped together if they are in the same fanout-free cone, which is similar to the dominator debugging technique introduced in [91]. Another example is grouping based on high-level modules derived from the RTL, similar to the technique of [3]. Intuitively,
generating groups based on the circuit’s structure or modularity may be advantageous as fewer
solutions/suspects may be returned compared to arbitrary grouping schemes.
Grouping clauses may increase the effect of error masking, in which some error sources
may not be detected as they are masked by others [3]. This also occurs in traditional diagnosis
techniques when error-free models are used. For instance, consider the gates shown in Figure 5.1
and a pair of errors on gates A and B. In this scenario, the single-error solution, A, masks the pair solution of A and B.
Similar scenarios can occur when grouping clauses together, especially if the groups are
made arbitrarily. For instance, consider the CNF illustrated in Figure 5.3, where some clauses are grouped in A and others are grouped in B. Further consider a pair of error clauses illustrated
by the “X”. Here, the single solution identifying group A masks the pair solution A and B. It
should be emphasized that error masking is not unique to the proposed technique as it occurs
in gate-level and hierarchical debugging as well [3]. Generally, in all debugging approaches the user must be aware of the possibility of error masking.
Figure 5.3: Error masking in clause groupings
Figure 5.4: Max-sat debugging framework
5.5.1 Efficient Max-sat Framework
This section presents a performance-optimized debugging framework using the discussed max-sat technique. The complexity of conventional debugging techniques, such as SAT-based tools, depends to a large extent on the number of suspects that must be considered. In the past, divide-and-conquer schemes based on the problem hierarchy have proven beneficial [3]. Here, the approximate max-sat approach can be used as a filter that removes the majority of the suspects by quickly finding over-approximate solutions. Subsequently, any exact debugging approach can be used and will benefit greatly by not having to consider all the original suspects during its analysis.
Any type of grouping can be used; however, in the remainder, clauses are grouped in sets of
size G according to their corresponding circuit-level topology. Every group contains G clauses
(except for one group that contains the remainder of the clauses in the CNF) from gates in
close proximity to one another. For sequential circuits and multiple vectors, the group size is
G× [the total number of replications] as described in Section 5.4. Figure 5.4 illustrates the flow
of the proposed framework where the suspects are first filtered by the max-sat engine and then
processed by the exact debugger. The optimal value of G, found experimentally, determines
how the debugging effort is divided between the two stages.
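The fixed-size chunking described above can be sketched in a few lines of Python, assuming the clause list is ordered so that neighbouring clauses come from topologically close gates (`group_clauses` is an illustrative helper, not the thesis implementation):

```python
def group_clauses(clauses, G):
    # chunk an ordered clause list into groups of G clauses;
    # the final group holds the remainder
    return [clauses[i:i + G] for i in range(0, len(clauses), G)]

# 47 stand-in clauses with G = 20 yield two full groups and one remainder
sizes = [len(g) for g in group_clauses(list(range(47)), 20)]
print(sizes)   # -> [20, 20, 7]
```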
5.6 Experiments
The proposed framework is implemented in C++ using the max-sat algorithm (AllMCSes)
in [60] and the SAT-based debugging engine in [33] as a second stage debugger. Six combina-
tional and ten sequential circuits from ISCAS85, ISCAS89 and ITC99 benchmarks as well as
Figure 5.5: Run-time versus clause grouping size for (a) c6288 and (b) mot-comb3
OpenCores.org [77] are used to construct several design debugging problems. The erroneous
circuits are obtained by manually changing the functionality of a single gate at random. The
failing test vectors are generated by running pseudo-random simulations until an erroneous re-
sponse is observed. Experiments are conducted using both single and four failing test vectors.
The performance of the proposed framework utilizing the max-sat pre-processing is compared
against the efficiency of the SAT-based debugging engine in [33] without pre-processing. In all
experiments, the clause group error cardinality Ncg is set to one in order to find single error sources. In addition to the groups created for the over-approximation, clauses are also
grouped together based on the circuit replicas as discussed in Section 5.4. Experiments are con-
ducted on a Pentium IV 2.8 GHz Linux platform with a 1GB memory limit and 3600 seconds
time-out.
In order to determine the effectiveness of the overall debugging framework of Section 5.5.1
as a function of the group size G, experiments are conducted on several representative circuits.
Figures 5.5 (a) and (b) show two such experiments, using circuits c6288 and mot-comb3, where
three curves representing the run-times of the over-approximate max-sat stage, the exact de-
bugging stage, and the combined run-times are presented for several group sizes. The run-time
of max-sat increases abruptly as the group size becomes very small, and it reaches a maximum
when the exact method is used (single-clause groups). However, as the group size increases, the
run-time of the second stage debugger increases as it must consider many more suspects due to
the over-approximation. The combined curve shows the total run-time of the overall framework
is minimized with group sizes of roughly 10 to 20 clauses.
In the remainder, “max-sat20+debug” refers to the proposed framework with a grouping
size of G = 20. For sequential designs and multiple vectors the actual number of clauses per
group is 20 times the number of circuit replicas. Table 5.1 compares max-sat20+debug to the
stand-alone debugger of [33]. Rows 1–6 report experiments with combinational circuits given a single failing test vector, and rows 7–16 (17–26) report experiments with sequential circuits
given one (four) failing test vector(s). The first four columns contain the circuit’s name, its
size in gates, the number of test vectors used, and the total number of circuit replicas needed.
The fifth column (# error locs) gives the total number of potential error locations that could
explain the faulty behaviour of the circuit (the complete set). These are the locations expected
to be returned by both approaches when available. The sixth column gives the run-time of the
stand-alone debugger. An entry of [TO] denotes a time-out, and [MO] denotes a memory-out.
name | # gates | # vecs | # repl. | # error locs | debug time (s) | # grps | # suspects | % susp red | max-sat time (s) | debug time (s) | total time (s) | X improv.
mot-comb1 | 2,162 | 1 | 1 | 4 | 4.79 | 3 | 49 | 97.73% | 0.03 | 0.05 | 0.08 | 59.88
mot-comb2 | 5,487 | 1 | 1 | 13 | 54.50 | 13 | 178 | 96.76% | 0.13 | 0.24 | 0.37 | 147.30
mot-comb3 | 11,268 | 1 | 1 | 16 | 357.67 | 14 | 189 | 98.32% | 0.27 | 0.47 | 0.74 | 483.34
c6288 | 3,466 | 1 | 1 | 75 | 67.96 | 48 | 536 | 84.54% | 0.45 | 1.23 | 1.68 | 40.45
c7552 | 2,644 | 1 | 1 | 248 | 25.66 | 74 | 789 | 70.16% | 0.11 | 3.11 | 3.22 | 7.97
c5315 | 1,884 | 1 | 1 | 11 | 4.83 | 7 | 99 | 94.75% | 0.04 | 0.07 | 0.11 | 43.91
rsdecoder | 12,041 | 1 | 2 | 11 | 572.68 | 7 | 126 | 98.95% | 0.67 | 0.65 | 1.32 | 433.85
spi | 2,012 | 1 | 21 | 19 | 80.54 | 12 | 194 | 90.36% | 1.15 | 2.99 | 4.14 | 19.45
erp | 2,449 | 1 | 3 | 13 | 36.09 | 11 | 179 | 92.69% | 0.20 | 0.25 | 0.45 | 80.20
ac97 | 15,599 | 1 | 6 | 4 | [TO] | 3 | 58 | 99.63% | 2.22 | 1.45 | 3.67 | > 980.93
reactimer | 265 | 1 | 512 | 7 | 51.81 | 6 | 89 | 66.42% | 47.58 | 6.15 | 53.73 | 0.96
divider | 5,248 | 1 | 15 | 4 | 1,160.39 | 3 | 52 | 99.01% | 14.58 | 1.32 | 15.90 | 72.98
b14 | 5,695 | 1 | 22 | 45 | 1,377.86 | 36 | 627 | 88.99% | 11.17 | 50.75 | 61.92 | 22.25
b15 | 8,938 | 1 | 13 | 32 | [TO] | 40 | 645 | 92.78% | 96.99 | 65.82 | 162.81 | > 22.11
s15850 | 10,481 | 1 | 2 | 19 | 747.36 | 12 | 183 | 98.25% | 0.53 | 0.71 | 1.24 | 602.71
s38584 | 21,006 | 1 | 14 | 58 | [TO] | 34 | 566 | 97.31% | 28.02 | 36.00 | 64.02 | > 56.23
rsdecoder | 12,041 | 4 | 8 | 11 | [TO] | 7 | 126 | 98.95% | 2.88 | 2.01 | 4.89 | > 736.20
spi | 2,012 | 4 | 81 | 4 | 264.07 | 6 | 107 | 94.68% | 4.95 | 4.39 | 9.34 | 28.27
erp | 2,449 | 4 | 12 | 4 | 73.71 | 5 | 101 | 95.88% | 0.82 | 0.52 | 1.34 | 55.01
ac97 | 15,599 | 4 | 23 | 4 | [TO] | 3 | 58 | 99.63% | 9.95 | 5.05 | 15.00 | > 240.00
reactimer | 265 | 4 | 1,745 | 6 | 172.30 | 6 | 89 | 66.42% | 2,845.80 | 21.48 | 2,867.28 | 0.06
divider | 5,248 | 4 | 71 | 4 | [TO] | 3 | 52 | 99.01% | 54.74 | 5.44 | 60.18 | > 59.82
b14 | 10,114 | 4 | 1,216 | − | [MO] | − | − | − | [MO] | − | − | −
b15 | 8,938 | 4 | 62 | − | [TO] | − | − | − | [TO] | − | − | −
s15850 | 10,481 | 4 | 8 | 19 | [TO] | 12 | 183 | 98.25% | 2.21 | 3.64 | 5.85 | > 615.38
s38584 | 21,006 | 4 | 178 | 35 | [MO] | 20 | 365 | 98.26% | 626.45 | 376.62 | 1,003.07 | > 3.59
Table 5.1: Max-sat+debug versus stand-alone debugger
Figure 5.6: Number of solved instances for max-sat20+debug and debug
The remaining columns present the results of our proposed framework. The first four
(# grps, # suspects, % susp red, and time (sec)) report the number of groups (of 20× # repl.
clauses) returned by the AllMCSes algorithm in any MCS; the number of suspect variables
identified by those groups, each corresponding to a potential gate-level error source; the percent
reduction in the number of suspect gates; and the run-time of this first stage. The true benefit
of the proposed technique is evident when considering the number of suspects that are filtered
by the first stage with relatively small run-time. For instance, consider the circuit ac97 with a single vector: the approximation technique rules out 99.63% of the suspects in just 2.22 seconds. On average, the number of suspects is reduced by over 92%.
The run-time in seconds of the second stage debugger using the suspects of the first stage is
shown in column debug time (sec). Finally, the total time (sec) column shows the combined run-
time of the proposed framework. This number is compared with the run-time of the stand-alone
debugger in column six to get the improvements shown in the final column (X improv.).
These results demonstrate the overwhelming advantage of the proposed method over the
stand-alone debugging engine, as the run-times are reduced by an average of 200 times. Overall, the number of solved instances is increased from 16 to 24 out of 26, a 50% improvement; for sequential circuits with one (four) test vector(s), the number of solved instances is increased from 7 (3) to 10 (8), a 43% (167%) improvement.
Figure 5.7: Run-time comparison for max-sat20+debug and debug
Figure 5.6 plots the number of solved instances as a function of run-time on a logarithmic scale for max-sat20+debug and stand-alone debug. It can be seen that max-sat20+debug outperforms the stand-alone approach by roughly two orders of magnitude across all problems. Figure 5.7 plots the total run-time of max-sat20+debug for each instance against the corresponding run-time of the stand-alone debugger on a logarithmic scale. Clearly, most points lie above the 45° line, which indicates the better performance of the proposed framework. Points on the upper border indicate the instances solved by max-sat20+debug but unsolved by the stand-alone approach. The single point where the proposed framework fares worse is caused by the large run-time of the first stage. Such cases can be addressed by increasing the group size G, thus reducing the difficulty for the AllMCSes algorithm.
5.7 Summary
This work presents an efficient two-stage debugging framework which uses a novel max-sat problem formulation. First, it is shown that the debugging problem can be solved exactly
with a max-sat formulation. The approach is extended for sequential circuits and for problems
with multiple vectors. An over-approximation technique is developed to take advantage of
the strengths of the max-sat techniques. This technique considers groups of clauses together
and can thus make decisions based on the groups instead of the individual clauses. The over-
approximation technique is used as a pre-processing step that filters the majority of suspects
and reduces the problem complexity drastically for any debugger used in the second stage.
Experiments demonstrate overwhelming run-time improvements of two orders of magnitude on
average.
Chapter 6
Trace Reduction
6.1 Introduction
Whether debugging is performed manually or with an automated debugger, two major factors affect its efficiency: the circuit size and the error trace length. Techniques such as abstraction/refinement and BMD, presented in Chapters 3 and 4, reduce the challenges imposed by the design size and trace length, respectively. However, for debugging problems with very long traces, additional techniques are required to assist BMD.
For instance, consider the case where an error is excited and its effect is stored in a RAM, only to be read after a few hundred clock cycles. Once the RAM data is read and its erroneous
behaviour propagates to a primary output a failure is observed. The length of the error trace
can be hundreds, thousands, or even millions of clock cycles long. In such situations the problem
must be reduced to a manageable size for an automated debugging tool to be able to operate
effectively.
Trace reduction (also known as trace compaction) is a technique that generates a new simu-
lation trace from an original one returned by a verification tool. This new trace allows a circuit
to transition from an initial state to a final state in fewer clock cycles than the original trace while exhibiting a similar erroneous behaviour. In debugging, the initial state
is often a reset or a reachable state, the final state is where the failure is observed, and the
original trace is provided from a simulation testbench or a formal verification tool. Since the
majority of verification performed in industry is simulation-based with a heavy usage of ran-
dom or constrained-random stimulus patterns, many error traces are found to be unnecessarily
long. In other words, a shorter error trace may be able to reproduce the failure in fewer clock
cycles. Returning to the previous RAM example, the minimal trace length must excite the error
source, write it to the RAM, read from the RAM, and propagate the effects to the primary
outputs. Thus, the transitions between writing to the RAM and reading from the RAM may be superfluous and can be omitted.
Trace reduction can often reduce the size of a trace by orders of magnitude. With a shorter
trace, the debugging task of the verification engineer can be considerably easier as fewer signals
and clock cycles must be analyzed. Similarly, an automated debugger may be able to solve
problems much faster and with fewer resources. Although powerful for debugging, trace reduc-
tion can benefit applications such as stimulus pattern generation, property checking and silicon
debug as well [12, 101].
Previous work shows that for random and constrained-random based simulations, error
traces can often be reduced to a fraction of their initial size [17, 20, 78, 90]. One such technique
uses forward image computation using Binary Decision Diagrams (BDDs) to reduce the trace
length [20]. In [90], techniques are presented to remove variables from counter-examples in
order to simplify them, but their length is not reduced. Another recent work uses several
techniques based on performing further simulations and Bounded Model Checking (BMC) to
achieve smaller traces [17]. The technique of [78] is the closest to ours as they utilize a sequential
Boolean Satisfiability (SAT) solver to find short-cuts in the original trace. More specifically,
[78] seeks to find the shortest path from the initial state to some candidate intermediate state
similar to BMC but using a sequential SAT solver.
In this chapter, we propose a trace length compaction technique where the shortest path
from the initial state to a final state is sought. This approach is based on reachability analysis
where an all-solution SAT solver is used as the pre-image computation engine [51, 69, 71].
The benefits over the existing BDD [20] and BMC techniques [17] are that the BDD memory
explosion problem can be averted and that compactions exceeding the finite bound of BMC
approaches may be applied. Our technique appears to share many of the advantages of the
sequential SAT approach proposed in [78]. The main difference is that ours relies on reachability
analysis and pre-image computation that results in large state sets. We develop a novel data
structure to quickly determine possible state containment relationships, or which states are
contained within the found state sets, to reduce the trace lengths.
More specifically, the contributions in this chapter are the following:
• A trace compaction technique based purely on pre-image computation and reachability
analysis using an all-solution SAT solver.
• A set of containment rules that help draw relationships between existing states and states
found through pre-image computation which may result in shorter traces.
• A state selection procedure within the reachability analysis engine and a set of heuristics
that improve the performance of the overall approach in practice.
• A novel data structure for storing visited states that allows for quick identification of state
containment relationships.
This chapter is organized as follows. In the next section, some background information
is provided on finite state machines, pre-image computation, and reachability analysis. Sec-
tion 6.3 presents the proposed trace compaction approach and discusses its central procedures.
Section 6.4 introduces a novel data structure critical for the efficient performance of the pro-
posed approach. Sections 6.5 and 6.6 demonstrate the experimental results and conclude the
chapter, respectively.
6.2 Preliminaries
In this section we provide some background on finite state machines, traces, image and pre-
image computation, and reachability analysis.
6.2.1 Finite State Machines
A sequential digital circuit can be modeled by a Finite State Machine (FSM) represented by a 6-tuple M := (Q, Σ, ∆, δ, λ, q0), where Q is the finite set of states, Σ and ∆ are the input and output alphabets respectively, δ : Q × Σ → Q is the state transition function, λ : Q × Σ → ∆ is the output function, and q0 is the initial state [53]. Figure 6.1 illustrates a simple FSM where
the states are represented by nodes and the transitions are represented by edges.
Figure 6.1: Finite State Machine with 7 states
A trace of length k for an FSM is an input sequence < a1, a2, ..., ak > that leads the FSM through a sequence of states < q0, q1, ..., qk−1, qk >. Note that some states may be repeated in the state sequence. Figure 6.2 represents one possible trace for the FSM of Figure 6.1.
q0 −a1→ q1 −a2→ q2 −a3→ q3 −a4→ q1 −a5→ q6
Figure 6.2: A sample trace for the above FSM
6.2.2 Image and Pre-image Computation
Given a sequential circuit with current state variables V and next state variables V ′, a set
of current states and a set of next states are labeled by Q(V ) and Q(V ′) respectively. The
transition relation from a set of states Q(V ) to Q(V ′), denoted by T (Q(V ), Q(V ′)), is true for
each pair of Q(V ) and Q(V ′) if δ(Q(V )) = Q(V ′) for a set of input assignments [53]. Given the
above, the image and pre-image of a circuit can be defined as follows.
image: Q(V ′) = ∃V.(T (Q(V ), Q(V ′)) ∧ Q(V ))
pre-image: Q(V ) = ∃V ′.(T (Q(V ), Q(V ′)) ∧ Q(V ′))
Intuitively, the image of a state qi is all the states that can be reached from qi under all possible input combinations in a single clock cycle. Similarly, the pre-image of qi comprises all the states that can lead to qi under all possible input combinations in one clock cycle. In
the FSM of Figure 6.1, the image of state q1 is {q2, q6} while its pre-image is {q0, q3}.
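For an explicitly enumerated FSM, image and pre-image computation reduce to simple set operations over the transition function. In the sketch below the transition pairs are hypothetical, except that they reproduce the stated facts about q1:

```python
# delta maps (state, input) -> next state; the edges are illustrative
delta = {("q0", "a1"): "q1", ("q3", "a4"): "q1",
         ("q1", "a2"): "q2", ("q1", "a3"): "q6",
         ("q2", "a5"): "q3"}

def image(states):
    # all states reachable from `states` in one clock cycle
    return {nxt for (q, _), nxt in delta.items() if q in states}

def pre_image(states):
    # all states that can reach `states` in one clock cycle
    return {q for (q, _), nxt in delta.items() if nxt in states}

print(sorted(image({"q1"})))      # -> ['q2', 'q6']
print(sorted(pre_image({"q1"})))  # -> ['q0', 'q3']
```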
Although the image and pre-image of circuits are traditionally computed using BDDs [53],
some techniques based on all-solution Boolean Satisfiability (SAT) solvers can also be used [51,
59, 71, 82]. All-solution SAT solvers can compute the pre-image set Q(V ) by constraining the
circuit CNF to Q(V ′) and iteratively finding all the solutions that satisfy the CNF in terms
of the current state variables V [82]. Recent work on SAT-based Unbounded Model Checking (UMC) and pre-image computation techniques has demonstrated considerable advancements [51, 59, 71, 82].
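The iterate-and-block loop behind all-solution pre-image computation can be sketched with a toy solver. Everything here is illustrative: a brute-force search stands in for a real all-solution SAT engine, and the one-register relation next = cur XOR inp is a made-up transition relation.

```python
from itertools import product

def solve(clauses, n):
    # toy SAT: return a satisfying assignment over variables 0..n-1, or None
    for bits in product([False, True], repeat=n):
        if all(any(bits[v] == want for v, want in cl) for cl in clauses):
            return bits
    return None

def pre_image_all_sat(clauses, n, state_vars):
    # enumerate solutions projected onto the current-state variables,
    # blocking each projection once it has been found
    cubes = set()
    while (sol := solve(clauses, n)) is not None:
        proj = tuple((v, sol[v]) for v in state_vars)
        cubes.add(proj)
        clauses = clauses + [[(v, not val) for v, val in proj]]  # blocking clause
    return cubes

# transition relation next = cur XOR inp, constrained to next = 1
CUR, INP, NXT = 0, 1, 2
T = [[(CUR, False), (INP, False), (NXT, False)],
     [(CUR, True),  (INP, True),  (NXT, False)],
     [(CUR, True),  (INP, False), (NXT, True)],
     [(CUR, False), (INP, True),  (NXT, True)]]
pre = pre_image_all_sat(T + [[(NXT, True)]], 3, [CUR])
print(len(pre))   # -> 2  (both cur = 0 and cur = 1 can reach next = 1)
```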
In this work, we are mainly concerned with SAT-based pre-image computation. Since this technique finds states one at a time, we use the term pre-image loosely to also refer to a single state qj that belongs to the pre-image of qi. Furthermore, we use the term state to refer to a state cube, which is a state encoding that may contain unassigned or don't-care variables. As such, a state may be a superset (cover) of other states. For instance, the state cube {v1, v2, v3} = 1X1 covers the states {v1, v2, v3} = 101 and {v1, v2, v3} = 111. For brevity, in the remainder of this chapter we drop the variable names (i.e., v1, v2, v3) when describing state values.
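The cover relation between state cubes can be captured directly (a sketch; `covers` is an illustrative helper):

```python
def covers(cube, other):
    # position by position, an X in `cube` matches anything;
    # a fixed value only matches the same fixed value
    return all(c == "X" or c == o for c, o in zip(cube, other))

print(covers("1X1", "101"), covers("1X1", "111"))  # -> True True
print(covers("1X1", "100"))                        # -> False
print(covers("101", "1X1"))                        # -> False (a subset cannot cover its superset)
```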
6.2.3 Reachability Analysis
Reachability analysis is the process of determining whether a state qk is reachable from another
state q0. In the realm of UMC, reachability analysis can be used to check CTL properties of
type EFqk where qk is a bad state and q0 is a legal or initial state [53].
Intuitively, reachability analysis traverses the state space backwards from state qk until a
state q0 is found or a fix-point, where no new states are found, is reached [53]. Pre-image
computation is a central procedure of reachability analysis as it performs the single backward
steps. The manner in which the state space is traversed depends on which of the visited states
is selected for each pre-image computation step. If the visited states are stored in a stack-like
data structure, a depth-first traversal is performed, while a queue-like data structure results in
a breadth-first traversal. Figure 6.3 illustrates a breadth-first reachability analysis process that
Figure 6.3: Illustration of reachability analysis
eventually finds the initial state q0. In this figure, the black nodes represent states while each
cone represents a set of states found by one pre-image computation step.
6.3 Proposed Trace Compaction Approach
In this section we present our proposed trace length compaction approach. First we introduce
the central concept followed by details of the state selection procedure and the all-solution SAT
solver.
6.3.1 Reachability Based Trace Compaction
A trace can be represented by a directed graph G = (N, E) where the nodes N represent states
and the edges E represent transitions between states. An edge from state qi to qj denotes that
qi belongs to the pre-image of qj and qj belongs to the image of qi. Our objective is to reduce
the length of the path from the initial state q0 to the final state qk by applying pre-image
computation and reachability analysis techniques.
Our proposed approach performs reachability analysis on all the states belonging to the
original trace. The manner in which states are selected for reachability analysis is described in
Section 6.3.3. All the states (or state cubes) found by the pre-image computation steps of the
reachability engine are added to the graph G. Graph G is updated with edges denoting that
each newly found state qi is a pre-image of some state qj selected for pre-image computation.
Figure 6.4: Updating the graph G with new nodes and edges
When states found by pre-image computation already exist in the graph G, extra edges may
be drawn in G to illustrate new legal transitions. These transitions may provide a shorter path
(or short-cut) from the initial state to the final state thus reducing the overall trace length.
For example, consider the situation described in Figure 6.4 where the original trace is shown as
the sequence < q0, q1, q2, q3, q4 > and the dashed nodes are states found through reachability
analysis. Since q2 is found as a pre-image of q4, and q1 is the pre-image of q2 in the original
trace, a new edge shown as dashed line can be drawn directly from the original (non-dashed)
q2 to q4 and the dashed q2 can be removed. The overall result is a shorter path from q0 to q4
which skips node q3.
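The effect of such a short-cut on the trace length can be observed with a plain breadth-first search over G (a minimal sketch of the Figure 6.4 scenario; `bfs_length` is an illustrative helper):

```python
from collections import deque

def bfs_length(edges, src, dst):
    # fewest transitions from src to dst in the trace graph G
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        q, d = frontier.popleft()
        if q == dst:
            return d
        for nxt in edges.get(q, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

G = {"q0": ["q1"], "q1": ["q2"], "q2": ["q3"], "q3": ["q4"]}
print(bfs_length(G, "q0", "q4"))   # -> 4 (original trace)
G["q2"].append("q4")               # pre-image step found q2 in pre-image(q4)
print(bfs_length(G, "q0", "q4"))   # -> 3 (node q3 is skipped)
```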
As motivated by the above example, finding state equivalences in the graph G can lead to
more “short-cuts” which can reduce the overall trace size. Along with the state equivalence
relation discussed, there are other state containment relationships that can lead to further
short-cuts in the graph. The following rules determine how the graph G is updated after each pre-image computation step.
Consider state qi found as a pre-image of state qi+1, and the sequence < qj−1, qj , qj+1 >
existing in the graph G.
• Rule 1. If qi = qj : State qi is not added to G, but an edge is drawn from qj to qi+1.
• Rule 2. If qi ⊃ qj : State qi is added to G, an edge is drawn from qi to qi+1, and another
edge is drawn from qj to qi+1.
• Rule 3. If qi ⊂ qj : State qi is added to G, an edge is drawn from qi to qi+1, another edge
is drawn from qj−1 to qi, and another edge is drawn from qi to qj+1.
The correctness of rule 1 is evident as the images of equivalent states are also equivalent.
Rule 2 can be explained by expanding the state cube qi into two components qi = {qj} ∪ {qi − qj}. From here we use the fact that any image of qi is also an image of qj. Similarly, rule 3 can be explained by expanding qj into two components qj = {qi} ∪ {qj − qi}.
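A sketch of how the three rules might update an adjacency-list graph, representing cubes as strings over {0, 1, X}; the helper names and node labels are illustrative inventions, not the thesis implementation:

```python
def covers(cube, other):
    # cube covers other iff every fixed position agrees (X matches anything)
    return all(c == "X" or c == o for c, o in zip(cube, other))

def apply_rules(G, qi, qi_next, qj_prev, qj, qj_next):
    """Insert a new pre-image qi of qi_next into G, given an existing
    sequence <qj_prev, qj, qj_next> (Rules 1-3 from the text)."""
    if qi == qj:                    # Rule 1: reuse qj, no new node
        G.setdefault(qj, []).append(qi_next)
    elif covers(qi, qj):            # Rule 2: qi ⊃ qj
        G.setdefault(qi, []).append(qi_next)
        G.setdefault(qj, []).append(qi_next)
    elif covers(qj, qi):            # Rule 3: qi ⊂ qj
        G.setdefault(qi, []).extend([qi_next, qj_next])
        G.setdefault(qj_prev, []).append(qi)
    else:                           # unrelated cube: just record the new edge
        G.setdefault(qi, []).append(qi_next)
    return G

print(apply_rules({}, "1X1", "s4", "s0", "101", "s2"))  # rule 2 fires
print(apply_rules({}, "101", "s4", "s0", "1X1", "s2"))  # rule 3 fires
```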
Figure 6.5: Illustrating rules 2 and 3
The following example helps clarify rules 2 and 3. Consider state qi found as a pre-image of state qi+1, and the sequence < qj−1, qj , qj+1 >, where state qi = 1X1 and state qj = 101. By rule 2, an edge is first drawn from qi to qi+1 to indicate that qi is a pre-image of state qi+1. Since 1X1 ⊃ 101 and qi+1 is an image of qi = 1X1 = {101} ∪ {111}, then qi+1 must also be an image of qj = 101. This scenario is illustrated in Figure 6.5 (a) with the new edges drawn as dashed lines. Similarly, by rule 3 an edge is first drawn to indicate that qi is a pre-image of state qi+1. Since state qi = 101 is a subset of state qj = 1X1 = {101} ∪ {111}, the states qj−1 and qj+1 must also be a pre-image and an image of qi, respectively. The three edges added in this scenario are drawn as dashed lines in Figure 6.5 (b).
Our overall trace compaction technique using reachability analysis is shown in Figure 6.6. Lines 1-7 set up the problem, build the initial graph G, and determine the initial trace length. The remaining lines perform reachability analysis by selecting a state for pre-image computation (line 10), computing the pre-images (line 12), and applying the state containment rules (line 14). The reachability analysis is terminated after all states have been selected for pre-image computation or after a maximum number of steps, max, tracked by the counter, have been performed.
6.3.2 Creating More Short-cuts
As discussed in the previous section, the containment rules are critical for creating short-
cuts in the graph G. To increase the likelihood of applying these rules, the reachability engine is
slightly modified from its typical UMC application. Traditionally in UMC, reachability engines
focus on finding only new states and “block” previously visited states [71]. This allows them
to quickly identify when a fixed-point is reached, or when all legal states are visited [51]. In
contrast, this work encourages finding previous states or states that cover or are covered by
others. These containment relationships allow us to draw additional edges between nodes, increasing the likelihood of reducing the trace. It should be noted that precautions are taken
to avoid repeatedly visiting the same set of states.
A second technique used to increase the likelihood of applying the containment rules is
to populate the graph with more states than those provided in the original trace. Since the
original trace only has as many states as its trace length, there may not be enough unique states
to create many short-cuts. We propose populating the graph initially by computing a single
pre-image for the states in the original trace. This approach allows us to quickly add state
cubes to the graph which leads to more applications of the containment rules. The practical
advantage of this technique is highlighted in the experiments of Section 6.5.
6.3.3 State Selection Procedure
During reachability analysis, which state is selected for pre-image computation determines the
manner in which the state space is traversed. For instance, if the most recently visited (found)
state is always selected, then the state space is traversed in a depth-first manner. Here, we
develop state selection criteria that help guide the reachability engine towards finding short-
cuts from the initial state to the final state. It should be noted that these criteria are heuristics
which may not always be advantageous.
The first criterion is to select a candidate state from the set of visited states with the
smallest hamming distance to the initial state q0. The hamming distance between two states
is the number of state variables with different values (0 or 1). For states with don’t cares (X),
1: G = ∅
2: Visited = ∅
3: counter = 0
4: for all (states qi from q0 to qk (inclusive)) do
5:   Visited.add(qi)
6:   G = add_to_graph(qi)
7: end for
8: length = BFS(G, qk, q0)
9: while (counter ≤ max && !Visited.empty()) do
10:   qj = select_state(Visited)
11:   Visited = Visited − qj
12:   PreImages = pre-image(qj)
13:   for all (states qi ∈ PreImages) do
14:     apply_rules_1_2_3(G, qi, qj)
15:   end for
16:   Visited = Visited ∪ PreImages
17:   counter = counter + 1
18:   length = BFS(G, qk, q0)
19:   Print("Trace is of size", length)
20: end while
21: return length

Figure 6.6: Trace compaction procedure using reachability analysis
every X matches both the 0 and 1 value. For instance, if states {1100, 1011, 110X, XX01} are
visited and q0 = 0000, then state XX01 is selected since it has a hamming distance of 1 with
respect to q0. The intuition behind the above criterion is that states with a smaller hamming
distance to q0 require fewer state variables to change to reach q0 as a pre-image. Therefore, the
likelihood of finding q0 at the next step may be higher.
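The hamming-distance computation over three-valued states can be sketched as follows (a minimal illustration; the function name and the string encoding of states are assumptions, not the thesis implementation):

```python
def hamming_to_q0(state, q0):
    # A don't-care (X) matches both 0 and 1, so it never adds to the distance.
    return sum(1 for s, r in zip(state, q0) if s != 'X' and s != r)

# Example from the text: with q0 = 0000, state XX01 has distance 1 and is selected.
visited = ["1100", "1011", "110X", "XX01"]
q0 = "0000"
selected = min(visited, key=lambda s: hamming_to_q0(s, q0))  # "XX01"
```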
A second factor that influences the state selection procedure is the path length from a
candidate state to the last state qk. If this length is greater than 50% of the current shortest
path from q0 to qk, then the state is not considered for selection. This criterion encourages
finding many pre-images near the end of the trace (closer to qk) and fewer near the initial
state. Together, both criteria increase the probability of creating large short-cuts between states
at the two ends of the original trace.
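A sketch of the second criterion, assuming each visited state carries its current path length to qk in the graph G (all names here are hypothetical):

```python
def filter_candidates(visited, path_len_to_qk, shortest_q0_to_qk):
    # Discard states whose path to qk exceeds 50% of the current shortest
    # q0-to-qk path, concentrating pre-image work near the qk end of the trace.
    return [q for q in visited
            if path_len_to_qk[q] <= 0.5 * shortest_q0_to_qk]

# Hypothetical example: with a shortest q0-to-qk path of 10, only the
# state 3 steps from qk survives the filter.
kept = filter_candidates(["a", "b"], {"a": 3, "b": 8}, 10)  # ["a"]
```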
6.3.4 All-Solution SAT Solver
The reachability engine is highly dependent on the performance of the pre-image computation
engine, which is based on an all-solution SAT solver. This SAT solver uses circuit don’t cares
to determine whether variables may remain unassigned while satisfying the problem [82, 102].
Since the don’t cares are propagated backwards through a gate (from output to input) they
are ideal for pre-image computation where current state variables V can be viewed as pseudo
inputs to the circuit. The all-solution SAT solver contains many solution reduction techniques to
ensure that small solutions are returned in an efficient manner [51, 71, 82]. For our application,
achieving small state cubes is critical to traversing the state space efficiently.
Each pre-image computation step corresponds to a call to the all-solution SAT solver. Since
it may not be practical to find all of the pre-image states due to the exponential nature of the
problem, the all-solution SAT solver is also equipped with a limit t. If all the pre-image state
cubes are not found in a time and memory efficient manner, the all-solution SAT solver will
return the first t state cubes it finds. This allows us to perform reachability analysis by finding
partial pre-images.
6.4 Storing Visited States
The success of the reachability analysis approach described in Section 6.3 depends on the ability
to quickly apply the rules of Section 6.3.1. More specifically, the situations where a newly found
state qi 1) is equal to existing states, 2) is a superset of existing states, or 3) is a subset of
existing states must be rapidly identified. In this section we introduce a data structure that
stores all the states belonging to G while identifying the state containment relationships quickly.
Note that this data structure is not only viable for trace compaction, but can also be used for
reachability analysis within a UMC framework [51, 59, 71].
6.4.1 Determining State Containment Relationships
The data structure described here is composed of two components: 1) a binary tree T and 2)
a hash table. The binary tree is used to detect the state containment relationships, while the
hash table is used to locate the exact state.
The state containment relationship depends on the number of don’t cares in each state. A
state with more don’t cares may cover one with fewer, while the converse is not true irrespective
Figure 6.7: Illustrating the state storage data structure for the states 1101X, 001X1, XX001, X00X1, and X11XX (a binary tree over ordered cubes with a hash table at each node)
of the actual position of the don’t cares. To take advantage of the above, we allocate an ordered
cube for each state. The ordered cube is defined as the state value with all the zeros in the most
significant positions, followed by all ones, followed by the don’t cares (X) in the least significant
positions. For example, five states and their corresponding ordered cubes are shown below.
state:        1101X  001X1  XX001  X00X1  X11XX
ordered cube: 0111X  0011X  001XX  001XX  11XXX
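Since the ordered cube depends only on the counts of zeros, ones, and don't cares, the mapping can be sketched in a few lines (the string encoding and function name are assumptions):

```python
def ordered_cube(state):
    # All zeros in the most significant positions, then ones, then don't cares.
    return ('0' * state.count('0')
            + '1' * state.count('1')
            + 'X' * state.count('X'))

# Reproducing the table above: XX001 and X00X1 map to the same cube 001XX.
cubes = [ordered_cube(q) for q in ["1101X", "001X1", "XX001", "X00X1", "X11XX"]]
# cubes == ["0111X", "0011X", "001XX", "001XX", "11XXX"]
```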
When states are added to the graph G, they are also stored according to their ordered cube
in the binary tree T . Each node of a given depth in the binary tree corresponds to a position in
the ordered cube. The top-most node at depth zero of the tree represents the most significant
position, the nodes at depth 1 represent the second most significant position, the nodes at depth
2 represent the third most significant position, etc. The left (right) edge of a node denotes a
zero (one) in the ordered cube at the position corresponding to the parent node. There are
no edges corresponding to a don’t care in the ordered cube. By scanning over the values of
an ordered cube from the most significant to the least significant, the binary tree is traversed
for that cube. Traversal ends when the ordered cube is fully scanned or when a don’t care is
encountered. By the end of the traversal, the final visited node points to a hash table where
the state value is stored.
The hash table contains all states that map to the same ordered cube. For instance, at the
node corresponding to the ordered cube 001XX in Figure 6.7, there can be two unique state
cubes XX001 and X00X1. Figure 6.7 illustrates how the states 1101X, 001X1, XX001, X00X1,
X11XX are stored in the described data structure.
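Because the tree traversal of an ordered cube is determined entirely by its number of zeros and ones, the structure can be sketched as a map from that pair to a hash set of states (a simplification of the tree-plus-hash-table design; class and method names are assumptions):

```python
from collections import defaultdict

class StateStore:
    def __init__(self):
        # One hash set of states per tree node; a node is identified by the
        # (zero-count, one-count) of the ordered cube that reaches it.
        self._nodes = defaultdict(set)

    @staticmethod
    def _key(state):
        return (state.count('0'), state.count('1'))

    def add(self, state):
        self._nodes[self._key(state)].add(state)

    def exists(self, state):
        # Equality check (rule 1): locate the node, then probe its hash set.
        return state in self._nodes[self._key(state)]

store = StateStore()
for q in ["1101X", "001X1", "XX001", "X00X1", "X11XX"]:
    store.add(q)
# XX001 and X00X1 land in the same node's hash table, as in Figure 6.7.
```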
Given a state qi, this data structure can efficiently determine whether qi already exists in
G, whether qi is a subset of other states in G, and whether qi is a superset of other states in
G. For all three tasks, first the node ni corresponding to ordered cube of qi must be located in
the binary tree. If qi exists in the hash table pointed by node ni, then qi already exists in G.
To find whether qi is a proper subset of other states, all the nodes with at least as many
don’t cares (X) as ni have to be visited. At each node, the states within the hash tables must
be tested to determine if qi is a subset. Within the tree T , the nodes with at least as many
don’t cares as ni are found inside an r+1 by s+1 rectangle, where r is the number of zeros and
s is the number of ones in qi. Therefore, there are (r + 1) × (s + 1) nodes that can potentially
contain supersets of qi (including node ni). These nodes are illustrated in the dashed rectangle
above node ni in Figure 6.8.
Similarly, to find whether qi is a proper superset of other states, all the nodes with at least
as many zeros and ones must be visited and the states within the hash tables must be tested
to determine if qi is a superset. Within the tree T, these nodes are found inside an isosceles
triangle with sides of length n − r − s. Therefore, there are (n − r − s)(n − r − s + 1)/2 nodes that can
potentially be subsets of qi (including node ni). These nodes are illustrated in the dashed
triangle under the node ni in Figure 6.8.
Figure 6.8: Finding supersets and subsets in the tree T (the rectangle of width r and height s above node ni holds candidate supersets; the triangle of side n − r − s below it holds candidate subsets)
1: Covers = ∅
2: ordered_cube = Order(qi)
3: ni = Get_tree_node(ordered_cube)
4: Supset = get_rectangle(ni)
5: for all (nodes nj in Supset) do
6:   for all (states qj in hash table of nj) do
7:     if (qj ⊇ qi) then
8:       Covers = Covers ∪ qj
9:     end if
10:   end for
11: end for
12: return Covers

Figure 6.9: Determining the states that are supersets of a given state
As demonstrated through Figure 6.8, only the white nodes must be considered when search-
ing for subsets and supersets. Therefore, the number of comparisons required may be only a
fraction of the total number of existing states. In practice, this data structure is found to be
very efficient since the tree T is often not fully populated and the number of items in each hash
table is relatively small.
The procedure for finding the supersets (covers) of a given state qi is presented in Figure 6.9.
Lines 2-3 generate the ordered cube and find its location in the tree T . Line 4 gets all the
potential superset nodes by finding the nodes contained in the rectangle. The remaining lines
iterate through these nodes and test the states inside the hash tables to determine whether
they are supersets of qi. Note that testing whether a particular node is a superset or a subset of
another is a simple comparison procedure where the states must be identical over all positions
except where the superset is a don’t care. A procedure similar to that of Figure 6.9 is used to
find the subsets of qi, where the get_rectangle procedure is replaced with get_triangle as described
previously.
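The superset search of Figure 6.9 can be sketched as follows, with the tree represented as a map from (zero-count, one-count) node keys to hash sets of states (a simplification; all names are assumptions):

```python
def is_superset(qj, qi):
    # qj covers qi iff, at every position, qj equals qi or qj is a don't care.
    return all(b == a or b == 'X' for a, b in zip(qi, qj))

def covers_of(nodes, qi):
    # Scan the (r+1)-by-(s+1) rectangle of candidate nodes above qi's node,
    # testing every state stored there (lines 4-11 of Figure 6.9).
    r, s = qi.count('0'), qi.count('1')
    covers = set()
    for r2 in range(r + 1):
        for s2 in range(s + 1):
            for qj in nodes.get((r2, s2), set()):
                if is_superset(qj, qi):
                    covers.add(qj)
    return covers

# Hypothetical store: 0X1X1 covers 00101, while 11XXX does not.
nodes = {}
for q in ["0X1X1", "00101", "11XXX"]:
    nodes.setdefault((q.count('0'), q.count('1')), set()).add(q)
found = covers_of(nodes, "00101")  # {"0X1X1", "00101"}
```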
6.5 Experiments
In this section we demonstrate the effectiveness and performance benefits of the proposed trace
compaction approach. All experiments are conducted on a Sun Blade 1000 with a 750MHz Sparc
processor and 2.5GB of memory. Traces of length 50, 100, and 1000 are obtained via random
simulation for the circuits in the ISCAS’89 and ITC’99 benchmark suites. The reachability
analysis engine is developed using the all-solutions SAT solver of [82] which is a circuit variant
of zChaff [69] and Grasp [73]. To evaluate the overall proposed approach we limit the number
of stored states to at most 10,000 state cubes and do not use an explicit timeout. Since the
compaction techniques of previous works [20, 78, 90] are not publicly available, and since the
assertions and errors they used are unknown, we cannot compare with them either directly or
indirectly.
Figure 6.10: Comparison of state selection methods (total run-time in seconds over all benchmarks for BFS, DFS, random, and the proposed selection)
We first evaluate the effectiveness of the state selection procedure described in Section 6.3.3.
We compare this heuristic against three other selection approaches, Depth-First Search (DFS),
Breadth-First Search (BFS), and random selection. The above techniques are used to perform
reachability analysis from a random state to the initial state given a timeout of 200 seconds.
The run-times over all the benchmarks are collected and presented in Figure 6.10. Both the
DFS and BFS methods result in run-times of over 4000 seconds, while the random method
fares better at over 3500 seconds. The proposed state selection strategy based on the smallest
hamming distance relative to the initial state and the position of the state in the graph G results
in run-times of just over 3000 seconds. This performance demonstrates that the proposed state
selection heuristic yields an efficient overall reachability analysis procedure.
Next, we demonstrate the effectiveness of the overall proposed trace compaction approach.
Table 6.1 illustrates the results of the experiments on all ISCAS’89 and ITC’99 circuits for
traces of length 50, 100 and 1000. The first column shows the circuit names while the remaining
columns are organized into three sections based on their original trace length. The first column
of each section labeled org describes the original length of each trace (50, 100, or 1000). The
second column of each section labeled pre describes the length of the traces after performing the
single step pre-image process described in Section 6.3.2. We chose to find single step pre-images
circuits org pre reach cpu pre cpu reach org pre reach cpu pre cpu reach org pre reach cpu pre cpu reach
s208.1 50 25 25 0.00 0.56 100 51 51 0.07 0.60 1000 244 244 0.08 9.26
s298 50 1 1 0.00 0.00 100 3 1 0.59 0.86 1000 1 1 0.34 0.01
s344 50 33 1 0.00 0.00 100 55 1 0.31 0.00 1000 10 5 0.42 0.08
s349 50 33 1 0.00 0.00 100 55 1 0.32 0.00 1000 10 5 0.39 0.08
s382 50 3 1 0.00 0.17 100 4 2 0.75 0.00 1000 1 1 0.89 0.00
s386 50 1 1 0.00 0.00 100 2 2 0.09 0.00 1000 2 2 0.06 0.00
s400 50 3 1 0.00 0.01 100 2 1 0.69 0.01 1000 2 1 0.74 0.05
s420.1 50 21 21 0.01 1.20 100 44 44 0.13 0.97 1000 505 505 0.14 25.85
s444 50 2 1 0.01 0.01 100 3 1 0.98 0.93 1000 1 1 0.67 0.01
s510 50 24 24 0.00 0.87 100 10 10 0.13 0.66 1000 25 25 0.12 0.56
s526 50 2 1 0.00 0.03 100 3 1 1.27 0.86 1000 1 1 1.09 0.03
s526n 50 2 1 0.00 0.03 100 3 1 1.26 0.86 1000 1 1 1.17 0.02
s641 50 3 3 0.00 1.65 100 4 4 1.81 2.10 1000 2 2 1.72 5.86
s713 50 3 3 0.00 1.65 100 4 4 1.80 2.01 1000 2 2 1.76 2.88
s820 50 1 1 0.00 0.00 100 1 1 0.00 0.00 1000 1 1 0.38 0.00
s832 50 1 1 0.00 0.00 100 1 1 0.00 0.00 1000 1 1 0.4 0.00
s838.1 50 26 26 0.00 1.87 100 45 45 0.26 2.07 1000 510 510 0.27 48.48
s953 50 6 5 0.00 1.38 100 1 1 2.52 0.00 1000 1 1 3.25 0.01
s1196 50 8 1 0.00 0.05 100 14 1 0.89 0.12 1000 5 1 1.11 0.03
s1238 50 8 1 0.01 0.05 100 14 1 0.84 0.11 1000 5 1 0.96 0.02
s1423 50 50 2 0.01 3.41 100 57 2 6.19 3.55 1000 15 3 6.24 67.61
s5378 50 50 50 0.04 0.89 100 100 100 23.76 1.03 1000 1000 1000 26.18 5.86
s9234.1 50 50 50 0.04 22.67 100 100 100 50.26 1.76 1000 1000 1000 49.89 11.55
s9234 50 34 34 0.02 1.67 100 36 36 46.99 1.66 1000 35 35 47.41 10.76
s13207.1 50 50 50 0.28 3.52 100 100 100 96.76 4.20 1000 1000 1000 105.92 7.61
s13207 50 50 50 0.23 3.29 100 100 100 91.57 4.17 1000 1000 1000 98.79 7.74
s15850.1 50 50 50 0.12 5.82 100 100 100 145.67 87.18 1000 1000 1000 140.31 9.01
s15850 50 50 50 0.07 3.45 100 100 100 96.18 4.19 1000 1000 1000 222.94 8.09
s38417 50 50 50 1.07 40.58 100 100 100 311.05 154.30 1000 1000 1000 340.83 25.74
s38584.1 50 50 50 1.27 11.83 100 100 100 336.97 12.37 1000 1000 1000 375.70 25.68
s38584 50 50 50 1.26 59.15 100 100 100 315.11 185.30 1000 1000 1000 344.44 23.85
b01 50 6 2 0.09 0.00 100 4 4 0.10 0.04 1000 4 4 0.9 0.04
b02 50 2 2 0.04 0.00 100 4 4 0.04 0.01 1000 4 4 0.4 0.01
b03 50 14 2 1.17 0.07 100 26 2 1.20 0.05 1000 8 8 1.32 13.54
b04 50 50 50 6.57 3.79 100 100 100 6.26 4.54 1000 1000 1000 6.89 27.88
b06 50 3 1 0.65 0.00 100 3 3 0.63 0.04 1000 2 2 0.62 0.03
b07 50 43 43 0.35 1.81 100 51 51 0.34 1.70 1000 56 56 0.28 14.56
b08 50 43 7 0.11 1.17 100 92 2 0.16 0.00 1000 329 5 0.16 0.02
b09 50 50 17 0.16 0.96 100 97 97 0.17 1.56 1000 82 82 0.18 18.70
b10 50 22 22 0.36 1.19 100 45 21 0.31 1.56 1000 32 32 0.59 9.56
b11 50 35 25 1.44 3.06 100 98 88 2.50 4.07 1000 550 550 1.92 27.68
b12 50 14 14 8.03 5.97 100 20 20 7.51 2.50 1000 36 36 7.89 34.85
b13 50 45 45 2.10 2.22 100 99 98 2.05 2.80 1000 1000 1000 2.57 19.54
b14 50 50 50 42.48 1.84 100 100 100 47.68 24.18 1000 1000 1000 52.92 3.93
b15 50 49 49 76.63 6.19 100 100 100 56.65 43.78 1000 87 87 52.11 230.41
Table 6.1: Results of proposed trace length compaction for traces of length 50, 100, 1000.
for no more than 50 states to achieve a balance between the number of pre-images found and
the time required to find them. The third column of each section labeled reach, presents the
length of the traces after applying the proposed reachability analysis method. As described
in section 6.3.2, it is most beneficial to first find the single step pre-images followed by the
                original size 50              original size 100             original size 1000
approach   avg. reduced affected reduced   avg. reduced affected reduced   avg. reduced affected reduced
pre            10.08X     70%    13.77X       16.88X     72%    22.66X      266.35X     71%    362.84X
reach           3.81X     37%     8.54X        6.10X     35%    15.36X        2.77X     15%     12.40X
combined       19.67X     74%    25.72X       36.21X     72%    49.01X      327.76X     72%    446.59X

Table 6.2: Summary of the results for the proposed trace length compaction approach
reachability analysis (reach) method. The fourth and fifth columns of each section, labeled cpu
pre and cpu reach respectively, present the run-times in seconds associated to the pre and reach
techniques.
Table 6.1 shows that the pre-image computation techniques help reduce the traces consid-
erably. For many circuits, the original trace length is first reduced greatly by the single step
pre-image (pre) technique and further reduced by the reachability analysis (reach). For exam-
ple, the trace for circuit s344 is first reduced from 50 to 33 using pre, and then again from 33
to 1 using reach.
Analyzing the results of Table 6.1, we notice that many traces are reduced to having a
single clock cycle (length of 1) or a very small trace size after applying reachability analysis.
This result can be partially attributed to the state selection heuristics of Section 6.3.3 and
the performance improvement techniques of Section 6.3.2. These techniques can increase the
number of “short-cuts” created through the graph G and likelihood that they will lead to the
initial state.
Table 6.2 summarizes the results in Table 6.1 by providing the average length compactions
(reductions) achieved by the different components of the proposed approach for traces of size 50,
100, and 1000. Similar to Table 6.1, the summaries are provided for each original trace length
separately. Column one presents the name of the compaction method: single step pre-image
computation (pre), reachability analysis (reach), or combined. For each trace length, the overall
average reduction is presented under the label avg. reduced. This field is calculated by summing
the reduction factors over all circuits and dividing by the number of circuits. Since not all circuit
traces are reduced by the proposed method, this number may not provide a good representation
of the average factor of reduction achieved. Instead, the columns labeled affected and reduced
show the percentage of traces that are affected by each approach and the amount by which
they are reduced, respectively. For example, for traces of length 50, the proposed approaches
separately achieve 10.08 times and 3.81 times reductions while the combined approach reaches
19.67 times reductions. Furthermore, approximately 70% of the circuits are affected by the pre
techniques which results in an average reduction of 13.77 times. Similarly, the reach technique
and the combined approach affect 37% and 74% of traces for a reduction of 8.54 times and
25.72 times, respectively.
The experimental results demonstrate that not only is the proposed approach effective for
reducing traces, but it is also very efficient. For the majority of circuits in Table 6.1, compacted
traces are found within a few minutes. This performance reaffirms the practicality of the data
structure introduced in Section 6.4. The memory requirements of the overall approach are also
manageable since memory usage never exceeds 300MB when storing up to 10,000 state cubes.
The ability to quickly reduce traces in a memory efficient manner is crucial for making this
approach viable in real-life debugging environments.
6.6 Summary
This work proposed a novel trace reduction technique using SAT-based reachability analysis and
a set of state containment relationships. The components of the reachability analysis engine are
fine-tuned to increase the likelihood of generating short-cuts in the original trace. Furthermore,
a novel data structure is presented which stores visited states such that the state containment
relationships can be quickly applied. Experiments demonstrate the effectiveness of the proposed
techniques as approximately 75% of the traces are reduced by one or two orders of magnitude.
Chapter 7
Conclusion and Future Work
7.1 Summary of Contributions
As VLSI designs continue to increase in size and complexity, the verification and debugging
bottlenecks become more prominent. Since debugging is almost exclusively performed manually
in the industry today, the debugging burden will continue to increase as the complexity of
designs increase. To alleviate this overwhelming manual effort, automated debugging solutions
are required.
Research in automated debugging has shown great promise since the work in SAT-based
methodologies [92] was introduced almost six years ago. These powerful techniques, based on
formal technology, outperform traditional BDD and simulation-based diagnosis approaches by
orders of magnitude. Armed with such impressive achievements, researchers today are in a
quest for automated debugging solutions to industrial problems.
To achieve adoption by the VLSI industry, current debugging techniques must first overcome
the complexities introduced by large designs and their long error traces. These factors influence
the run-time performance and memory requirement of debuggers, which can be excessive for
real-life problems. For example, a relatively small design block of 100 thousand gates with a
corresponding error trace of 1000 clock cycles takes over 32GB of memory and may take weeks
to solve [81]. Practically, requiring over 32GB of memory may not be possible and requiring
more than a few hours dramatically reduces the value provided by the debugger.
Fortunately, the field of automated debugging using formal techniques is still in its infancy
and many improvements are possible. This dissertation presents contributions that aim to
bridge the gap between current industrial demands and capabilities of debugging technologies.
• In Chapter 3, a debugging methodology using abstraction and refinement is introduced.
This methodology allows existing debuggers to cope with the complexity of large designs
by systematically partitioning the problem into successively larger and harder problems.
Abstraction is first applied by removing state elements or complex components from the
design under consideration. Next a debugger locates error sources within the abstracted
design. Under certain conditions, error sources can be missed because certain components
are removed due to abstraction. In this case, refinement re-introduces the required ab-
stracted elements back into the circuit. The iterative nature of abstraction and refinement
allows for large problems to be solved incrementally using less memory and faster run-
time. Experiments demonstrate that abstraction and refinement can improve debugging
performance by as much as two orders of magnitude while reducing memory requirements
to 10%, compared to a state-of-the-art debugger. The abstraction and refinement based
debugging methodology is published in [86] and [81].
• In Chapter 4, Bounded Model Debugging (BMD) is introduced to reduce the impact of
long error traces on the debugging problem. BMD is a methodology based on the insight
that errors are typically excited in close temporal proximity to the failure observation
point. Theory is developed and confirmed through statistical and empirical means. The
BMD methodology considers a subset of the error trace in order to debug the problem.
The debugging problem is enhanced with mechanisms to determine whether the solutions
are complete or whether some error sources may be missing. Such situations result in
increasing the length of the error trace subset under consideration and performing a
subsequent debugging process. The BMD methodology proposed is complete as it will
locate all error locations. Empirical evidence demonstrates the power of BMD as without
it only 34% of errors are found, while with BMD 93% of errors are found. Furthermore,
run-time improvements of up to two orders of magnitude are achieved with BMD.
• In Chapter 5, the first debugging formulation based on maximum satisfiability (max-
sat) is introduced. Here debugging is formulated as an unsatisfiable problem where the
erroneous circuit is incorrectly expected to implement the correct behaviour. A max-
sat solver reasons about the cause of the unsatisfiability and identifies clauses whose
removal make the problem become satisfiable. These clauses in turn correspond to error
sources in the erroneous circuit. Apart from providing an alternative formal debugging
formulation, max-sat solvers are effective at solving over-approximations of the debugging
problem. More specifically, they can quickly identify sets of unsatisfiable clauses. Using
this strength, a two step debugging framework is developed where a max-sat solver finds
coarse-grained solutions which are refined using a fine-grained state-of-the-art SAT-based
debugger. Experiments demonstrate that the two step approach provides an average
performance improvement of approximately 200 times. This work is published in [85].
• In Chapter 6, techniques are developed to reduce the length of the error traces independent
of the debugging approach used. Trace reduction techniques address one of the major
scaling challenges faced by debuggers today, the large size of error traces. Since the
number of clock cycles contained in a trace determines the size of the debugging problem,
reducing the error trace length can require much less memory by the debugger. The trace
reduction techniques presented in Chapter 6 use a novel reachability analysis that can
find pre-image states in a single iteration. In turn the pre-image states are used to deduce
relationships between visited states to establish a shorter trace. Along with efficient data
structures, the proposed algorithm is very effective as 75% of problems are reduced by
one to two orders of magnitude. This work is published in part in [84] and in [83].
7.2 Future Work
Much of the work presented in this dissertation addresses major challenges in automated debug-
ging through novel techniques. In general this thesis provides the basis, theory and empirical
evidence confirming the effectiveness of its contributions. More specifically, the work on
abstraction and refinement, bounded model debugging, and max-sat debugging is in its infancy,
and considerable improvements are promised in the near future. Section 7.2.1 presents future
work related to the contributions of this thesis, while Section 7.2.2 discusses future research
directions in automated debugging.
7.2.1 Extension of Contributions
Experiments demonstrate that abstraction and refinement may be one of the most effective
divide-and-conquer approaches for automated debugging. Since error excitation and observa-
tion points can be present across module boundaries or across countless circuit elements, basic
problem partitioning traditionally used in equivalence checking [30], model checking [15], syn-
thesis [89], and other CAD techniques cannot be applied. Some important remaining questions
pertaining to abstraction and refinement are how much abstraction to perform and which com-
ponents to abstract. We addressed one part of this question for function-based abstraction in
Chapter 3, however, we did not investigate this topic for state-based abstraction. For example
which state elements to abstract and what level of abstraction is optimal, remain open ques-
tions. Another direction of future research is the type of abstractions to perform apart from
state-based and function-based. For example, in model checking there exist many abstraction
techniques such as predicate abstraction, existential abstraction, universal abstraction, and
more [24, 40, 49] each with their strengths and weaknesses. Similarly, effective abstraction
techniques can be developed for debugging based on the circuit structure, its datapath and its
state machine. Such approaches should be more powerful than the basic techniques presented
in this thesis.
Bounded model debugging showed great promise in Chapter 4 as a systematic way of coping
with very long error traces. Experiments and statistical analysis demonstrate that for the
majority of problems, a short error trace suffices to find the error source. However, there
are particular situations where a short trace suffix is not adequate. For example, consider an
erroneous value written into memory but not read from memory until thousands of clock cycles
later. BMD cannot find the error source under this scenario until a trace including both the
writing and reading events is considered. Although, trace reduction may help with the above
scenario, improvements can be made to BMD to diagnose a trace window instead of a suffix.
One approach is to partition the trace in windows that can each be solved independently.
However, each window must also contain constraints corresponding to the expected/golden
values of the observed error signals. One challenge with the above proposal is that the size of
the constraints containing the correct signal values is exponential in nature. As clock cycles
prior to the observed failure are analyzed the constraint size can grow to be much larger than
the original problem. Efficient pre-image analysis techniques and approximations may be able
to reduce the severity of the exponential increase in size.
The max-sat debugging formulation proposed in this thesis is a natural alternative to SAT-
based debugging. Experiments demonstrate that max-sat fares well on the over-approximate
debugging problem, while SAT is more efficient for solving the exact problem. One reason for the
difference in performance may be due to the relative immaturity of max-sat solvers. While SAT
solving algorithms have been actively improved over the last decade [27, 28, 70, 73, 74], relatively
little attention has been paid to improving max-sat algorithms. One reason is the lack of
industrial interest in max-sat, which results in a scarcity of industrial max-sat applications.
The debugging problems presented in this thesis are the first industrial applications of max-
sat from the CAD for VLSI community and were included in the third max-sat competition
in 2008 [4]. With new interest and problems developed using max-sat, improvements will be
made to the max-sat solvers. Following the trend set by SAT over the past decade, max-sat
algorithms may experience order of magnitude improvements within the next few years. The
future directions of research related to debugging exist both in formulating different types of
max-sat problems as well as developing efficient max-sat algorithms.
7.2.2 Future Directions
Formal techniques have given new life to automated debugging in the past decade [92]. Whereas
industrial applications were few and limited ten years ago, there is now promise for broad
applications across the VLSI design world. To achieve such an ambitious goal, further research
is needed in the following areas:
• Encompassing debugging techniques. Incorporating existing individually effective debugging
techniques into an efficient and robust debugging tool.
• High level reasoning engines. Migrating from Boolean level reasoning engines such as SAT
and max-sat solvers to high level reasoning engines such as SMT solvers.
• Verification environment debugging. Broadening the scope of debugging to include local-
izing errors in the stimulus generators, checkers and assertions.
Firstly, it is clear that the techniques presented in Chapters 3 and 4, as well as others, must be
combined to create an effective and robust debugging tool. One challenge at the integration
level is how to combine techniques such as abstraction and refinement, bounded model debug-
ging, hierarchical debugging [3], and memory debugging [52]. An all encompassing debugging
framework is needed to intelligently determine which techniques to apply and in what manner
to apply them to different problems. Since many of the individual techniques are iterative
by nature, heuristics are needed to identify appropriate situations to dispatch each. This re-
search mirrors advances made in the synthesis domain where effective individual optimization
techniques are combined to generate powerful and robust synthesis tools such as SIS [89], MV-
SIS [37] and Synopsys design complier [7]. Once an all encompassing debugging framework is
developed, more targeted approaches can be created for specific corner cases.
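Such a dispatch layer could be sketched as a simple feature-based heuristic. The problem features, thresholds, and ordering below are illustrative assumptions, not the policy of any actual tool:

```python
from dataclasses import dataclass

@dataclass
class DebugProblem:
    num_gates: int        # size of the flattened netlist
    trace_cycles: int     # length of the failing counterexample
    has_hierarchy: bool   # RTL module hierarchy available?
    has_memories: bool    # large embedded RAM blocks present?

def choose_techniques(p: DebugProblem) -> list:
    """Hypothetical dispatch heuristic: apply cheap problem reductions
    before the core engine. All thresholds are invented for illustration."""
    plan = []
    if p.has_memories:
        plan.append("memory-model abstraction")    # shrink RAM blocks first
    if p.has_hierarchy and p.num_gates > 100_000:
        plan.append("hierarchical debugging")      # localize to modules
    if p.num_gates > 50_000:
        plan.append("abstraction and refinement")  # drop irrelevant logic
    if p.trace_cycles > 1_000:
        plan.append("bounded model debugging")     # analyze trace suffixes
    plan.append("SAT/max-sat debugging engine")    # final suspect extraction
    return plan

print(choose_techniques(DebugProblem(250_000, 5_000, True, False)))
```

Since several of the techniques are iterative, a real framework would likely re-invoke this decision after each reduction rather than fixing the plan up front.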
A second direction for future research in automated debugging is the migration from Boolean
level to higher level reasoning engines. Most advanced debugging techniques rely on SAT, QBF,
and max-sat solvers, where high level problems are first converted to Boolean level problems
and then solved. The above approach is also referred to as bit-blasting. Since most problems
come from the RTL or higher level models, there is bus, module, instantiation, and structural
information that is lost during bit-blasting. Furthermore, simple arithmetic functions such
as addition and multiplication can be very hard to reason about at the Boolean level. High
level decision engines such as Satisfiability Modulo Theories (SMT) solvers [14, 64] can reason
about many theories, such as linear real arithmetic, uninterpreted functions, arrays, lists, and
bit vectors. For each theory the most effective solver is used until the entire problem is solved.
For example, for Boolean problems, a SAT solver is often employed. As SMT solvers gain
momentum in the research domain, their improved performance will generate interest from
other CAD for VLSI application domains. Debugging is an application that can greatly benefit
from improvements in SMT. Research dedicated to SMT debugging formulations or to the
development of specific theories for debugging promise to be a fruitful area.
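To illustrate what bit-blasting discards, the toy sketch below (an illustration, not an actual solver front-end; all names are invented) expands the single word-level constraint x + y = s into per-bit ripple-carry Boolean constraints. The word-level view is one equation over bit vectors; the Boolean view is a constraint set whose size grows with the bit width, and the bus structure is gone:

```python
def bit_blast_adder(width):
    """Expand x + y == s (mod 2**width) into per-bit Boolean constraints
    in ripple-carry form: a toy stand-in for a bit-blasting front-end.
    Returns (name, predicate) pairs over a dict of named bit values."""
    cons = []
    for i in range(width):
        # sum bit: s_i = x_i XOR y_i XOR c_i
        cons.append((f"sum{i}",
                     lambda a, i=i: a[f"s{i}"] == a[f"x{i}"] ^ a[f"y{i}"] ^ a[f"c{i}"]))
        # carry out: c_{i+1} = majority(x_i, y_i, c_i)
        cons.append((f"carry{i}",
                     lambda a, i=i: a[f"c{i+1}"] ==
                     (a[f"x{i}"] & a[f"y{i}"]) | (a[f"c{i}"] & (a[f"x{i}"] ^ a[f"y{i}"]))))
    return cons

def check(width, x, y):
    """Evaluate the bit-blasted constraints against concrete integers."""
    s = (x + y) % (1 << width)
    a = {"c0": 0}
    for i in range(width):
        a[f"x{i}"], a[f"y{i}"], a[f"s{i}"] = (x >> i) & 1, (y >> i) & 1, (s >> i) & 1
        a[f"c{i+1}"] = (a[f"x{i}"] & a[f"y{i}"]) | (a[f"c{i}"] & (a[f"x{i}"] ^ a[f"y{i}"]))
    cons = bit_blast_adder(width)
    return all(fn(a) for _, fn in cons), len(cons)

ok, n = check(8, 173, 92)
print(ok, n)  # one word-level equation became 16 Boolean constraints for 8 bits
```

An SMT solver with a bit-vector theory can keep the single word-level constraint intact and dispatch it to an arithmetic-aware decision procedure instead.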
In this thesis, automated debuggers exclusively analyze the design for error sources. This
analysis domain assumes that the verification environment, composed of test pattern generators,
assertions, and testbench checkers, is entirely correct (error free). In reality, errors are as likely
to stem from the verification environment as from the design. Although design debugging
techniques can also help identify bugs in the testbench by providing hints about primary inputs
or outputs that are identified as suspects, the vast majority of testbench error sources cannot
be localized. A promising research direction is to debug not only the design, but the overall
verification environment as well. As a simple case, consider a failing assertion. Here, the error
may lie in a combination of three general locations: (i) the testbench module generating the
stimulus patterns, (ii) the design, and (iii) the assertion (checker). Debugging techniques can
focus on any of these locations or on combinations of them. In practice, there may be
parsing and integration challenges in debugging the verification environment, as some components
may be written in procedural code (C/C++), while others are implemented in behavioural or
structural RTL (Verilog). In general, any debugging information reported about the verification
environment will provide much value to the verification engineer.
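One simple way to triage the three error locations, when trusted reference components happen to exist, is substitution: re-run the failing scenario with each component replaced by its reference and see whether the failure disappears. The sketch below is a hypothetical illustration of that idea on a toy environment; all component names and behaviours are invented:

```python
def triage_failure(stimulus, design, checker,
                   ref_stimulus, ref_design, ref_checker):
    """Hypothetical triage by substitution: replace one component at a
    time with a trusted reference. A component whose replacement makes
    the run pass is a suspect error source."""
    def run(stim, dut, chk):
        # the run "passes" if the checker accepts every stimulus/response pair
        return all(chk(x, dut(x)) for x in stim())
    suspects = []
    if run(ref_stimulus, design, checker):
        suspects.append("stimulus")
    if run(stimulus, ref_design, checker):
        suspects.append("design")
    if run(stimulus, design, ref_checker):
        suspects.append("checker")
    return suspects

# Toy environment: the design should compute 2*x and the checker verifies that.
ref_stim   = lambda: range(4)
ref_design = lambda x: 2 * x
ref_check  = lambda x, y: y == 2 * x
buggy_design = lambda x: 2 * x + (1 if x == 3 else 0)  # wrong output at x == 3

print(triage_failure(ref_stim, buggy_design, ref_check,
                     ref_stim, ref_design, ref_check))  # -> ['design']
```

References are rarely available for the design itself, of course; the point of the research direction above is to localize verification-environment errors without them, by extending the suspect analysis across testbench, design, and checker simultaneously.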
7.3 Closing Remarks
The verification and debugging efforts are eclipsing most other VLSI design tasks today. The
bottleneck due to debugging can be partially alleviated by the use of efficient, practical and
robust automated debugging techniques. Such techniques are still in their infancy, and their
adoption by industry hinges on their ability to handle large problem instances. The work
presented in this dissertation represents some of the most powerful techniques developed to
date to help bridge the gap between current debugging capabilities and demands by industrial
applications. With techniques such as abstraction and refinement, bounded model debugging,
max-sat formulations, and trace reduction, some of the major obstacles are significantly
reduced. Additionally, the contributions presented here hold great promise for the future of
automated debugging. As outlined in this chapter there are many areas within debugging that
have the potential to provide significant improvements. With research and industry support,
automated debugging tools can become as common as today’s popular functional simulators to
verification engineers.
Bibliography
[1] M. Abramovici, M. Breuer, and A. Friedman, Digital Systems Testing and Testable De-
sign. Computer Science Press, 1990.
[2] M. Abramovici, P. R. Menon, and D. T. Miller, “Critical path tracing - an alternative to
fault simulation,” in DAC ’83: Proceedings of the 20th conference on Design automation,
1983, pp. 214–220.
[3] M. F. Ali, S. Safarpour, A. Veneris, M. Abadir, and R. Drechsler, “Post-verification
debugging of hierarchical designs,” in Int’l Conf. on CAD, 2005, pp. 871–876.
[4] J. Argelich, C. Li, F. Manya, and J. Planes, “Max-SAT 2008 - Third Max-SAT Evalua-
tion,” 2008, http://www.maxsat.udl.cat/08.
[5] L. Bening and H. Foster, Principles of Verifiable RTL Design. Kluwer Academic Pub-
lishers, 2001.
[6] J. Bergeron, Writing Testbenches: Functional Verification of HDL Models. Kluwer Aca-
demic Publishers, 2003.
[7] H. Bhatnagar, Advanced ASIC Chip Synthesis Using Synopsys Design Compiler Physical
Compiler and PrimeTime. Kluwer Academic Publishers, 2002.
[8] A. Biere, A. Cimatti, E. Clarke, O. Strichman, and Y. Zhu, “Bounded model checking,”
in Advances In Computers, 2003.
[9] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu, “Symbolic model checking without BDDs,”
in Tools and Algorithms for the Construction and Analysis of Systems, ser. LNCS, vol.
1579. Springer Verlag, 1999, pp. 193–207.
[10] P. Bjesse and A. Boralv, “DAG-aware circuit compression for formal verification,” in Int’l
Conf. on CAD, 2004, pp. 42–49.
[11] P. Bjesse and J. Kukula, “Using counter example guided abstraction refinement to find
complex bugs,” in Design, Automation and Test in Europe, 2004, pp. 156–161.
[12] V. Boppana and W. K. Fuchs, “Dynamic faults collapsing and diagnostic test pattern
generation for sequential circuits,” in Int’l Conf. on CAD, 1998, pp. 147–154.
[13] R. Brayton, G. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli, Logic Minimiza-
tion Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.
[14] R. Bruttomesso, A. Cimatti, A. Franzen, A. Griggio, and R. Sebastiani, “The mathsat 4
SMT solver,” in Computer Aided Verification, 2008, pp. 299–303.
[15] J. Burch, E. Clarke, and D. Long, “Symbolic model checking with partitioned transition
relations,” in Int’l Conference on Very Large Scale Integration, 1991.
[16] J. Burch, E. Clarke, K. McMillan, and D. Dill, “Sequential circuit verification using
symbolic model checking,” in Design Automation Conf., 1990, pp. 46–51.
[17] K. Chang, V. Bertacco, and I. Markov, “Simulation-based bug trace minimization with
BMC-based refinement,” in Int’l Conf. on CAD, 2005, pp. 1045–1051.
[18] K.-H. Chang, I. Markov, and V. Bertacco, “Automating post-silicon debugging and repair,” IEEE Trans. on Comp., 2008, to appear.
[19] P. Chauhan, E. M. Clarke, J. H. Kukula, S. Sapra, H. Veith, and D. Wang, “Automated
abstraction refinement for model checking large state spaces using sat based conflict anal-
ysis,” in Int’l Conf. on Formal Methods in CAD, 2002, pp. 33–51.
[20] Y. Chen and F. Chen, “Algorithms for compacting error traces,” in ASP Design Automa-
tion Conf., 2003, pp. 99–103.
[21] E. Clarke, A. Biere, R. Raimi, and Y. Zhu, “Bounded model checking using satisfiability
solving,” Formal Methods in System Design: An International Journal, vol. 19, no. 1, pp.
7–34, 2001.
[22] E. Clarke, O. Grumberg, and D. Long, “Model checking and abstraction,” in Symposium
on Principles of Programming Languages, 1992, pp. 342–354.
[23] E. Clarke, A. Gupta, and O. Strichman, “SAT-based counterexample-guided abstraction
refinement,” IEEE Trans. on CAD, vol. 22, no. 7, pp. 1113–1123, 2004.
[24] E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-guided abstrac-
tion refinement for symbolic model checking,” Journal of the ACM, vol. 50, no. 5, pp.
752–794, 2003.
[25] S. Cook, “The complexity of theorem proving procedures,” in 3rd Annual ACM Sympo-
sium on Theory of Computing, 1971, pp. 151–158.
[26] T. Cormen, C. Leierson, and R. Rivest, Introduction to Algorithms. MIT Press, McGraw-
Hill Book Company, 1990.
[27] M. Davis, G. Logemann, and D. Loveland, “A machine program for theorem proving,”
Comm. of the ACM, vol. 5, pp. 394–397, 1962.
[28] M. Davis and H. Putnam, “A computing procedure for quantification theory,” Journal of
the ACM, vol. 7, pp. 506–521, 1960.
[29] N. Dershowitz, Z. Hanna, and J. Katz, “Bounded model checking with QBF,” in Int’l
Conf. on Theory and Applications of Satisfiability Testing, 2005, pp. 408–414.
[30] R. Drechsler and S. Horeth, “Gatecomp: Equivalence checking of digital circuits in an
industrial environment,” in Int’l Workshop on Boolean Problems, 2002, pp. 195–200.
[31] EETimes.com, “Faster Verification is the goal at ST,” 2007, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=197700622&pgno=3.
[32] ElectronicsWeekly.com, “Leakage and verification costs both continue to rise, says Cadence,” 2008, http://www.electronicsweekly.com/Articles/2008/10/23/44769/leakage-and-verification-costs-both-continue-to-rise-says-cadence.htm.
[33] M. Fahim Ali, A. Veneris, S. Safarpour, R. Drechsler, A. Smith, and M. S. Abadir, “Debugging sequential circuits using Boolean satisfiability,” in Int’l Conf. on CAD, 2004, pp. 204–209.
[34] F. Fallah, “Coverage directed validation of hardware models,” Ph.D. dissertation, MIT,
1999.
[35] G. Fey, S. Safarpour, A. Veneris, and R. Drechsler, “On the relation between simulation-
based and SAT-based diagnosis,” in Design, Automation and Test in Europe, 2006, pp.
1139–1144.
[36] H. Foster, A. Krolnik, and D. Lacey, Assertion-Based Design. Kluwer Academic Pub-
lishers, 2003.
[37] M. Gao, J. Jiang, Y. Jiang, Y. Li, S. Sinha, and R. Brayton, “MVSIS,” in Int’l Workshop
on Logic Synth., 2001.
[38] E. Goldberg, M. Prasad, and R. Brayton, “Using SAT for combinational equivalence
checking,” in Int’l Workshop on Logic Synth., 2000, pp. 185–191.
[39] ——, “Using SAT for combinational equivalence checking,” in Design, Automation and
Test in Europe, 2001, pp. 114–121.
[40] S. Graf and H. Saidi, “Construction of abstract state graphs with PVS,” in Computer
Aided Verification. Springer-Verlag, 1997, pp. 72–83.
[41] F. Heras, J. Larrosa, and A. Oliveras, “MiniMaxSat: A new weighted max-sat solver,” in
Int’l Conf. on Theory and Applications of Satisfiability Testing, 2007, pp. 41–55.
[42] E. A. Hirsch, “Sat local search algorithms: Worst-case study,” Journal of Automated
Reasoning, vol. 24, no. 1-2, pp. 127–143, 2000.
[43] S. Huang and K. Cheng, Formal Equivalence Checking and Design Debugging. Kluwer
Academic Publisher, 1998.
[44] S.-Y. Huang and K.-T. Cheng, “Errortracer: Design error diagnosis based on fault simu-
lation techniques,” IEEE Trans. on CAD, vol. 18, no. 9, pp. 1341–1352, 1999.
[45] S.-Y. Huang, “A fading algorithm for sequential fault diagnosis,” in DFT ’04: Proceedings
of the Defect and Fault Tolerance in VLSI Systems, 19th IEEE International Symposium
on (DFT’04), 2004, pp. 139–147.
[46] L. Huisman, “Diagnosing arbitrary defects in logic designs using single location at a time
(SLAT),” IEEE Trans. on CAD, vol. 23, no. 1, pp. 91–101, 2004.
[47] Intel Corp., “The Evolution of a Revolution,” 2008,
http://download.intel.com/pressroom/kits/IntelProcessorHistory.pdf.
[48] International Technology Roadmap for Semiconductors, “ITRS 2006 Update,” 2008,
http://www.itrs.net/Links/2006Update/2006UpdateFinal.htm.
[49] H. Jain, D. Kroening, N. Sharygina, and E. Clarke, “Word level predicate abstraction
and refinement for verifying rtl verilog,” in Design Automation Conf. ACM, 2005, pp.
445–450.
[50] N. Jha and S. Gupta, Testing of Digital Systems. Cambridge University Press, 2003.
[51] H.-J. Kang and I.-C. Park, “SAT-based unbounded symbolic model checking,” IEEE
Trans. on CAD, vol. 24, no. 2, pp. 129–140, 2005.
[52] B. Keng, H. Mangassarian, and A. Veneris, “A succinct memory model for automated
design debugging,” in Int’l Conf. on CAD, 2008, pp. 137–142.
[53] T. Kropf, Introduction to Formal Hardware Verification. Springer, 1999.
[54] A. Kuehlmann, V. Paruthi, F. Krohm, and M. Ganai, “Robust Boolean reasoning for
equivalence checking and functional property verification,” IEEE Trans. on CAD, vol. 21,
no. 12, pp. 1377–1394, 2002.
[55] L. Lavagno, G. Martin, and L. Scheffer, EDA for IC Implementation, Circuit Design, and Process Technology (Electronic Design Automation for Integrated Circuits Handbook). CRC Press, 2006.
[56] T. Larrabee, “Test pattern generation using Boolean satisfiability,” IEEE Trans. on CAD,
vol. 11, pp. 4–15, 1992.
[57] C. Lee, “Representation of switching circuits by binary decision diagrams,” Bell System
Technical Jour., vol. 38, pp. 985–999, 1959.
[58] T. Lee, W. Chuang, I. Hajj, and W. Fuchs, “Circuit-level dictionaries of CMOS bridging
faults,” in VLSI Test Symp., 1994, pp. 386–391.
[59] B. Li, M. Hsiao, and S. Sheng, “A novel SAT all-solutions solver for efficient preimage
computation,” in Design, Automation and Test in Europe, 2004, pp. 272–277.
[60] M. Liffiton and K. A. Sakallah, “On Finding All Minimally Unsatisfiable Subformulas,”
in Int’l Conf. on Theory and Applications of Satisfiability Testing, 2005, pp. 32–43.
[61] M. Liffiton and K. Sakallah, “Algorithms for computing minimal unsatisfiable subsets of
constraints,” Journal of Automated Reasoning, vol. 40, no. 1, pp. 1–33, 2008.
[62] J. Liu and A. Veneris, “Incremental fault diagnosis,” IEEE Trans. on CAD, vol. 24,
no. 2, pp. 240–251, 2005.
[63] F. Lu, L.-C. Wang, K.-T. Cheng, and R. Huang, “A circuit SAT solver with signal
correlation guided learning,” in Design, Automation and Test in Europe, 2003, pp. 892–
897.
[64] C. Lynch and Y. Tang, “Interpolants for linear arithmetic in SMT,” in Automated Tech-
nology for Verification and Analysis, 2008, pp. 156–170.
[65] F. Y. Mang and P.-H. Ho, “Abstraction refinement by controllability and cooperativeness
analysis,” in Design Automation Conf. ACM, 2004, pp. 224–229.
[66] H. Mangassarian, A. Veneris, S. Safarpour, M. Benedetti, and D. Smith, “A performance-
driven qbf-based iterative logic array representation with applications to verification, de-
bug and test,” in Int’l Conf. on CAD, 2007, pp. 240–245.
[67] H. Mangassarian, A. Veneris, S. Safarpour, F. N. Najm, and M. S. Abadir, “Maximum
circuit activity estimation using pseudo-boolean satisfiability,” in Design, Automation
and Test in Europe, 2007, pp. 1538–1543.
[68] J. Marques-Silva and J. Planes, “Algorithms for maximum satisfiability using unsatisfiable
cores,” in Design, Automation and Test in Europe, 2008, pp. 6–10.
[69] J. Marques-Silva and K. Sakallah, “GRASP – a new search algorithm for satisfiability,”
in Int’l Conf. on CAD, 1996, pp. 220–227.
[70] J. Marques-Silva and K. Sakallah, “GRASP: A search algorithm for propositional satisfi-
ability,” IEEE Trans. on Comp., vol. 48, no. 5, pp. 506–521, 1999.
[71] K. McMillan, “Applying SAT methods in unbounded symbolic model checking.” in Com-
puter Aided Verification, 2002, pp. 250–264.
[72] G. Moore, “Cramming more components onto integrated circuits,” electronics, vol. 38,
no. 8, pp. 1–4, 1965.
[73] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, “Chaff: Engineering an
efficient SAT solver,” in Design Automation Conf., 2001, pp. 530–535.
[74] N. Eén and N. Sörensson, “An Extensible SAT-solver,” in Int’l Conf. on Theory and Applications of Satisfiability Testing, 2003, pp. 333–336.
[75] G.-J. Nam, K. Sakallah, and R. Rutenbar, “A new fpga detailed routing approach via
search-based boolean satisfiability,” IEEE Trans. on CAD, vol. 21, no. 6, pp. 674–684,
2002.
[76] ——, “A new FPGA detailed routing approach via search-based Boolean satisfiability,”
IEEE Trans. on CAD, vol. 21, no. 6, pp. 674–684, 2002.
[77] OpenCores.org, 2008, http://www.opencores.org.
[78] S.-J. Pan, K.-T. Cheng, J. Moondanos, and Z. Hanna, “Generation of shorter sequences
for high resolution error diagnosis using sequential sat,” in ASP Design Automation Conf.,
2006, pp. 25–29.
[79] D. Plaisted and S. Greenbaum, “A structure-preserving clause form translation,” J. Symb.
Comput., vol. 2, no. 3, pp. 293–304, 1986.
[80] P. Rashinkar, P. Paterson, and L. Singh, System-on-a-chip Verification: Methodology and
Techniques. Kluwer Academic Publisher, 2000.
[81] S. Safarpour and A. Veneris, “Automated design debugging with abstraction and refine-
ment,” IEEE Trans. on CAD, 2009, under review.
[82] S. Safarpour, A. Veneris, and R. Drechsler, “Integrating observability don’t cares in all-
solution SAT solvers,” in IEEE International Symposium on Circuits and Systems, 2006,
pp. 1587–1590.
[83] ——, “Improved SAT-based reachability analysis with observability don’t cares,” Journal
on Satisfiability, Boolean Modeling and Computation, vol. 5, pp. 1–25, 2008.
[84] S. Safarpour, A. Veneris, and H. Mangassarian, “Trace compaction using SAT-based
reachability analysis,” in ASP Design Automation Conf., 2007, pp. 932–937.
[85] S. Safarpour, M. H. Liffiton, H. Mangassarian, A. Veneris, and K. A. Sakallah, “Improved
design debugging using maximum satisfiability,” in Int’l Conf. on Formal Methods in
CAD, 2007, pp. 13–19.
[86] S. Safarpour and A. Veneris, “Abstraction and refinement techniques in automated design
debugging,” in Design, Automation and Test in Europe, 2007, pp. 1182–1187.
[87] S. Safarpour, A. Veneris, G. Baeckler, and R. Yuan, “Efficient SAT-based Boolean match-
ing for FPGA technology mapping,” in Design Automation Conf., 2006, pp. 466–471.
[88] S. Sahni and A. Bhatt, “The complexity of design automation problems,” in Design
Automation Conf., 1980, pp. 402–411.
[89] E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj,
P. Stephan, R. Brayton, and A. Sangiovanni-Vincentelli, “SIS: A system for sequential
circuit synthesis,” University of Berkeley, Tech. Rep., 1992.
[90] S. Shen, Y. Qin, and S. Li, “A faster counterexample minimization algorithm based on
refutation analysis,” in Design, Automation and Test in Europe, 2005, pp. 672–677.
[91] A. Smith, A. Veneris, M. F. Ali, and A. Viglas, “Fault diagnosis and logic debugging
using Boolean satisfiability,” IEEE Trans. on CAD, vol. 24, no. 10, pp. 1606–1621, 2005.
[92] A. Smith, A. Veneris, and A. Viglas, “Design diagnosis using Boolean satisfiability,” in
ASP Design Automation Conf., 2004, pp. 218–223.
[93] F. Somenzi, “Efficient manipulation of decision diagrams,” Software Tools for Technology
Transfer, vol. 3, no. 2, pp. 171–181, 2001.
[94] D. Stoffel and W. Kunz, “Record & play: a structural fixed point iteration for sequential
circuit verification,” in Int’l Conf. on CAD, 1997, pp. 394–399.
[95] O. Strichman, “Pruning techniques for the sat-based bounded model checking problem.”
in CHARME, 2001, pp. 58–70.
[96] A. Suelflow, G. Fey, R. Bloem, and R. Drechsler, “Using unsatisfiable cores to debug
multiple design errors,” in Great Lakes Symp. VLSI, 2008, pp. 77–82.
[97] G. S. Tseitin, “On the complexity of derivation in propositional calculus,” in Studies in
Constructive Mathematics and Mathematical Logic, Part II, 1968, pp. 115–125.
[98] A. Veneris and M. Abadir, “Design rewiring using ATPG,” IEEE Trans. on CAD, vol. 21,
no. 12, pp. 1469–1479, 2002.
[99] A. Veneris and I. N. Hajj, “Design error diagnosis and correction via test vector simula-
tion,” IEEE Trans. on CAD, vol. 18, no. 12, pp. 1803–1816, 1999.
[100] F. Wotawa and M. Nica, “Record & play: a structural fixed point iteration for sequential
circuit verification,” in International Symposium on Intelligent and Distributed Comput-
ing, 2007, pp. 1–10.
[101] Y.-S. Yang, A. Veneris, and N. Nicolici, “Automated data analysis solutions to silicon
debug,” in Design, Automation and Test in Europe, 2009, to appear.
[102] Q. Zhu, N. Kitchen, A. Kuehlmann, and A. Sangiovanni-Vincentelli, “Sat sweeping with
local observability don’t-cares,” in Design Automation Conf., 2006, pp. 229–234.