Formal Methods in Automated Design Debugging
by
Sean A. Safarpour
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2009 by Sean A. Safarpour
Abstract
Formal Methods in Automated Design Debugging
Sean A. Safarpour
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2009
The relentless growth in size and complexity of semiconductor devices over the last decades
continues to present new challenges to the electronic design community. Today, functional
debugging is a bottleneck that jeopardizes the future growth of the industry as it can account
for up to 30% of the overall design effort. To alleviate the manual debugging burden for
industrial problems, scalable, practical and robust automated debugging solutions are required.
This dissertation presents novel techniques and methodologies to bridge the gap between
current capabilities of automated debuggers and the strict industry requirements. The contri-
butions proposed leverage powerful advancements made in the formal method community, such
as model checking and reasoning engines, to significantly ease the debugging effort.
The first contribution, abstraction and refinement, is a systematic methodology that reduces
the complexity of debugging problems by abstracting irrelevant sections of the circuits under
analysis. Powerful abstraction techniques are developed for netlists as well as hierarchical and
modular designs. Experiments demonstrate that an abstraction and refinement methodology
requires up to 200 times less run-time and 27 times less memory than a state-of-the-art debugger.
The second contribution, Bounded Model Debugging (BMD), is a debugging methodology
based on the observation that erroneous behaviour is more likely caused by errors excited
temporally close to the observation points. BMD systematically generates a series of successively
larger yet more complete debugging problems to be solved. Experiments show the effectiveness
of BMD as 93% of the large problems are solved with BMD versus 34% without BMD.
A third contribution is an automated debugging formulation based on maximum satisfiability.
The formulation is used to build a powerful two-step, coarse- and fine-grained debugging
framework providing up to 980 times performance improvements.
The final contribution of this thesis is a trace reduction technique that uses reachability
analysis to reproduce the observed failure with fewer simulation events. Experiments demonstrate
that many redundant state transitions can be removed, resulting in traces with up to 100 times
fewer events than the original.
Acknowledgements
I wish to express my sincere gratitude to my Ph.D. supervisor, Professor Andreas Veneris
for being a mentor, a colleague and a friend. Over the course of my research at the University
of Toronto, Professor Veneris has been a steady source of passion, motivation, and guidance.
Many thanks to my Ph.D. committee members Professors Farid Najm, Andreas Moshovos,
Masahiro Fujita, Jason Anderson, Frank Kschischang and Charles Mims for their thorough
reviews of my dissertation and their insightful suggestions.
I am also indebted to the Vennsa family and my colleagues at the University of Toronto for
our fruitful discussions and their technical support of my research. Special thanks to Duncan
Smith, Terry Yang, Hratch Mangassarian, Brian Keng, Yibin Chen, Patrick Halina, Evean Qin,
Alan Baker and Andrew Ling.
I would like to thank my parents for being my foundation, role models, and an endless
source of love and support. Many thanks to my sister, grandmother and the rest of my dear
family for their advice and encouragement through my academic endeavours. I wish to express
my appreciation to Sarah Osmanski for her encouragement and selfless support of my goals and
ambitions.
I would like to express my gratitude to the Natural Sciences and Engineering Research
Council of Canada (NSERC) for their financial support of my research.
Contents
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Design Verification and Design Debug . . . . . . . . . . . . . . . . . . . . 4
1.2 Automated Design Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Abstraction and Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Bounded Model Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Max-sat based Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.4 Trace reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Background 13
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Verification Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Automated Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Complexity of Automated Debugging . . . . . . . . . . . . . . . . . . . . 18
2.4 Simulation- and BDD-based Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 SAT-Based Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.1 CNF Representation for Boolean SAT . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Boolean Satisfiability Algorithms . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.3 Maximum Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Debugging using Abstraction and Refinement 29
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Abstraction and Refinement in Model Checking . . . . . . . . . . . . . . . 32
3.3 Debugging with Abstraction and Refinement . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Guaranteeing Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Spurious Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Guaranteeing Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.4 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 State Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.1 Trace Length Reduction Benefits . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Function Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.1 Hierarchical abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.2 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.1 State Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.2 Function Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Bounded Model Debugging 55
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Bounded Model Debugging Formulation . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Probability Analysis of Error Behaviour . . . . . . . . . . . . . . . . . . . 58
4.3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.3 Impact on Error Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3.4 Improvements to Basic Methodology . . . . . . . . . . . . . . . . . . . . . 72
4.3.5 Overall Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Debugging using Max-SAT 82
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2.1 Maximum Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Debugging Combinational Circuits with Max-sat . . . . . . . . . . . . . . . . . . 85
5.3.1 Error Clause Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.3.2 Error Group Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Extension to Sequential Circuits and Multiple Vectors . . . . . . . . . . . . . . . 89
5.5 Debugging with Approximate Max-sat . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.1 Efficient Max-sat Framework . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6 Trace Reduction 100
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.1 Finite State Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.2 Image and Pre-image Computation . . . . . . . . . . . . . . . . . . . . . . 103
6.2.3 Reachability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3 Proposed Trace Compaction Approach . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.1 Reachability Based Trace Compaction . . . . . . . . . . . . . . . . . . . . 105
6.3.2 Creating More Short-cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.3 State Selection Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.4 All-Solution SAT Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.4 Storing Visited States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4.1 Determining State Containment Relationships . . . . . . . . . . . . . . . 110
6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7 Conclusion and Future Work 118
7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.2.1 Extension of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Bibliography 126
List of Tables
2.1 Simple gates and their CNF representation . . . . . . . . . . . . . . . . . . . . . 25
3.1 Statistics for problems and the stand-alone SAT-based debugging approach . . . 45
3.2 Performance statistics for abstraction and refinement debugging framework . . . 46
3.3 Summary of b14 when abstracting over 80% of flip-flops . . . . . . . . . . . . . . 49
3.4 Summary of ac97 when abstracting over 96% of flip-flops . . . . . . . . . . . . . . 49
3.5 Summary of problems for function abstraction . . . . . . . . . . . . . . . . . . . . 50
3.6 Results of proposed function abstraction and refinement technique . . . . . . . . 50
4.1 Likelihood of errors on gates of Figure 4.1 being excited . . . . . . . . . . . . . . 60
4.2 Circuit and performance statistics without BMD . . . . . . . . . . . . . . . . . . 76
4.3 Performance with BMD on increment size of 10 clock cycles . . . . . . . . . . . 77
5.1 Max-sat+debug versus stand-alone debugger . . . . . . . . . . . . . . . . . . . . 96
6.1 Results of proposed trace length compaction for traces of length 50, 100, 1000. . 115
6.2 Summary of the results for the proposed trace length compaction approach . . . 116
List of Figures
1.1 Simplified VLSI design flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Flow chart of typical manual debugging process. . . . . . . . . . . . . . . . . . . 5
2.1 A circuit with failure observed at O1. . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Debugging in a modern simulation-based verification flow . . . . . . . . . . . . . 17
2.3 Circuit before and after adding correction models . . . . . . . . . . . . . . . . . . 22
2.4 Example: circuit and CNF representation . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Basic DPLL SAT solving algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 Circuit before and after abstracting flip-flop q1 . . . . . . . . . . . . . . . . . . . 34
3.2 Demonstrating the effect of unconstrained inputs on abstract circuit . . . . . . . 35
3.3 Abstract circuit unfolded over two time frames . . . . . . . . . . . . . . . . . . . 37
3.4 Debugging algorithm with state abstraction and refinement . . . . . . . . . . . . 39
3.5 Reduced trace V ′ due to abstraction . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 Function F^1_1 is composed of functions F^2_2 and F^2_3 . . . . . . . . . . . . . . 42
3.7 Hierarchical abstraction and refinement example . . . . . . . . . . . . . . . . . . 43
3.8 Hierarchical debugging algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.9 Logic and trace reduction vs. flip-flops abstracted . . . . . . . . . . . . . . . . . . 45
3.10 Solve time and # literals vs. # of refinement and debugging iterations . . . . . . 52
4.1 Sample pipeline circuit with single output . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Five time frame ILA for circuit in Figure 4.1 . . . . . . . . . . . . . . . . . . . . 59
4.3 Illustration of example where error is excited in cycle 1 and observed in cycle d . 63
4.4 Plotting p_d as a function of d with prop = obs = {0.1, 0.5, 0.9} . . . . . . . . . 63
4.5 Plotting p_d as a function of prop and obs . . . . . . . . . . . . . . . . . . . . . . 64
4.6 ILA for length three with error excited on gate A . . . . . . . . . . . . . . . . . . 66
4.7 Suffix of size two of ILA shown in Figure 4.6 . . . . . . . . . . . . . . . . . . . . 67
4.8 ILA of Figure 4.7 annotated with error suspects and initial state suspects . . . . 69
4.9 Simple circuit with single error source on gate A . . . . . . . . . . . . . . . . . . . 69
4.10 Example of single error source excited in two clock cycles . . . . . . . . . . . . . 70
4.11 Example of pipelined circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.12 Three time frame ILA of circuit in Figure 4.11 . . . . . . . . . . . . . . . . . . . 71
4.13 Example with error source A propagating through three DFFs . . . . . . . . . . . 72
4.14 Complete BMD algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.15 Memory usage versus CPU run-time for four selected problems . . . . . . . . . . 79
4.16 Debugger solutions found versus BMD iterations . . . . . . . . . . . . . . . . . . 80
5.1 Correct and erroneous circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Erroneous sequential circuit and its ILA representation . . . . . . . . . . . . . . . 90
5.3 Error masking in clause groupings . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Max-sat debugging framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5 Run-time versus clause grouping size . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6 Number of solved instances for max-sat20+debug and debug . . . . . . . . . . . 97
5.7 Run-time comparison for max-sat20+debug and debug . . . . . . . . . . . . . . . 98
6.1 Finite State Machine with 7 states . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2 A sample trace for the above FSM . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Illustration of reachability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4 Updating the graph G with new nodes and edges . . . . . . . . . . . . . . . . . . 106
6.5 Illustrating rules 2 and 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6 Trace compaction procedure using reachability analysis . . . . . . . . . . . . . . . 109
6.7 Illustrating state storage data structure . . . . . . . . . . . . . . . . . . . . . . . 111
6.8 Finding supersets and subsets in the tree T . . . . . . . . . . . . . . . . . . . . . 112
6.9 Determine the states that are supersets of this state . . . . . . . . . . . . . . . . 113
6.10 Comparison of state selection methods . . . . . . . . . . . . . . . . . . . . . . . . 114
Chapter 1
Introduction
1.1 Motivation
The relentless consumer demand for devices with complex functionality and superior perfor-
mance continues to drive the fast growth of the semiconductor industry. In the past 20 years,
this industry has continuously generated electronic devices with more functionalities, superior
performance, smaller physical size, and reduced power consumption. For instance, as forecasted
by Moore's Law [72], Intel's 8088 microprocessor in 1979 had 29,000 transistors, while its 2007
counterpart, Intel's Quad-core Xeon processor, contains 820,000,000 transistors [47]. This
exponential efficiency growth over the past decades has stretched the performance limits for
certain aspects of the design flow for Very Large Scale Integration (VLSI) systems.
In particular, during the last 10 years, there has been a significant increase in the cost
and time required for verification and debugging of those systems. This is partially because
of the complex nature of the modern semiconductor design flow. The processes of design
and verification are composed of heterogeneous components implemented at multiple levels
of abstraction (procedural, behavioural, synthesizable, etc.) using different languages and
various standards. Interacting with such components adds several layers of complexity. The
lack of a unified and centralized verification environment makes verification and debugging
more challenging. Further, design specifications, often described in abstract models, may not
directly correspond to signals and transactions at the design level. The separation between
Figure 1.1: Simplified VLSI design flow (specification in text, C/C++ or Matlab; RTL in Verilog/VHDL; gate-level netlist after synthesis; placed and routed netlist; GDSII; with high-level vs. RTL and RTL vs. netlist verification steps in between)
these two layers can result in misinterpretations and usually complicates the verification efforts.
Additionally, the size of modern devices poses further performance constraints on existing
automated design tools. Task outsourcing to geographically dispersed design and verification
teams adds yet more layers of communication overhead to verification and debugging.
In contemporary VLSI design flows, many interdependent steps must be orchestrated and
fine-tuned for designers to develop an Integrated Circuit (IC) successfully. If any of these
steps introduce errors or time delays, the success of the entire product, or even future sales
projections of the company, may be in jeopardy. Figure 1.1 illustrates the major steps of
the IC design process. A product inception typically starts with a specification outlining the
IC behaviour at the system level. The specification can be described in a plain document,
as a Matlab model or it can be specified in a software language such as C/C++. Next, the
specification is implemented by designers at the Register Transfer Level (RTL) in languages such
as Verilog or VHDL. Following RTL implementation, verification ensures that the specification
matches the generated RTL. When an adequately high level of confidence is reached about the
correctness of the RTL, circuit synthesis generates a gate-level netlist that models the design
using components in the target fabrication technology. Next, place and route tools model the
circuit at the physical layout level using transistors and wires. Finally the design is provided
to an IC fabrication plant in a format such as GDSII so that a prototype can be manufactured.
The prototype later undergoes test, a process that may reveal functional errors that propagated
to the silicon or expose fabrication defects. If the chip prototype is found
to be faulty, different steps of the design cycle in Figure 1.1 are repeated until the chip passes
test and is released for mass production.
In the history of the Electronic Design Automation (EDA) industry, most of the manual steps
in the VLSI design flow have been fully or partially automated. Most notably, design synthesis,
place and route and automated test pattern generation tools populate the fast growing EDA
marketplace [7, 55]. A critical efficiency challenge in the VLSI design cycle that can benefit from
more automation is functional verification and debug. It is estimated that these two processes
contribute to as much as 70% of the effort of architecting and designing a new semiconductor
device [5, 6, 32, 80]. Both are emerging problems with significant impact on the modern design
cycle.
In 2006, the well-respected International Technology Roadmap for Semiconductors (ITRS)
issued its new set of needs for the current and next generation design processes for
semiconductors. Although most topics saw minor numeric revisions, the roadmap contains a major,
fourteen-page update in design verification with a strong emphasis on debugging. The report
states that “technological progress depends on the development of rigorous and efficient methods
to achieve high-quality verification results ... and techniques to ease the burden of debugging a
design once a bug is found”. It continues, “Verification must increasingly integrate more formal
and semi-formal engines to complement simulation... Without major breakthroughs, verification
will be a non-scalable, show-stopping barrier to further semiconductor progress” [48]. Without
a doubt, the roadmap depicts a grim yet realistic picture that establishes an urgent need for
scalable automated verification and debugging techniques. Further evidence is also provided
by the fact that the ratio of verification engineers to design engineers employed in a typical
microprocessor team has quadrupled over the last decade [48]. Clearly, this trend must be
controlled, and those processes need to be automated, for the semiconductor industry to maintain
the historic growth rate it has sustained over the past 40 years.
1.1.1 Design Verification and Design Debug
Verification is the process that ensures a design behaves correctly and according to its speci-
fication. Motivated by the verification challenge outlined above, the academic and industrial
communities have introduced countless automated tools, methodologies and analysis engines
to reduce the bottleneck. Major advancements in this field include equivalence checkers [38],
property checkers [16], functional coverage analysis tools [34], automated testbench genera-
tion [1], Assertion Based Verification (ABV) [36] and the use of Binary Decision Diagrams
(BDDs) [57, 93] and Boolean Satisfiability (SAT) solvers [27, 74]. These techniques and many
others allow for the efficient exploration of the design space to identify errors, exercise opera-
tional corner cases and assess the quality of the verification process itself. Whereas verification
engineers only had simulation tools to aid them a decade ago, today there exist specialized
formal verification and functional error coverage tools to help them prove correctness of a de-
sign or identify the presence of errors. With the advent of these new techniques and tools, the
exponential growth of the verification effort can be lessened to a certain extent.
Once a functional discrepancy is found between a design and its specification during verifi-
cation, debugging attempts to localize the error source or bug. This localization process is done
with reference to RTL files, gate-level netlists, schematic views of the design, or even tempo-
rally within a simulation trace. Although many verification processes have been automated,
once they detect the existence of errors, the task of RTL debugging remains a purely manual
and tedious procedure. It is a time-consuming and resource-intensive task that takes days, or even
weeks, to converge; it increases non-recurring costs and may jeopardize the release date
of a chip.
Figure 1.2: Flow chart of typical manual debugging process (a failure discovered by verification is passed from designer to designer, each looking for the bug, until it is found and fixed).
A typical debugging process comprises arduous tasks such as collecting informa-
tion from the verification tools, manually back-tracing signals using Graphical User Interface
(GUI) waveform and source code (RTL or netlist) viewers, and performing “what-if” analysis
on potential error suspects. Once a candidate error location is found, a designer or engineer can
rectify the design and remove the bug. To ensure that the fix is adequate
and not limited to a specific test case, the verification process is then typically repeated.
Part of the debugging pain today stems from the fact that designers are responsible for
blocks as large as half a million synthesized gates for systems that are comprised by dozens
of such blocks integrated together. In the days of globalization, sometimes design groups may
not even be within a geographical proximity as they build and integrate these pieces together.
The result is many functional errors caused by miscommunication between designers, incorrect
interpretation of the specification, incomplete specifications, or plain human error.
Design bugs are incredibly hard to diagnose. They require both detailed knowledge of individual
blocks and a broader system-level understanding, a breadth of knowledge rarely available
to a single designer or verification engineer. Consequently, it is common for a verification failure
to be passed from designer to designer until the exact error source of the bug is discovered.
A typical contemporary debugging process is depicted in Figure 1.2. This figure presents
a flow chart of how debugging is performed at the system level between verification engineers
and designers. Notice that for complicated bugs it is possible for the debugging process to
consume many designers and verification engineers before the appropriate fix is made. Taking
into account that designers are under strict time constraints, each bug not only delays the
design for hours or days at a time, but it also pushes back the release date of the final product, a
fact with important financial and marketing consequences.
Over the last decade, VLSI design companies have coped with the debugging obstacle by
allocating more verification engineers to the problem. Today, there are two to three times more
verification engineers than designers in most companies [6]. It is clear that adding verification
engineers to the problem is not a scalable solution in the long term as the debugging pain
continues to climb. Automated debugging techniques are urgently required to alleviate the
manual debugging problem and drastically improve design efficiency [31, 48].
1.2 Automated Design Debugging
The discrepancies described in the previous section between design demands, cost projections
and EDA capabilities in verification and debugging introduce uncertainties, escalate costs
and delay product delivery. It is clear that at the heart of verification there is a highly
manual debugging task [31]. Current industrial debugging methodologies use non-automated
solutions that package simulation results, waveform and source code viewers/editors under
one roof. These tools involve time-consuming human-driven manual processes that introduce
uncertainties in design closure.
The research community has dedicated significant effort to automated fault (defect) diagnosis
techniques over the last three decades to diagnose silicon prototypes that fail test. These
techniques rely on algorithms such as simulation, path-tracing, and Binary Decision Dia-
grams [1, 2, 45] and have been successful at localizing physical defects introduced by the
fabrication process. On the other hand, their success at debugging functional RTL errors
has been limited. At first sight, fault diagnosis and functional RTL design debugging can be
viewed as complementary problems with similar goals at different stages of the design cycle.
However, after a closer look, there are considerable differences that impact the performance of
algorithms addressing these problems.
First, fault diagnosis problems have a large set of observation and controllability points due
to embedded test hardware mechanisms such as scan chains [1] and trace buffers [101]. Scan
chains allow the test engineer to inject particular vectors at a specific area in the design and
observe their exact behaviour at fixed clock cycles, although this is a destructive process where
the test needs to be re-run. Similarly, trace buffers can observe or inject a series of line values
during regular chip operation without disturbing it. These abilities may not
exist in design debugging where the golden model and the design may have completely different
representations. For example, the specification can be a Matlab program while the design is a
Verilog or VHDL representation.
Next, scan chains can sometimes transform the sequential diagnosis problem to a purely
combinational one where faults can be identified directly in the combinational logic. In contrast,
RTL debugging must diagnose sequential errors observed many clock cycles after the error is
excited. Finally, defect diagnosis makes use of fault models that can predict the functional
behaviour of various manufacturing defects. No corresponding models exist for functional RTL
errors, a fact that further complicates the debugging process.
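The contrast drawn above between combinational diagnosis and sequential RTL debugging can be illustrated with a toy model (hypothetical, not a circuit from this thesis): an error excited in one clock cycle of a pipelined design is only observed at the output several cycles later, so the debugger must reason over many time frames at once.

```python
# Toy 3-stage pipeline: the output observes the input delayed by
# three clock cycles through registers r1 -> r2 -> r3.
# An error is excited at cycle 1 but only observed at the output
# at cycle 4 (hypothetical example illustrating the time gap
# between error excitation and error observation).
def simulate(inputs, buggy=False):
    r1 = r2 = r3 = 0              # registers reset to 0
    observed = []
    for cycle, x in enumerate(inputs):
        observed.append(r3)       # output is the last register
        v = x ^ 1 if (buggy and cycle == 1) else x  # error excitation
        r1, r2, r3 = v, r1, r2    # shift the pipeline
    return observed

stimulus = [0, 1, 0, 0, 0, 0]
good = simulate(stimulus)
bad = simulate(stimulus, buggy=True)
first_fail = next(i for i, (g, b) in enumerate(zip(good, bad)) if g != b)
print(first_fail)  # the failure surfaces at cycle 4, not cycle 1
```

With scan chains, r1 to r3 would be directly observable and the problem would collapse to a combinational one; without them, the debugger must trace the discrepancy back through three unrolled time frames.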
Recently, a new genre of automated debugging methodologies based on formal engines such
as Boolean Satisfiability (SAT) solvers, Quantified Boolean Formula (QBF) solvers and maximum
SAT (max-sat) solvers has been shown to outperform conventional techniques in design
and silicon debug [18, 85, 91]. SAT engines have been used to drive many other CAD tools
for VLSI such as in equivalence checkers [39], model checkers [8], and automated test pattern
generators [56]. Their success has spurred interest in other EDA areas such as logic synthesis [102],
place and route [76], technology mapping [87], and fault diagnosis [35, 92]. By formulating
the RTL/fault diagnosis problem in terms of a SAT problem, great performance and capacity
improvements have been achieved [91]. It has been shown that other implementations using
QBF and max-sat solvers can sometimes further improve performance.
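The SAT-based formulation referenced above can be sketched on a small example. The idea, elaborated in Chapter 2, is to attach a correction model to each gate that allows a solver to replace the gate's function with a free value; a gate becomes a suspect if some replacement makes the circuit agree with the expected output on the failing vector. The following brute-force sketch stands in for a real SAT solver, and the netlist, bug and vector are hypothetical:

```python
# Toy illustration of debugging as satisfiability.
# Spec:    z = (a AND b) OR c
# Netlist: g1 = a OR b ; z = g1 OR c      <- bug: g1 should be AND
def eval_netlist(a, b, c, free=None):
    """Evaluate the netlist; `free` overrides a gate's output with an
    unconstrained value, playing the role of a correction model."""
    free = free or {}
    g1 = free.get("g1", a | b)
    z = free.get("z", g1 | c)
    return z

failing_vector = (1, 0, 0)  # counter-example found by verification
expected_z = 0              # spec value; the buggy netlist yields 1

# A gate is a suspect if some replacement value at that gate makes
# the netlist agree with the spec on the failing vector.
suspects = [g for g in ("g1", "z")
            if any(eval_netlist(*failing_vector, free={g: v}) == expected_z
                   for v in (0, 1))]
print(suspects)  # ['g1', 'z']
```

Note that both g1 (the actual bug) and z (which lies on the error propagation path) are returned; real debuggers likewise report a set of suspect locations that the designer must inspect.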
Another notable recent development in automated debugging is the use of the design hierar-
chy to incrementally diagnose the blocks or modules of a design [3]. Results from this work are
reported on real-life industrial designs to demonstrate the method’s effectiveness. Despite these
significant contributions, automated debugging technology continues to face the challenges of
today's increasing design sizes and overall problem complexity.
Broadly speaking, there are three factors that limit the effectiveness of automated debugging
once verification fails. One factor is the design size, which directly impacts the solution space. As
the gate count of a design increases, the number of potential error locations to consider
grows exponentially with the number of errors [18, 99]. Another factor is the underlying
state size of the sequential design. This impacts temporally and spatially how errors can be
excited and observed. A last factor is the length of the simulation trace or counter-example
under analysis. For instance, traces in the industry today range from a few hundred cycles for
formal techniques, to tens of thousands of cycles for simulation-based verification. For automated
debugging solutions to be practical in the industry, all these aspects of the problem complexity
must be efficiently addressed.
1.3 Contribution
This thesis attempts to bridge the gap between current automated debugging capabilities and
contemporary industrial needs. It does so by introducing four novel techniques aimed to address
the three limiting factors discussed earlier: design size, sequential complexity, and length of
counter-example. By managing these three factors, experiments in this thesis show that much
larger designs with much longer traces can be handled by conventional automated debuggers.
As such, these results address important practical problems and open new ground for further
research in the field.
In summary, this work makes the following contributions.
• It introduces the concept of abstraction and refinement in debugging. Abstraction and
refinement has been of great help to modern verification methodologies. In effect, it
reduces the size of the problem, in both gate count and sequential complexity. This thesis
tailors the solution to debugging and confirms similar performance gains.
• It presents a complete bounded model debugging methodology that incrementally solves
the problem with less computational resources. It does so by considering a window for
debugging starting from the cycle where a failure is first observed. The method stems from
similar concepts in formal verification, but the theory and implementation for debugging
are very different.
• It builds a novel debugging formulation based on maximum satisfiability. The method
is found to be effective for over-approximating the debugging problem. As such, a
powerful two-step, coarse and fine grained debugging framework is developed.
• It presents a trace length reduction technique that identifies redundant events in simu-
lation traces that can be removed. The resulting shorter trace can reduce the memory
requirements for debugging tools dramatically.
We now discuss each of these contributions and their impact in debugging in detail.
1.3.1 Abstraction and Refinement
Abstraction and refinement reduces the debug problem by identifying and removing sections
of the design that are irrelevant to the errors in the design. The first step, namely abstrac-
tion, simplifies the design by removing components such as state elements or modules. Once
abstracted, automated debugging can operate on a much simpler and smaller version of the er-
roneous design. If the bug is in the abstracted elements, the method identifies this discrepancy
and re-introduces them during a refinement step. Following this, consecutive steps of debug
and refinement are performed iteratively until all error locations are found. Theory presented
in this thesis guarantees the correctness and completeness of the method.
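The debug-and-refine loop described above can be sketched with a toy model. Here, a design is just a set of component names, and run_debugger is a deliberately naive stand-in for a real automated debugger (it simply reports which buggy components are currently visible). In the actual method, unexplained failures are detected through the debugging engine itself rather than by consulting the bug set; every name below is illustrative, not from the thesis implementation.

```python
def run_debugger(visible, buggy):
    """Placeholder for an automated debugger: it can only report suspects
    among the components currently present in the (abstracted) design."""
    return visible & buggy

def abstraction_refinement(components, buggy, abstracted):
    """Debug an abstracted design, refining until the failure is fully explained."""
    visible = components - abstracted        # abstraction step: drop components
    removed = set(abstracted)
    suspects = run_debugger(visible, buggy)
    # Toy termination check: a real flow detects an unexplained failure via the
    # debugger itself and refines only the implicated abstracted elements.
    while removed and suspects != buggy:
        visible.add(removed.pop())           # refinement: re-introduce a component
        suspects = run_debugger(visible, buggy)
    return suspects
```

In the worst case the loop re-introduces every abstracted component, reproducing the original problem; as noted above, this seldom happens in practice.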
Two types of abstraction techniques are discussed here. State abstraction focuses on ab-
stracting state elements leading to a smaller state space and possibly a shorter trace. Function
abstraction is a powerful technique that simplifies the design and its functionality. Function
abstraction can be applied iteratively at different levels of the design hierarchy providing added
benefits. State abstraction is ideal for designs with little or no high-level information while
function abstraction is a powerful technique for high level or RTL designs.
Although in the worst case abstraction and refinement can result in re-introducing the
entire design and thus debugging the original problem, in practice, this seldom happens. Just
like in verification, the technique presents a powerful solution that usually reduces the problem
complexity to just a small fraction of the original. Empirical data demonstrate that abstraction
and refinement effectively allow a debugger to tackle larger industrial designs than previously
possible.
1.3.2 Bounded Model Debugging
Bounded Model Debugging (BMD) is based on the observation that the erroneous behaviour of
the design is more likely caused by errors excited temporally close to observation points such as
primary outputs. For instance, when a stimulus combination excites an error, it is more likely
that the failure is observed immediately, within the next few clock cycles, rather than later.
This intuitive observation is shown to hold both statistically and empirically as a robust BMD
framework is developed to provide both significant run-time and memory improvements.
The BMD framework starts by formulating a small debugging problem based on the last
clock cycles of an error trace, called the suffix. These clock cycles are important because it is
highly likely that the error is excited at these times. By solving the smaller debugging problem,
many potential error sources can be found very quickly. In the event that the error is excited in
clock cycles prior to the ones under consideration, BMD will detect this scenario and formulate
a subsequent problem with a longer suffix. Subsequently, debugging is performed and the suffix
length is increased until all the error sources are identified. Theory presented in this thesis
guarantees the correctness and completeness of BMD.
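The suffix-growing loop can be illustrated with a small sketch. Here solve_window is a placeholder for one debugging run over the last suffix_len cycles of the trace, assumed to succeed exactly when the window reaches back to the cycle where the error is excited; the function names, the step size, and this success criterion are all illustrative assumptions, not the thesis implementation.

```python
def solve_window(trace_len, suffix_len, excitation_cycle):
    """Stand-in for one debug run on the suffix: it succeeds only if the
    window [trace_len - suffix_len, trace_len) covers the excitation cycle."""
    return excitation_cycle >= trace_len - suffix_len

def bounded_model_debug(trace_len, excitation_cycle, step=10):
    """Grow the suffix backwards from the failure cycle until debug succeeds."""
    suffix = step
    while suffix < trace_len and not solve_window(trace_len, suffix, excitation_cycle):
        suffix += step    # error excited before the window: lengthen the suffix
    return min(suffix, trace_len)
```

An error excited close to the failure is explained by a short suffix (the common case reported above), while a deeply buried error forces the loop toward the full trace.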
The BMD framework empirically confirms the theory and statistical analysis developed in
this dissertation. More specifically, experiments show that over 1/3 of debugging problems are
completely solved with suffixes of 10 clock cycles or less. As a result, run-time performance
improvements of up to two orders of magnitude are realized under BMD. Furthermore, while a
state-of-the-art debugger finds the error source in only 34% of problem instances, BMD manages
to find the errors in over 93% of the problems.
1.3.3 Max-sat based Debugging
This thesis presents the first maximum satisfiability (max-sat) formulation for design debug-
ging. The key idea is that since the incorrect design cannot produce the correct response, the
corresponding satisfiability problem is inherently unsatisfiable and it can only be satisfied if
some of the constraints are removed. A max-sat solver identifies which parts of the constraints
to remove. This subset of the original problem corresponds directly to the erroneous part of the
design which may contain the bug(s). The proposed max-sat formulation is shown to be both
functionally correct and complete and thus a viable alternative to SAT-based techniques [3].
The proposed max-sat formulation can also be used to over-approximate solutions by loosen-
ing the problem constraints. The over-approximation allows for a trade-off between the max-sat
solver’s run-time and the resolution of the solutions. More specifically, approximation can re-
duce the problem complexity and thus require less run-time at the cost of finding less precise
solutions. Although not exact, this approach can be employed as a pre-processing step that
filters solutions for another complete debugger. This debugger will have fewer error sources to
consider and it will return faster. Experiments show that this combined two-step coarse and fine
grained debugging framework results in more efficient automated debugging solutions.
1.3.4 Trace Reduction
The simulation trace contains crucial information about the debugging problem as it provides
the stimulus and expected behaviour of the design. Since most automated debugging techniques
model the problem for every clock cycle of the trace, a large simulation trace
can have an adverse effect on the memory requirements and performance of an automated
debugger. Unfortunately, the trace length is often hard for the engineer or the respective
CAD tools to control, as traces are automatically provided by testbenches or verification
processes. The goal in trace reduction is to generate a new trace from the original one that
exposes the bug under similar conditions but within a fewer number of clock cycles.
The trace reduction algorithm developed in this thesis uses reachability analysis techniques
based on circuit observability don’t cares [82, 83]. It identifies which states and input conditions
are necessary to excite and observe errors. Once these conditions are identified, redundant
events and specific state transitions can be eliminated. Efficient algorithms and data structures
are proposed that allow quick identification of “short-cuts” in the given trace. In practice,
where simulation traces are often generated from random or constrained random testbenches,
it is possible to reduce many redundant state transitions and find a shorter path from an initial
state, to the error excitation state, and to the error observation state. As a result, a simulation
trace may be reduced to a fraction of its original size thus making the debugging problem much
smaller and easier for existing automated debuggers to handle.
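The "short-cut" idea can be conveyed with a simplified sketch that only looks at repeated states: if the trace revisits a state, the cycles between the two occurrences form a loop that can be spliced out. This deliberately ignores the input conditions and observability don't cares that the real algorithm must check before declaring a transition redundant; treat it as an illustration of the splicing step only.

```python
def shorten_trace(states):
    """Splice loops out of a state sequence: cycles between two occurrences
    of the same state are treated as redundant. (The full algorithm must also
    verify input conditions and observability don't cares.)"""
    seen = {}    # state -> position of its surviving occurrence
    out = []
    for s in states:
        if s in seen:                      # revisited state: cut the loop
            cut = seen[s]
            for dropped in out[cut:]:
                seen.pop(dropped, None)
            out = out[:cut]
        seen[s] = len(out)
        out.append(s)
    return out
```

For a randomly generated trace that wanders through the same states repeatedly, this kind of splicing yields the shorter path from initial state to excitation and observation described above.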
1.4 Thesis Outline
This thesis is structured as follows. In Chapter 2 background is provided on automated debug-
ging techniques, max-sat algorithms, and formal verification methods. Chapter 3 presents the
abstraction and refinement based debugging methodology. Chapter 4 introduces the bounded
model debugging technique. Chapters 5 and 6 describe the max-sat debugging formulation
and the trace reduction technique, respectively. Chapter 7 discusses future work in the area of
automated debugging and concludes this thesis.
Chapter 2
Background
2.1 Introduction
This chapter introduces background material and concepts pertaining to the contributions of
this dissertation. In Section 2.2, verification techniques are discussed as the preceding step to
design debugging. The central theme of this thesis, automated design debugging, is formally pre-
sented in Section 2.3 and the necessary introductory concepts are visited. Section 2.4 provides
background on traditional diagnosis techniques based on simulation and BDDs. Section 2.5
discusses the fundamentals of SAT-based debugging, a powerful debugging technique used and
enhanced multiple times in this thesis. Finally, the essential theoretical and implementation
components for SAT solvers and max-sat solvers are introduced in Section 2.6.
2.2 Verification Techniques
Functional verification aims to find failures in designs or prove their functional correctness.
When failures are identified, verification tools provide error traces or counter-examples to re-
produce the conditions for the failure. This dissertation is concerned with debugging functional
failures of digital circuits at the RTL and at the gate-level. Timing errors and other specification
violations such as power consumption failures are not examined here.
Definition 1 A failure is the manifestation of at least one discrepancy at an observation point
between a design implementation and its specification model when both operate under the
same primary input stimulus.
A discrepancy at an observation point is a conflicting Boolean assignment at that point
during simulation under the same input stimulus. In this context, an observation point can
be a primary output or an internal design signal with a defined expected functional behaviour.
For example, a failure occurs when a circuit produces the Boolean value 0 at a primary output
under a given input stimulus while its specification dictates the value 1 for the same signal. The
root cause (source) of the failure is commonly referred to as an error or a bug. The moment
an error source produces a value that differs from its “golden” model, the error is said to be
excited. Debugging procedures use failures in order to localize the root cause by identifying the
excitation points. Once localized, engineers can rectify the design by removing the observed
failure.
Two main types of verification processes that detect failures exist at the RTL and at the
gate-level: simulation-based and formal techniques. Broadly speaking, simulation-based
techniques explore the design space by providing stimulus to the design through a testbench.
They model the behaviour of the design via a simulator or a hardware emulator and they
determine the existence of failures using appropriate testbench correctness checkers or monitors.
Simulation-based verification remains the predominant verification methodology in the industry
today, as it is used in more than 90% of verification cases [48]. The success of simulation-based
verification is primarily due to its ease of use and its run-time performance. It is much faster
and it can handle much larger designs when compared to formal verification. Theoretically,
simulation-based techniques cannot prove or disprove the correctness of a design unless 2^(I+S)
unique stimulus vectors are exercised, where I is the number of primary design inputs and S
is the number of its state elements. Since simulating 2^(I+S) vectors is often not practical, it is
well accepted that a simulation-based technique can only provide a high-degree of confidence
about the functional correctness of a design and not an absolute certitude.
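The 2^(I+S) bound makes the impracticality concrete, as a quick calculation shows (the block size chosen here is an arbitrary illustration):

```python
def exhaustive_vectors(primary_inputs, state_elements):
    """Number of unique stimulus vectors needed for a proof by simulation."""
    return 2 ** (primary_inputs + state_elements)

# Even a modest block with 32 inputs and 64 flip-flops needs 2**96 vectors,
# roughly 7.9e28; at 10**9 vectors per second this exceeds 10**12 years.
count = exhaustive_vectors(32, 64)
```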
Formal verification techniques prove or disprove the correctness of a design by contrasting its
functional behaviour to that of a golden model or property. Formal verification tools explore the
design space exhaustively by employing formal engines such as BDDs and SAT solvers [27, 57,
74, 93]. Due to the inherent limitations of formal engines, formal verification tools are restricted
in their deployment by the size of the circuit under verification. In practice, these methods
operate on small modules with 500 thousand gates or fewer, as anything larger exceeds their
application domain. Only when structural circuit information is available (like in equivalence
checking) do these techniques scale to larger designs. As a result, formal verification techniques
are typically engaged only at the block or at the sub-system level.
Once a bug is detected, both simulation-based and formal verification techniques provide a
simulation trace or counter-example that can be used as stimulus to reproduce the failure as
defined in Definition 1. Both approaches also utilize a golden or reference model that determines
the expected or correct value when a failure occurs. An important fact in all verification
methodologies except equivalence checking [38, 63] is that the reference or golden model acts as
a “black box”. That is, this model can only be simulated to provide the correct output values
and no intermediate structural correspondence between the circuit and the reference model
exists apart from the observation points. As we shall see, this fact complicates both verification
and the subsequent debugging efforts.
As an example, consider the erroneous circuit shown in Figure 2.1 and the simulation input
trace {I1, I2, I3, I4} = {0, 1, 0, 0}. We assume that a golden model provides the expected (refer-
ence) value of 1 at observation point O1. In this case, there is a failure since the correct value 1
conflicts with the simulated value of 0. This discrepancy is depicted as a pair of correct/erroneous
logic values of 1/0 in the figure. This information from simulation is captured by a verification
checker and it is used later in automated debugging.
Figure 2.1: A circuit with failure observed at O1.
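Because the golden model is a black box, failure detection reduces to comparing values at the observation points. A minimal checker in that spirit can be written as follows; the function and signal names are hypothetical, chosen to mirror the Figure 2.1 example.

```python
def find_discrepancies(simulated, golden):
    """Report observation points whose simulated value conflicts with the
    golden (reference) value, as in the 1/0 pair at O1 in Figure 2.1."""
    return [point for point, value in simulated.items()
            if point in golden and golden[point] != value]

# Simulated output 0 versus expected value 1 at O1 constitutes a failure.
failing = find_discrepancies({'O1': 0}, {'O1': 1})   # == ['O1']
```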
Definition 2 A diagnosis vector v is a union of three sets. One set contains the sequence of
primary input logic values needed to simulate the failure. The second set contains the initial
state values for all state elements. The third set contains the sequence of correct (reference)
values for the observation signals.
The three sets in v are also called input vector, initial state vector, and expected or golden
output vector in test diagnosis literature [91]. In Figure 2.1, the diagnosis vector v contains the
input vector {I1, I2, I3, I4} = {0, 1, 0, 0}, the golden output vector {O1} = {1}, and an empty
initial state vector. Note that the erroneous circuit response for v can be obtained by simply
simulating the faulty design. When multiple diagnosis vectors are available, a capital letter
V denotes the set of these vectors. In the next section, the automated debugging problem is
formally introduced. This problem formulation uses the information encapsulated in the set of
diagnosis vectors V .
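One straightforward way to hold the three sets of Definition 2 in code is a small record type; the class and field names below are illustrative, not from the thesis, and the instance mirrors the Figure 2.1 example (one clock cycle, empty initial state).

```python
from dataclasses import dataclass

@dataclass
class DiagnosisVector:
    input_vector: list     # sequence of primary-input values, one dict per cycle
    initial_state: dict    # values for all state elements
    golden_outputs: list   # sequence of expected values at observation points

# The diagnosis vector for Figure 2.1:
v = DiagnosisVector(
    input_vector=[{'I1': 0, 'I2': 1, 'I3': 0, 'I4': 0}],
    initial_state={},
    golden_outputs=[{'O1': 1}],
)
```

A set of such records corresponds to the vector set V used by the debugging formulation in the next section.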
2.3 Automated Debugging
Debugging is a widely used term with different interpretations for the different phases of the
VLSI design cycle. In this dissertation, we follow the terminology used predominantly in func-
tional design debugging and in fault diagnosis literature [1, 91].
Definition 3 Debugging is the process of identifying the source(s) of failures in a given erro-
neous circuit.
The sources of errors in the above definition can be a set of logic gates, logic functions,
entire design modules, or specific RTL lines (VHDL, Verilog, etc) depending on the design
model used. In this dissertation, we use the term component to indicate any of these error
candidates. Formally, the input and output of an automated debugger are defined as follows.
Definition 4 A component with a corresponding output function f is a suspect if and only
if there exists a new function g that can replace f and remove the failure for a given set of
diagnosis vectors V .
Definition 5 Given an erroneous design C with a corresponding set of diagnosis vectors V
that detect a failure, automated design debugging is a process that returns suspect components
in the design.
A set of suspects are also referred to as debugger solutions. In this thesis, for a given
component, a function f produces a sequence of Boolean values for its output, based on
Figure 2.2: Debugging in a modern simulation-based verification flow.
its inputs and initial state. For components with multiple output signals, the function set F
comprises the functions for each individual output signal.
The above definition for automated debugging mirrors the manual debugging process per-
formed by design and verification engineers today. In detail, engineers use the verification
failure and information from the diagnosis vectors V to search for locations in the design where
a design modification such as a gate or function change can remove the failure. They do this by
manually tracing the design simulated under the input test vectors V using dedicated waveform
viewers and structural RTL editors.
Figure 2.2 shows where a manual or automated debugging tool fits within a simulation-based
verification flow. The diagnosis vector set V is captured from the verification tools as described
in Section 2.2. In general, the more vectors (i.e. the larger the cardinality |V |) returned by
verification, the better the resolution of the solutions from debugging is expected to be (i.e.
fewer solutions, more precise localization, etc). Once an automated debugger returns suspects
at the RTL or at the gate-level, the designer must devise a fix for the design at those suspect
locations to actually remove the failure for the given vectors. It should be emphasized that
once a rectification is made, the design must be thoroughly verified since the fix may result in
further functional failures.
2.3.1 Complexity of Automated Debugging
This section introduces nomenclature specific to the complexity of the automated debugging
problem. According to Definition 5 there may exist more than a single unique suspect compo-
nent whose function can be modified to rectify the failure for the given set of vectors. Every
possible suspect found by an automated debugger corresponds to a location in the RTL or
gate-level design where the failure can be fixed. For example, consider the circuit in Figure 2.1:
for the single vector, one suspect is gate D because changing it from an AND to a NAND
will remove the failure for the given input vector. Similarly, gate A is another viable suspect
because, as can be shown, changing its function from an AND gate to an OR will also remove the
failure.
The existence of more than one solution to a debugging problem arises from the fact that
there may be more than one way to re-synthesize the design to correct it. In practice, the
designer may prefer one suspect over another in order to perform a correction. The preference
can be due to reasons such as the ease of implementation of a correcting function or due to the
effect of the correction on circuit performance such as timing, power, etc. In order to provide
many correction choices to the designer, an effective automated debugger must return all the
possible suspects. In this manner, the designer can have a larger set of options to evaluate for
suitability when devising a correction.
Automated debugging as presented in Definition 5 does not limit the number of components
that are returned as a single solution. For example in Figure 2.1 the solutions {A} and {D} are
valid solutions but so is the solution set {B, C}. In other words, simultaneously changing the
functions of gates B and C to NOR gates can rectify the problem. In practice, solutions with
fewer suspect components are preferred because fewer components must be modified in order
to fix a failure.
Definition 6 The error cardinality, denoted as N , is the number of distinct suspect components
contained in a solution set returned by an automated debugger.
It has been shown in [99] that the complexity of the debugging problem increases expo-
nentially with respect to error cardinality N . To reduce the effect of the complexity growth,
a debugger may begin with an initial guess N = 1 and increase N until N = maxN . In this
definition, maxN is a user-defined maximum number of errors the debugger must consider at
once. This approach ensures that potential error sites are searched quickly with small values
of N . Only when a suitable correction location is not found, the value of N is increased. Typ-
ically, given a value for N , most automated debuggers do not return solution sets that contain
suspects found at cardinalities less than N . For example, in Figure 2.1 the solution set {A, C} is
typically not returned with N = 2 because {A} is found with N = 1. We follow this convention
in this dissertation.
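The escalation scheme just described can be sketched as a simple loop, with solve_at(n) standing in for one complete all-solution debug run at error cardinality n; the function names are illustrative.

```python
def debug_with_cardinality(solve_at, max_n):
    """Search for suspects starting at N = 1, escalating the cardinality only
    when no correction location is found at the current value of N."""
    for n in range(1, max_n + 1):
        solutions = solve_at(n)    # one all-solution debug run at cardinality n
        if solutions:
            return n, solutions    # cheapest cardinality that explains the failure
    return max_n, []               # nothing found up to the user-defined bound
```

In the Figure 2.1 example this loop would stop at N = 1, returning {A} and {D} without ever formulating the N = 2 problem that yields {B, C}.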
Definition 7 Given an error cardinality N , all possible solutions returned by an automated
debugger are referred to as equivalent solutions.
In Figure 2.1, with N = 1, both {A} and {D} are equivalent solutions. An automated de-
bugger is said to be incomplete if it does not return all equivalent suspects for a given cardinality.
In Figure 2.1, with maxN = 4, the complete solution set is {A, D, {B, C}} because there are
no solutions with N = 3 and N = 4.
2.4 Simulation- and BDD-based Diagnosis
Traditionally, diagnosis (debugging) techniques are classified as cause-effect or effect-cause tech-
niques [50]. Cause-effect analysis usually simulates all errors (faults) to compile error (fault)
dictionaries. These dictionaries contain entries of candidate errors (faults) and respective failing
primary output values. Given a failing design (chip) and a set of failing vectors, the design
(chip) responses are found in the dictionary to return a set of suspects for each vector. In
practice, cause-effect techniques are only effective for single stuck-at-fault models and not for
multiple faults [50]. Effect-cause analysis methods, on the other hand, are more scalable as
they use simulation and structural analysis to identify the suspects [2].
Both approaches return sets of suspects E1, E2, . . . , Ek corresponding to diagnosis vectors
v1, v2, . . . , vk. These sets are intersected (E = E1 ∩ E2 ∩ · · · ∩ Ek) to return the final set E of
suspects that explain the faulty behaviour for all input vectors.
Historically, the first effect-cause diagnosis techniques used simulation and BDDs (symbolic
methods) to find suspects. Symbolic methods [43, 44, 58] operate by building an error equation
that encodes all corrections. Simulation-based algorithms [44, 62] typically use a backtrace
procedure to identify potential suspect locations, and perform simulation to verify that each
suspect can correct the design. For some types of suspects with high fanout, the amount of
simulation required can be excessive [62]. Since the solution space increases exponentially with
the number of suspects, incremental methods have been proposed to explore this search space
efficiently [46, 62]. Such methods examine one location at a time and rely on heuristics to find
the locations; however, they are limited in their effectiveness for industrial problems.
2.5 SAT-Based Debugging
More recently, a novel effect-cause debugging technique was proposed based on the concept of
Boolean Satisfiability [92]. In this context, the debugging problem is formulated as a Boolean
Satisfiability instance where a conventional SAT solver can be utilized to return solutions cor-
responding to suspects. In recent years, a variety of SAT-based debugging formulations have
been proposed building on the initial work of [92]. These more advanced formulations extend
the solution from the combinational problem to the sequential one, they enhance it for hier-
archical design structures and they introduce memory and performance improvements [3, 35].
Experiments show that SAT-based debugging techniques can outperform traditional debugging
techniques (Section 2.4), sometimes by orders of magnitude [33, 66, 91].
We now present the basic formulation of SAT-based debugging since it is relevant to the
contributions of this dissertation. The methodologies introduced here also apply to the other
debugging techniques from Section 2.4. However, for ease of presentation,
SAT-based debugging is often used in examples. In the following discussion, we use SAT solvers
as black box engines that return solutions to problems in Conjunctive Normal Form (CNF).
Section 2.6 provides background on Boolean satisfiability and SAT solvers.
Almost all SAT-based debugging methods start by synthesizing the design under verification
(erroneous circuit) at the RTL or gate-level into Boolean primitives modeling combinational logic
and state elements. The following four steps are then applied to create and solve the debugging
problem.
1. Add extra logic to the erroneous circuit C to model potential suspects and represent the
error cardinality.
2. Convert the amended circuit into Conjunctive Normal Form (CNF).
3. Replicate and constrain the CNF for every failing vector sequence v ∈ V and for every
clock cycle in each sequence in v.
4. The final CNF is given to any SAT solver. Solutions returned by this solver indicate
respective suspects in C.
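The four steps form a pipeline from circuit to suspects. A skeletal sketch of that data flow is given below; every stage is injected as a placeholder, and none of the function names come from the thesis implementation.

```python
def sat_based_debug(circuit, vectors, n,
                    add_correction_models, to_cnf,
                    replicate_and_constrain, all_sat):
    """Data flow of the four SAT-based debugging steps; each stage is a
    caller-supplied placeholder standing in for the real transformation."""
    amended = add_correction_models(circuit, n)          # Step 1: muxes + cardinality logic
    cnf = to_cnf(amended)                                # Step 2: CNF conversion
    constrained = replicate_and_constrain(cnf, vectors)  # Step 3: per-vector, per-cycle copies
    return all_sat(constrained)                          # Step 4: solutions map to suspects
```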
The transformation and amendment to the circuit in Step 1 is very important to the debug-
ging formulation and deserves further clarification. In this step, a correction model, represented
by a multiplexer mi, is added at the output of every gate (or module) li in the circuit C. The
output of the correction model mi is connected to the original fanout of li. In effect, when
the select line si of a multiplexer is inactive (si = 0), the original gate li is connected to mi,
otherwise (when si = 1) a new unconstrained primary input wi is introduced. Eventually, the
CNF variable corresponding to wi will assume the value that corrects the circuit. Figure 2.3
(a) shows a sample circuit while Figure 2.3 (b) illustrates this circuit with correction models
added for every gate and input. Recall that correction models are only added to the output of
components, which can be individual gates, groups of gates, or even high level Verilog modules.
When the problem is converted to CNF in Step 2 and the CNF is provided to a SAT solver
in Step 4, the SAT solver can assign any value {0,1} to the si and wi variables such that the
CNF satisfies the constraints applied by each vector v. To constrain the SAT problem to a
particular error cardinality N , further logic is added to activate at most N select lines. This
logic can be modeled using a counter or sorter network [74]. Thus for N = 1, a single si can be
set at a time to a logic 1 which in turn indicates that li is an error suspect. For higher values of
N , the respective number of multiplexer select lines can be set to a logic 1, a fact that indicates
an error suspect N -tuple. A simple all-solution SAT methodology can be implemented on top
Figure 2.3: Circuit before and after adding correction models.
to return all equivalent error suspects, as described in [91]. Intuitively, an all-solution SAT
solver does not return one satisfying assignment for the input problem but all assignments that
satisfy it. Iterations of all-solution problems can be formulated from N = 1 to N = maxN to
locate all possible errors with at most cardinality maxN .
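For a concrete, self-contained illustration of the correction model, consider a hypothetical two-gate combinational circuit, g1 = AND(x1, x2) feeding g2 = OR(g1, x3), where the AND should really be an OR. A multiplexer is placed at each gate output, and because the instance is tiny, an exhaustive search over the select and w assignments stands in for the SAT solver; a real flow would instead hand the replicated CNF to an all-solution solver. All names are invented for this sketch.

```python
from itertools import product

def simulate(x1, x2, x3, sel, w):
    """Evaluate the amended circuit: each gate output passes through a
    correction multiplexer (active select -> free value w replaces the gate)."""
    g1 = x1 and x2
    g1 = w[0] if sel[0] else g1    # mux m1 at the output of g1
    g2 = g1 or x3
    g2 = w[1] if sel[1] else g2    # mux m2 at the output of g2
    return g2

def find_suspects(vector, golden, n=1):
    """A gate is a suspect if activating exactly n selects (here n = 1) lets
    some free value reproduce the golden output for the diagnosis vector."""
    suspects = set()
    for sel in product([0, 1], repeat=2):
        if sum(sel) != n:
            continue                       # enforce error cardinality N = n
        for w in product([0, 1], repeat=2):
            if simulate(*vector, sel, w) == golden:
                suspects |= {f"g{i + 1}" for i, s in enumerate(sel) if s}
    return suspects

# Vector x1=1, x2=0, x3=0 with golden output 1 exposes the buggy AND:
suspects = find_suspects((1, 0, 0), 1)   # == {'g1', 'g2'}: either mux can fix y
```

Both gates are equivalent solutions at N = 1: freeing g1's output to 1 propagates through the OR, while freeing g2's output directly forces the correct value, mirroring the multiple-suspect situation of Figure 2.1.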
Step 2, which translates the problem into CNF, is not specific to the debugging problem and
is described in detail in Section 2.6.1. Similarly, basic concepts of the SAT solvers used in Step 4
of the debugging algorithm are illustrated in Section 2.6.2 that follows.
2.6 Boolean Satisfiability
Boolean Satisfiability (SAT) solvers are decision engines used in a variety of scientific domains
such as artificial intelligence, CAD for VLSI, finance and biology, among others. This sec-
tion introduces the satisfiability problem formally and proceeds to provide an overview of the
algorithms utilized by modern SAT solvers.
Definition 8 Given a Boolean formula Φ with operators · (and), + (or), − (negation), →
(implication), ↔ (if-and-only-if) and n variables v1, v2, ..., vn, the Boolean satisfiability problem
is to determine whether there exists a variable assignment from the Boolean domain {0, 1} to
each variable such that Φ evaluates to 1. If such an assignment exists, then the problem is
SATISFIABLE otherwise the problem is UNSATISFIABLE.
For example, determining whether the following Boolean formula evaluates to 1 is a satisfi-
ability problem:
(a + b) + ((a → c) · (b + c))
In this case, the problem is satisfiable and one suitable logic assignment to the variables is
a = 1, b = 0, c = 0. It can be seen that this assignment makes the formula satisfiable.
It is beneficial to briefly discuss Boolean satisfiability in the context of complexity theory.
Boolean satisfiability holds a special place in the history of mathematics as it is the first problem
to be classified as NP-Complete [25]. NP-Complete problems are NP-hard by definition [26] and
their solutions can be verified in polynomial time (they belong to the class NP). A problem is
NP-hard if all other problems in NP can be polynomially reduced to it. Both NP-Complete and
NP-hard problems are classes of problems for which no polynomial-time solving algorithm is
currently known. Unfortunately, many VLSI CAD problems, such as automated debugging, circuit
optimization and model checking, fall under one of these two classes [88].
One interesting aspect of NP-Complete problems is that if an efficient algorithm is found for
one particular NP-Complete problem, then every problem in NP can be solved efficiently by
reducing it to that problem. Since the SAT problem is itself NP-Complete, an efficient SAT
solver can be used to solve any problem that can be formulated as a SAT instance. With recent
improvements in SAT algorithms, this approach has become very common. For example, SAT
solvers are used as the underlying engines for many VLSI CAD problems such as ATPG [56],
formal verification [9], low-power design [67] and FPGA routing [75], among others. In other
words, instead of developing a dedicated algorithm for each of these VLSI CAD problems, they
can be translated into SAT instances to which a generic solver can provide answers efficiently.
Although empirically effective for many problems, the SAT problem remains NP-Complete in
theory, and all known algorithms for it have worst-case exponential time complexity; a
polynomial-time algorithm would imply P = NP, where the complexity class P is the set of all
problems that can be solved in polynomial time. Efficient heuristics are required to reduce the
run-times and memory requirements of these algorithms in practice; however, these heuristics
are not beneficial for all problems and do not affect the worst-case behaviour.
2.6.1 CNF Representation for Boolean SAT
The SAT problem, as defined in the previous section, allows a fairly flexible representation
of the Boolean formula Φ. However, most SAT algorithms work on a simpler format, namely
the Conjunctive Normal Form (CNF). A CNF formula is a conjunction (·) of clauses; a clause
is a disjunction (+) of literals; and a literal is the positive or negative (¬) phase (inversion) of
a variable. The CNF representation is popular among SAT algorithms because a solver can
focus on satisfying each individual clause (making it evaluate to 1) in order to satisfy the
overall problem.
When writing a formula in CNF, the literals of each clause are grouped together in parentheses
and the conjunction operator is usually omitted. For instance, the following CNF formula Φ
contains two clauses, three variables (a, b, c), and five literals (a, ¬a, b, ¬b, c):
Φ = (a + ¬b + c)(¬a + b)
Many CAD SAT problems are derived from a gate-level circuit representation together with
additional problem constraints. The circuit component can be represented in CNF using a
linear-time procedure such as the one described below [56, 79, 97].
step 1. Given a circuit, uniquely label every circuit line including all inputs
and all outputs.
step 2. For each gate, retrieve the corresponding CNF clauses representing
the gate from a database such as Table 2.1.
step 3. For each clause, replace the gate’s input and output variables in the
CNF by the appropriate unique labels.
step 4. Join all clauses by using the conjunction operation.
Intuitively, the above process replaces every simple gate in the circuit with its CNF equiv-
alent as shown in Table 2.1. In the final step, the conjunction of all the gate CNFs results in
the overall circuit CNF. Figure 2.4 illustrates a simple gate-level circuit and its corresponding
CNF representation.
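The four steps above can be sketched directly. The following is a minimal illustration (not the thesis' implementation) using DIMACS-style signed-integer literals; the gate "database" covers only a few of the gates of Table 2.1, and the label numbering is our own:

```python
def gate_to_cnf(gate_type, out, inputs):
    """Steps 2/3: the CNF clauses for one gate over already-labelled lines.
    A literal is a signed integer: +v for variable v, -v for its negation."""
    if gate_type == 'AND':
        return [[i, -out] for i in inputs] + [[-i for i in inputs] + [out]]
    if gate_type == 'OR':
        return [[-i, out] for i in inputs] + [list(inputs) + [-out]]
    if gate_type == 'NAND':
        return [[i, out] for i in inputs] + [[-i for i in inputs] + [-out]]
    if gate_type == 'NOT':
        (i,) = inputs
        return [[i, out], [-i, -out]]
    raise ValueError('gate not in database: ' + gate_type)

def circuit_to_cnf(gates):
    """Step 4: conjoin (concatenate) the clause lists of every gate."""
    clauses = []
    for gate_type, out, inputs in gates:
        clauses += gate_to_cnf(gate_type, out, inputs)
    return clauses

# Step 1 (labelling): a=1, b=2, c=3, d=4, e=5 for d = OR(a, b), e = NAND(c, d)
circuit = [('OR', 4, [1, 2]), ('NAND', 5, [3, 4])]
print(circuit_to_cnf(circuit))
```

The printed clause list is the conjunction of the OR-gate and NAND-gate CNFs, which is exactly the structure of the circuit CNF shown in Figure 2.4.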
Gates                       CNF
y = AND(x1, x2, ..., xn)    (x1 + ¬y) · (x2 + ¬y) · ... · (xn + ¬y) ·
                            (¬x1 + ¬x2 + ... + ¬xn + y)
y = NAND(x1, x2, ..., xn)   (x1 + y) · (x2 + y) · ... · (xn + y) ·
                            (¬x1 + ¬x2 + ... + ¬xn + ¬y)
y = OR(x1, x2, ..., xn)     (¬x1 + y) · (¬x2 + y) · ... · (¬xn + y) ·
                            (x1 + x2 + ... + xn + ¬y)
y = NOR(x1, x2, ..., xn)    (¬x1 + ¬y) · (¬x2 + ¬y) · ... · (¬xn + ¬y) ·
                            (x1 + x2 + ... + xn + y)
y = XOR(x1, x2)             (¬x1 + ¬x2 + ¬y) · (x1 + x2 + ¬y) ·
                            (x1 + ¬x2 + y) · (¬x1 + x2 + y)
y = XNOR(x1, x2)            (¬x1 + ¬x2 + y) · (x1 + x2 + y) ·
                            (x1 + ¬x2 + ¬y) · (¬x1 + x2 + ¬y)
y = NOT(i)                  (i + y) · (¬i + ¬y)
y = BUFFER(i)               (¬i + y) · (i + ¬y)
y = MUX(s, x1, x2)          (¬x1 + y + s) · (x1 + ¬y + s) ·
                            (¬x2 + y + ¬s) · (x2 + ¬y + ¬s)
Table 2.1: Simple gates and their CNF representation
The above procedure translates a gate-level circuit to CNF. However, more work is often
required to formulate a specific CAD problem. For instance, to constrain a variable a to the
Boolean value 1 or 0, the unit literal clause (a) or (¬a) is added, respectively. As another example,
the procedure described in Section 2.5 formulates a SAT-based debugging problem; in that
process, the circuit is constrained with the logic values encapsulated in the diagnosis vector
set V using unit literal clauses.
2.6.2 Boolean Satisfiability Algorithms
Modern SAT solvers such as MiniSAT [74], zChaff [73], and GRASP [69] are based on the
backtracking search of the original DPLL SAT solving algorithm [27]. The major functions of the
Φ = (¬a + d) · (¬b + d) · (a + b + ¬d) ·   [OR gate: d = OR(a, b)]
    (c + e) · (d + e) · (¬c + ¬d + ¬e)     [NAND gate: e = NAND(c, d)]
[Circuit schematic with inputs a, b, c and gate outputs d, e omitted.]
Figure 2.4: Example: circuit and CNF representation
DPLL procedure are shown in the algorithm of Figure 2.5. First, the decide function picks a
variable from the CNF problem and assigns it the value 0 or 1. The deduce function then
determines which other assignments are implied by the previous decision and identifies whether
a conflict has occurred. A conflict occurs when a variable is implied to opposing Boolean values
by different CNF clauses. If a conflict occurs, the solver must backtrack to a decision level prior
to the conflict. If a conflict is reached at decision level 0, the problem is UNSATISFIABLE;
otherwise, a satisfying variable assignment is eventually found as the solution to the SAT problem.
Today’s most effective SAT solvers use advanced techniques such as efficient data struc-
tures for Boolean Constraint Propagation and book-keeping, non-chronological backtracking,
conflict-based decision making, and solver restarts to explore different portions of the solution
space [69, 73, 74]. These techniques improve SAT solver performance but cannot guarantee
timely completion of the search, since the problem remains NP-Complete. Nevertheless,
extensive empirical results from the research and industrial communities show that SAT solvers
are computationally viable decision engines for many CAD problems [8, 39, 56].
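The skeleton of Figure 2.5 can be sketched recursively. This is a deliberate simplification, not a modern solver: deduce is plain unit propagation, decide branches on the first literal of the first clause, and backtracking happens as the recursion unwinds. The example CNF (variables a..e numbered 1..5) encodes d = OR(a, b) and e = NAND(c, d) per Table 2.1, with the output e constrained to 1 by the unit clause (e):

```python
def simplify(clauses, lit):
    """Assign `lit` true: drop satisfied clauses, strip the falsified literal.
    Returns None if an empty clause (a conflict) is produced."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def dpll(clauses):
    """Return a satisfying list of literals, or None if UNSATISFIABLE."""
    if not clauses:
        return []                    # every clause satisfied
    for c in clauses:                # deduce(): unit propagation
        if len(c) == 1:
            rest = simplify(clauses, c[0])
            if rest is None:
                return None          # conflict during deduction
            sub = dpll(rest)
            return None if sub is None else [c[0]] + sub
    lit = clauses[0][0]              # decide(): branch on a literal
    for choice in (lit, -lit):
        rest = simplify(clauses, choice)
        if rest is not None:
            sub = dpll(rest)
            if sub is not None:
                return [choice] + sub
    return None                      # both branches failed: backtrack

cnf = [[-1, 4], [-2, 4], [1, 2, -4], [3, 5], [4, 5], [-3, -4, -5], [5]]
print(dpll(cnf))
```

The returned literal list is a (possibly partial) satisfying assignment; unassigned variables are don't-cares.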
2.6.3 Maximum Satisfiability
A close relative of the SAT problem is the Maximum Satisfiability (max-sat) problem. Max-sat
is an optimization problem that seeks an assignment to an UNSATISFIABLE CNF formula
that maximizes the number of satisfied clauses [61, 68]. For example, for the CNF
Φ = (a) · (¬a) · (¬b) · (¬a + b)
one max-sat solution is the satisfied clause set {(¬a), (¬b), (¬a + b)}.
1: while ( decide() ) do
2: if ( deduce() = conflict ) then
3: blevel = analyze_conflict()
4: if ( blevel = 0 ) then
5: return UNSATISFIABLE
6: end if
7: backtrack(blevel)
8: end if
9: end while
10: return SATISFIABLE
Figure 2.5: Basic DPLL SAT solving algorithm
While max-sat is concerned with finding a satisfiable set of clauses of maximum cardinal-
ity, this can be generalized to finding Maximal Satisfiable Subsets (MSSes). An MSS is a
satisfiable subset of the formula's clauses that is maximal in the sense that adding any one of
the remaining clauses makes it unsatisfiable. Every max-sat solution is of course an MSS, but
MSSes can be of different (smaller) sizes as well. For instance, in the above example, the
following are MSSes:
{(¬a), (¬b), (¬a + b)}
{(a), (¬b)}
{(a), (¬a + b)}
The first set is an MSS satisfied by the assignment a = 0, b = 0; the second is satisfied by
a = 1, b = 0, and the third by a = 1, b = 1.
In this dissertation, the complements of MSSes, i.e., sets of clauses whose removal makes the
instance satisfiable, are of interest. Just as an MSS is maximal, its complement is minimal,
and such a set is referred to as a Minimal Correction Set (MCS). For the above sets the
corresponding MCSes are {(a)}, {(¬a), (¬a + b)}, and {(¬a), (¬b)}, respectively. Chapter 5
presents a technique that solves the debugging problem by formulating it as a max-sat problem
and seeking MCSes.
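For small formulas, MCSes can be enumerated by brute force directly from the definition. The sketch below (our own illustration, not the solver of Chapter 5) uses DIMACS-style signed-integer clauses and a four-clause unsatisfiable formula of the same shape as the example above, with both phases written out explicitly:

```python
from itertools import combinations, product

def satisfiable(clauses, nvars):
    """Exhaustive satisfiability check over variables 1..nvars."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def mcses(clauses, nvars):
    """Enumerate Minimal Correction Sets in order of increasing size: an MCS is
    a minimal set of clause indices whose removal leaves the rest satisfiable."""
    found = []
    for k in range(len(clauses) + 1):
        for idx in combinations(range(len(clauses)), k):
            if any(set(prev) <= set(idx) for prev in found):
                continue  # a subset already corrects the formula: not minimal
            rest = [c for i, c in enumerate(clauses) if i not in idx]
            if satisfiable(rest, nvars):
                found.append(idx)
    return found

# Phi = (a)(-a)(-b)(-a + b) encoded with a = 1, b = 2
phi = [[1], [-1], [-2], [-1, 2]]
print(mcses(phi, 2))   # each tuple lists the clause indices forming one MCS
```

The complement of each reported index set is an MSS, mirroring the correspondence described above; real MCS extractors avoid this exponential subset enumeration.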
Many types of max-sat solvers are being developed by research groups today. Unlike SAT,
where DPLL-based algorithms dominate, it is not yet clear which max-sat algorithm is the
most effective. Some max-sat solvers simply build a SAT instance with cardinality constraints
and use an off-the-shelf SAT solver to find solutions to the problem [41]. Others use an
approximate SAT algorithm such as local search [42] to find an initial solution and later refine
it [4]. One of the most effective solvers for debugging problems analyzes unsatisfiable cores
extracted directly from the SAT instance and refines the solution by operating on those
cores [68].
Chapter 3
Debugging using Abstraction and
Refinement
3.1 Introduction
One of the main challenges in automated debugging is scaling the algorithms to large industrial
designs. Most state-of-the-art research on this topic presents results on benchmark designs with
at most tens of thousands of gates. In contrast, modern design blocks in industry exceed the
half-million gate mark. Since the complexity of the debugging problem is O(l^N) [33],
where l is the number of gates in the design and N is the error cardinality, large designs can
have a significant impact on debugger run-time and memory requirements. Arguably, this
challenge has limited the application of automated debugging techniques to real industrial
designs. This chapter introduces abstraction and refinement techniques for automated
debugging to overcome these design size limitations. The methodology proposed in this
dissertation allows existing debugging tools to tackle larger designs than previously possible
while providing considerable performance improvements.
For over ten years, abstraction and refinement techniques introduced in formal verification
have dramatically influenced the scalability and applicability of model checking tools [11, 22,
23]. Viewed as one of the major advancements in model checking, abstraction and refinement
have attracted countless contributions with significant impact on the industrial verification
community [11, 23, 49, 65]. Essentially, these techniques approximate the model checking
problem in a systematic manner, allowing conventional model checkers to handle larger and
harder problems than previously possible. Similarly, this chapter establishes a theoretical
framework for debugging via abstraction and refinement and presents evidence attesting to the
significant performance and capacity improvements made available to automated debuggers.
The proposed abstraction-based debugging technique begins by “simplifying” components of
the design according to the design structure and a pre-determined abstraction level. Hierarchical
or RTL designs are good candidates for function abstraction, where high-level functions are
simplified. Gate-level designs benefit from state abstraction, which operates on a flat netlist.
Irrespective of the design structure, a high abstraction level leads to an aggressive algorithm
which can drastically reduce the debugging problem size. However, a small problem size is not
always advantageous, since there exists a trade-off between the level of abstraction performed
and the number of algorithm iterations required to solve the problem.
Once a design is abstracted, a debugging problem can be formulated and solved using
conventional debugging techniques. The error locations returned, also called error suspects,
can contain valuable information to help the user rectify the design. Furthermore, the error
suspects can also indicate that the solution set may not be complete, that is, that more error
suspects may exist in the design. This scenario occurs when abstraction aggressively removes
parts of the design that may contain error sources. Subsequently, refinement is applied, where
components are selectively re-introduced into the abstracted circuit. In essence, refinement
systematically enriches the abstracted circuit with components until all the error sources are
located.
This pairing of debugging and refinement steps is iterated until all solutions are found. A
set of rigorous theorems is presented that guarantees the correctness and completeness of the
proposed methodology. It should be noted that the proposed methodology is not tied to any
particular debugging technique. Although in this chapter the presentation is conveniently
outlined in terms of SAT-based debugging, other diagnosis methodologies (simulation- and
BDD-based) can utilize the presented theory as well.
In further detail, this chapter introduces a novel debugging abstraction/refinement frame-
work with two orthogonal abstraction techniques each focusing on different design structures.
1. State abstraction: abstraction is performed on state and memory elements to reduce
the spatial and temporal size of the problem [86]. State abstraction operates on a flat
gate-level representation of the problem and does not require any hierarchical or module
information.
2. Function abstraction: abstraction is performed on functions and modules in an iterative
manner through the design hierarchy thus reducing the size of the problem. Function
abstraction leverages information embedded in high level or RTL designs to provide sig-
nificant benefits.
Extensive experiments on large industrial problems demonstrate memory and run-time
reductions of 60% and 4.5×, respectively, using state abstraction. With function abstraction,
drastic memory reductions of over 27× and run-time reductions of over two orders of magnitude
are observed. As in verification, this chapter demonstrates that abstraction and refinement
have a critical impact on the performance of automated debugging, and they motivate future
research in the field.
The presentation of this material is organized as follows. The next section presents notation
and background material. Section 3.3 presents the general abstraction and refinement
methodology and guarantees its correctness and completeness. Sections 3.4 and 3.5 present the
details of state-based and function-based abstraction techniques, respectively. Empirical results
are presented in Section 3.6, and conclusions and future work are discussed in Section 3.7.
3.2 Preliminaries
A design or circuit C can be represented at the gate-level or the Register Transfer Level (RTL).
At the RTL level, the circuit is often hierarchically composed of modules or functions. In
this chapter, a function is said to generate a Boolean value for a variable y based on m input
variables x1, x2, ..., xm and zero or more state variables. For abstraction and refinement, we
are primarily concerned with the structural connectivity between the input variables and the
variable y of a function. As a result, the dependence of the function on state variables is omitted
in the following. The terms modules, components and functions are used interchangeably to
refer to entities implementing functions as defined above. For example, a Verilog function or
a collection of RTL statements can define a module. Each module implements a multi-output
function F = {f1(X), f2(X), ..., fp(X)} where each single-output function fi is defined on input
variables X = {x1, x2, ..., xq}. In the remainder, single-output and multi-output functions are
not distinguished unless explicitly stated otherwise.
Modules can also contain sub-modules, thus resulting in a hierarchy tree H for the design. A
hierarchy tree H contains nodes representing modules and edges representing parent and child
(sub-module) relationships. The hierarchy tree H can contain many levels, so each function
is labelled with a superscript that indicates its level. For example, a function F^i_j is at level i of
the tree and can have sub-functions F^{i+1}_k and F^{i+1}_l at the next level i+1. The output of the
entire design C is represented by F^0_1 at root level 0. This module and hierarchy terminology is
used extensively in Section 3.5 when introducing function abstraction.
3.2.1 Abstraction and Refinement in Model Checking
Abstraction and refinement techniques are used readily in model checking to mitigate the expo-
nential nature of the underlying state space [11, 22, 23]. Roughly speaking, an abstract model is
derived by removing state elements or other components from the original concrete design. As
an active area of research, many different types of abstraction techniques exist such as existen-
tial abstraction and predicate abstraction [24, 40, 49]. Irrespective of the abstraction approach,
the final abstract model contains fewer circuit elements than the original thus simplifying the
task of the model checker.
Depending on the properties being verified and the abstraction technique used, the model
checking result may or may not be trusted. For example, consider the scenario when verifying
a universal property (whether the property holds for all paths) using an existential abstraction
technique [19]. If model checking determines that a property holds in the abstract model, then
it must also hold in the concrete design [19]. However, if a property does not hold in the abstract
model, then the corresponding counter-example must be validated in the concrete design. If
the counter-example does not expose a failure of the property in the concrete design it is said to
be spurious [24]. In this case, the abstract model is refined by reverting some of the abstracted
state elements and continuing the model checking process.
3.3 Debugging with Abstraction and Refinement
The aim of abstraction-based debugging is to reduce the size and complexity of the underlying
problem. Since the performance of a debugger and its memory requirements are directly re-
lated to the size of the circuit under analysis, abstraction can introduce considerable run-time
and memory benefits. This section introduces the basics of a complete and sound debugging
methodology using abstraction and refinement.
An abstract model C ′ is derived by removing a set of components or functions Abs from a
concrete model C. More precisely, as shown below, the procedure selAbsComponents selects a
set of functions to abstract while the procedure absDesign removes these from the design C:
Abs = selAbsComponents(C)
C ′ = absDesign(Abs, C)
When the components Abs are removed, some of the circuitry in their transitive fanin
may be left dangling (i.e. unused by any other logic). An iterative dangling logic removal
procedure can eliminate all gates, wires and other state elements unused by the abstracted
components [13]. The resulting model C ′ can be significantly smaller than C. For instance, if
Abs includes all primary outputs of a circuit, the entire circuit can be essentially removed. The
degree of abstraction to perform is addressed through the experiments of Section 5.6.
After removing the components Abs, their direct fanouts, which are now undriven, are
connected to newly introduced primary inputs. Specifically, for every function fi ∈ Abs a new
primary input is introduced in C ′ and connected to the fanout of fi. As an example, consider
Figure 3.1 (a) and (b), where a circuit is shown before and after abstraction, respectively. In
this case, the component to abstract is Abs = {q1}. Notice that q1 and its transitive fanin
logic, l6 and x3, are removed in Figure 3.1 (b), and the fanout of q1, gate l2, is now driven by
the new primary input x5.
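The absDesign step, including the iterative dangling-logic removal, can be sketched on a toy netlist model: a dict mapping each driven signal to its gate, where signals that appear only as inputs act as primary inputs. The model and any names beyond those of Figure 3.1 are hypothetical:

```python
def abs_design(netlist, abs_set, primary_outputs):
    """Remove the components in abs_set; each removed signal then behaves as a
    new, unconstrained primary input. Afterwards, iteratively sweep away logic
    that no longer drives anything (dangling-logic removal)."""
    abstracted = dict(netlist)
    for sig in abs_set:
        del abstracted[sig]       # sig's fanout is now fed by a fresh primary input
    changed = True
    while changed:                # iterate until no dangling gate remains
        changed = False
        used = set(primary_outputs)
        for _gate, inputs in abstracted.values():
            used.update(inputs)
        for sig in [s for s in abstracted if s not in used]:
            del abstracted[sig]   # no fanout left: remove the gate
            changed = True
    return abstracted

# A fragment loosely modelled on Figure 3.1: abstracting q1 leaves l6 dangling
netlist = {
    'l6': ('AND', ['x1', 'x3']),
    'q1': ('DFF', ['l6']),
    'l2': ('OR',  ['q1', 'x2']),
    'y1': ('BUF', ['l2']),
}
print(abs_design(netlist, {'q1'}, ['y1']))
```

After the sweep, only the logic feeding the primary outputs remains, and the removed flip-flop's output survives purely as an unconstrained input, mirroring x5 in the figure.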
[Circuit schematics (a) and (b) omitted: the concrete circuit with flip-flops q1 and q2, and the abstract circuit in which q1, l6 and x3 are removed and the new primary input x5 drives l2.]
Figure 3.1: Circuit before and after abstracting flip-flop q1
3.3.1 Guaranteeing Correctness
Once the abstract model C′ is generated, the next step is to construct the debugging problem.
The natural reaction is to formulate a SAT-based debugging problem according to Section 2.5
with the abstract model C′ and the error trace V. However, the abstract model C′ contains
the newly added primary inputs, which remain unconstrained in V. As a result, a SAT-based
debugging engine may arbitrarily assign unjustifiable logic values to these variables while
solving the debugging problem.
Definition 9 Assume that Φ (Φ′) corresponds to a SAT-based debugging problem derived from a
concrete design C (abstract design C ′). A value assignment to the variables of Φ′ is unjustifiable
if the same assignment to Φ gives a conflict.
Here, a conflict occurs when different values from the Boolean domain are assigned to the
same variable. Consequently, the solutions returned by a debugger under this formulation
cannot be trusted, as they may be incorrect. The following example illustrates this scenario.
Example 1 Figure 3.2 (a) shows a concrete design with an error on gate l1. Regardless of
the error type, the correct/erroneous values of logic 1/0, shown in bold, propagate from gate
[Circuit schematics (a) and (b) with the correct/erroneous simulation value pairs omitted.]
Figure 3.2: Demonstrating the effect of unconstrained inputs on abstract circuit
l1 through the flip-flop q1 and to the primary output y1. Notice that the primary input values
remain constant in both time frames. When the state element q1 is abstracted and left un-
constrained, the SAT solver can assign the new input x5 the value 1, which produces the
correct/erroneous value pair 1/1, also shown in Figure 3.2 (b). Here, the value assignment
x5 = 1 is unjustifiable because in the concrete design of Figure 3.2 (a) the corresponding
assignment to q1 is 0.
One way to prohibit unjustifiable solutions from occurring is to constrain the newly added
primary inputs to the values of the Abs components in design C as proposed by Theorem 1.
Theorem 1 Given a circuit C and an input vector sequence v, let the set Q contain the
simulation values of the outputs of the components Abs for all clock cycles in v. A debugging
problem formulated with the abstract model C′ and v′ = v ∪ Q will not have any unjustifiable
assignments.
Proof: This proof is based on the fact that the abstract model can be restricted to behave
sequentially like the concrete model. For every clock cycle, the fanout logic of every Abs
component in C is driven by circuit elements whose Boolean values are stored in Q. Similarly,
the Boolean values in Q are used to drive the new primary inputs in C′ for every clock cycle.
Since the fanout logic of every Abs component in C′ is constrained to the same values as in C,
unjustifiable assignments will not occur.
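The constraint v′ = v ∪ Q of Theorem 1 amounts to a simple merge of per-cycle value maps. The sketch below assumes a hypothetical data layout (cycle → {signal: value}); the thesis does not prescribe one:

```python
def extract_constraint(abs_inputs, sim_values, V):
    """Build v' = v U Q: extend each cycle of the diagnosis vectors V with the
    simulated value of every new primary input created by abstraction."""
    V_abs = {}
    for cycle, values in V.items():
        # Q: the simulated output value of each abstracted component this cycle
        Q = {sig: sim_values[sig][cycle] for sig in abs_inputs}
        V_abs[cycle] = {**values, **Q}
    return V_abs

# As in the example above: q1 is abstracted into x5, whose simulated value is 0
V = {0: {'x1': 1, 'x2': 0}, 1: {'x1': 1, 'x2': 0}}
sim = {'x5': {0: 0, 1: 0}}
print(extract_constraint(['x5'], sim, V))
```

Constraining x5 to its simulated value of 0 in every cycle rules out exactly the unjustifiable assignment x5 = 1 discussed in Example 1.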
By Theorem 1, correctness is guaranteed since all solutions for the abstract model are also
solutions for the concrete design. Next, the proposed methodology is extended to find suspects
that may be accidentally abstracted.
3.3.2 Spurious Solutions
Since abstraction may remove large sections of a design, it is possible that error sources are
accidentally removed. This case can be identified because automated debuggers will return the
corresponding new primary inputs as suspects, that is, as spurious solutions.
Definition 10 Spurious solutions are primary input suspects returned by an automated debug-
ger that correspond to abstracted components.
Spurious solutions do not provide enough information about the error source to help rectify
the erroneous concrete design. In other words, these spurious solutions mask equivalent error
locations. To find the error locations in the concrete design, the abstracted variables and their
respective removed fanin logic must be analyzed. One way is to refine the design based on the
spurious solutions and iterate the debugging process. Refinement is achieved by re-introducing
the original circuitry, including the removed fanin logic, corresponding to the spurious solutions
into the design C ′.
Example 2 Consider the circuit in Figure 3.2 (a) after abstracting q1, where a debugger finds
l2, l1, and x5 as suspects with N = 1. Here, location l6, which is the error source in the
concrete design, is abstracted. In this case, the spurious solution x5 masks the error source l6.
Refinement is necessary to re-introduce l6 into C ′, thus allowing the debugger to find l6 in the
next iteration.
Traditionally, complete solution sets that include all equivalent error locations are important
in debugging [91] since they offer more flexibility for the designer to correct the design, or to
optimize it if a debug-based rewiring algorithm is used [98]. To find all equivalent suspects,
[Schematic of the abstract circuit unfolded over two time frames omitted; the abstracted logic l6 appears in dashed lines.]
Figure 3.3: Abstract circuit unfolded over two time frames
all solutions corresponding to abstracted components must be refined, a process performed it-
eratively until no more solutions from abstracted components are found. In practice, since the
proposed process is incremental, the user at any time can attempt to rectify the circuit before
the entire debugging process is complete.
3.3.3 Guaranteeing Completeness
The abstraction formulation and refinement schemes discussed in the previous sections provide
a means of identifying error sources without considering the entire design. However, under
certain conditions some equivalent solutions may be missed by the debugger. This happens
when a set of m errors in the concrete design is mapped onto a set of n errors in the abstract
model, where n > m, as shown in Example 3.
Example 3 Consider the abstract circuit in Figure 3.1 (b) unfolded for two time frames as
illustrated in Figure 3.3. For clarity, the abstracted logic l6 is shown in dashed lines. Notice
that the error from gate l1 does not directly propagate to output y1 but its effect is captured in
abstract variable x5. For error cardinality N = 1, the SAT solver returns the single equivalent
error location l2. Assuming that the design is analyzed and it is concluded that l2 is not the
error source, the real source of error goes undetected. However, if N is incremented to 2, then
the pair {l1, x5} is found as a solution. By refining the abstract variable x5 to q1 and solving
the debugging problem again with N = 1, the single error location l1 is found.
The above example illustrates how abstraction can cause an error location to be found only at
a higher error cardinality. Given a maximum user-defined error cardinality maxN, when using
abstraction and refinement the maximum cardinality should be set to maxNabs = maxN +
|output(Abs)|, where |output(Abs)| is the number of unique outputs of the abstracted functions
(equivalently, the number of new primary inputs). Theorem 2 presents the steps required to
find all equivalent error locations for a user-specified value of maxN.
Theorem 2 Assume that a debugger returns solution set S for concrete design C, diagnosis
vectors V , and maximum error cardinality maxN . The debugging procedure that performs the
following steps with an abstract model C ′, diagnosis vectors V ′, and maximum error cardinality
maxNabs = maxN + |output(Abs)| finds set of solutions S′ ⊇ S.
1. Initialize N to 1.
2. Debug C′ with V′ and N to get solution set S′.
3. If any solution s ∈ S′ is spurious, refine the abstract model C′ using s and go to (1).
4. Increment N by 1.
5. If N > maxNabs, return S′; else go to (2).
Proof: In the worst case, some error sources are abstracted and their behavior is captured
by the new primary inputs or output(Abs). Together, the maximum number of active error
locations is maxNabs = maxN + |output(Abs)|. The debugger proceeds to find solutions based
on the abstract model using N ≤ maxNabs. If any of the solutions are spurious, then the
abstract model is refined and those variables are replaced with their corresponding concrete
components. The new abstract model is then given to the tool which starts the search with
N = 1 again. The search continues until N = maxNabs, and all the equivalent errors that
map into maxNabs-tuples or fewer will be found. After every refinement step, some abstracted
components are re-introduced and previous solutions at N = maxNabs may be found at N ≤
maxNabs. This process guarantees that all the abstracted components that mask error locations
are systematically resolved thus finding all the solutions S.
1:  S = ∅, N = 1
2:  Abs = selAbsComponents(C)
3:  C′ = absDesign(Abs, C)
4:  maxNabs = maxN + |outputs(Abs)|
5:  while (1) do
6:    V′ = extract_constraint(C, C′, V)
7:    New_sols = debug(C′, V′, N)
8:    for all Sol ∈ New_sols do
9:      if (spurious_solutions(Sol, C′)) then
10:       C′ = refine(Sol, C′, C)
11:       N = 0
12:       maxNabs = maxN + |outputs(C′, C)|
13:     else
14:       S = S ∪ Sol
15:     end if
16:   end for
17:   N = N + 1
18:   if (N > maxNabs) then
19:     return {S, C′}
20:   end if
21: end while
Figure 3.4: Debugging algorithm with state abstraction and refinement
3.3.4 Overall Algorithm
Figure 3.4 illustrates the overall abstraction and refinement algorithm for a debugging method-
ology that guarantees correctness and completeness. The first step is to generate the initial
abstract model C′, as shown on lines 2 and 3. To ensure correctness, on line 6 the stimulus
is modified to constrain the new primary inputs to their simulation values, as discussed in
Section 3.3.1. The modified diagnosis vector V′ and the abstract model C′ are provided to the
debugger to find error locations, as shown on line 7. Next, according to the spurious solutions
found, refinement may be performed, the error cardinality is reset, and maxNabs is recalculated.
Solutions that are not spurious are added to the solution set S to be returned to the user.
These steps are repeated until maxNabs is reached, guaranteeing completeness.
Even though the final solution set S is returned on line 19, the algorithm is incremental in
nature, meaning that every solution found can be reported to the user immediately. The benefit
of an incremental algorithm is that suspects can be analyzed by engineers before all equivalent
solutions have been found.
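The control flow of Figure 3.4 can be sketched as a driver routine with every engine (debugger, refiner, abstractor, etc.) passed in as a callable. The toy stubs below are entirely invented for illustration, loosely following Example 2, where the real error source l6 first surfaces as the spurious suspect x5:

```python
def debug_with_abstraction(C, V, maxN, sel_abs, abs_design, extract_constraint,
                           debug, is_spurious, refine, n_abs_outputs):
    """Skeleton of the algorithm in Figure 3.4; returns (solutions, final model)."""
    S, N = set(), 1
    C_abs = abs_design(sel_abs(C), C)
    maxN_abs = maxN + n_abs_outputs(C_abs, C)
    while True:
        V_abs = extract_constraint(C, C_abs, V)
        for sol in debug(C_abs, V_abs, N):
            if is_spurious(sol, C_abs):
                C_abs = refine(sol, C_abs, C)    # re-introduce abstracted logic
                N = 0                            # restart the cardinality sweep
                maxN_abs = maxN + n_abs_outputs(C_abs, C)
            else:
                S.add(sol)
        N += 1
        if N > maxN_abs:
            return S, C_abs

# Toy stubs: suspects are l1, l2, l6; x5 stands in for l6 while it is abstracted
def toy_debug(C_abs, V_abs, N):
    if N != 1:
        return []
    suspects = [s for s in ('l1', 'l2', 'l6') if s in C_abs]
    return suspects + (['x5'] if 'l6' not in C_abs else [])

sols, _ = debug_with_abstraction(
    C={'l1', 'l2', 'l6', 'q1'}, V=None, maxN=1,
    sel_abs=lambda C: {'q1'},
    abs_design=lambda Abs, C: C - {'q1', 'l6'},
    extract_constraint=lambda C, Ca, V: V,
    debug=toy_debug,
    is_spurious=lambda sol, Ca: sol == 'x5',
    refine=lambda sol, Ca, C: Ca | {'l6', 'q1'},
    n_abs_outputs=lambda Ca, C: 1 if C - Ca else 0,
)
print(sorted(sols))
```

On this toy instance the first pass returns l1, l2 and the spurious x5; refinement restores l6 and q1, the cardinality sweep restarts, and the second pass finds all three concrete suspects.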
3.4 State Abstraction
State abstraction is a type of abstraction where memory elements such as flip-flops and latches
are selected for removal. This approach can be powerful because state elements play a central
role in both state machine and datapath logic. Furthermore, when modular or hierarchical
information is not available for a design, as is the case for post-synthesis netlists or custom
logic, state abstraction can operate on the flat design.
The effectiveness of state abstraction is demonstrated empirically in Section 3.6.1. A subtle
benefit of state abstraction is that with a reduced state space, debug traces can be considerably
shortened. This advantage is discussed next.
3.4.1 Trace Length Reduction Benefits
Long error trace lengths are commonly associated with simulation-based verification tools where
random and constrained-random stimulus is used to exercise the design. Both manual and
automated debugging can benefit from operating on shorter error traces. Trace reduction is an
effective pre-process to debugging as it can reduce trace lengths by orders of magnitude [17, 20,
78].
State abstraction can help further reduce the trace length prior to debugging. With many
of the state elements abstracted, the state space of the design is reduced thus allowing for
state matching techniques to remove repeated states and redundant transitions [17, 20, 78].
It should be emphasized that most state matching techniques implicitly re-simulate reduced
traces in order to ensure that the desired failure is still exposed.
As an example, consider Figure 3.5 where a state transition diagram is used to illustrate an
error trace from state q0 to qk. In the original trace, no trace reductions are possible through
state matching. However, after the second state element is removed (through abstraction) the
states q1 and q4 can no longer be differentiated. The state values after abstraction are shown
under each node in Figure 3.5. As a result, a short-cut can be taken in the trace from state q0 to
state q4, as illustrated by the dashed line. Note that, as required by most trace reduction
techniques, the compacted traces must be tested to determine whether the error(s) are still
observable.

Figure 3.5: Reduced trace V′ due to abstraction
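The state-matching short-cut illustrated by Figure 3.5 can be sketched as follows, under the simplifying assumption that states are bit tuples and abstraction keeps a subset of bit positions. As noted above, a compacted trace must still be re-simulated to confirm that the failure is still exposed.

```python
# Hedged sketch of trace compaction via state matching: whenever the same
# abstract state re-appears later in the trace, jump directly to its last
# occurrence, skipping the repeated states and redundant transitions between.

def compact_trace(states, keep_bits):
    """states: sequence of concrete state tuples; keep_bits: surviving bit indices."""
    def abstract(s):
        return tuple(s[i] for i in keep_bits)

    compacted, i = [], 0
    while i < len(states):
        a = abstract(states[i])
        # index of the last occurrence of this abstract state
        last = max(j for j in range(i, len(states)) if abstract(states[j]) == a)
        compacted.append(states[last])
        i = last + 1
    return compacted

# abstracting the second bit makes (0,0) and (0,1) match, halving the trace
compact_trace([(0, 0), (0, 1), (1, 0), (1, 1)], keep_bits=[0])  # -> [(0, 1), (1, 1)]
```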
3.5 Function Abstraction
When a design contains high level or RTL information, function abstraction can provide a
natural and powerful way to partition the debugging problem. For instance, designers working
on RTL designs use modules to partition the design based on functionality and complexity.
These modules are also good candidates for abstraction. Furthermore, the hierarchical and
modular composition of HDL designs can be leveraged to apply abstraction and refinement in
a systematic manner.
3.5.1 Hierarchical Abstraction
The strength of function or modular abstraction can be amplified when used in a hierarchical
manner. More specifically, module-based debugging can be applied iteratively at each hierarchy
level thus allowing for a divide and conquer debugging approach.
At each level i of the hierarchy H, the functions {F^i_1, F^i_2, ..., F^i_p} can be considered by
the procedure selAbsComponents to select the components to abstract, Abs_i. The iterative
sequence of abstraction, debugging and refinement presented in the algorithm of Figure 3.4 can be
applied to the problem constructed at hierarchy level i. However, only functions at level i can
be refined and not their sub-functions. In order to locate the errors in the sub-functions, the
entire algorithm must be repeated at the hierarchy level i + 1.
Two properties of hierarchical abstraction and refinement are very important. After completing
an iteration of the algorithm in Figure 3.4 at hierarchy level i:

1. if a function f^i is still abstracted, then its sub-functions g_j can be abstracted at hierarchy
levels > i;

2. if a function f^i is refined, then its sub-functions g_j may still be abstracted at hierarchy
levels > i.
Figure 3.6: Function F^1_1 (with inputs X1, X2 and outputs Y1, Y2) is composed of functions
F^2_2 and F^2_3; the bug resides in F^2_2
The first observation is easy to confirm. When a function is still abstracted after debugging,
it signifies that equivalent error locations do not reside inside it. Similarly, the sub-functions
will not contain any equivalent error locations either, and they should be abstracted at deeper
hierarchy levels.
For the second observation, consider Figure 3.6 where an error resides in F^2_2. At level 1,
function F^1_1 cannot be abstracted since it contains the error. However, at level 2, sub-function
F^2_3 may be abstracted since it is independent of F^2_2 and its output. Thus, functions can
be partitioned into sub-functions such that some of the sub-functions will not contain any
equivalent error locations.
3.5.2 Overall Algorithm
To reduce the debugging problem size further, when operating at a given hierarchy level i,
all functions at a deeper hierarchy level > i should also be abstracted in C ′. However, it is
important to only refine modules at level i. This restriction reduces the complexity of the
debugging problem at level i and postpones the analysis of the sub-functions at level > i to
future hierarchy levels.
The process of finding all equivalent solutions through the management of the error cardinality
remains the same as in Section 3.3 when hierarchical abstraction is used, as the following example
illustrates.
Example 4 Consider Figure 3.7(a) where the modules Abs_1 = {F^1_2, F^1_4} are abstracted at level
1. The abstraction results in the removal of modules F^1_1 and F^1_3 as well, because they fan in
to Abs_1. The initial abstracted circuit is shown in Figure 3.7(b). Assuming that the error is
in module F^2_7, the error effect can propagate to the outputs Y1 and Y2. In this example, the
debugger will not identify a single error source, but will find the error pair {X_{F^1_2}, X_{F^1_4}} with
N = 2. Through refinement, these modules and their fanin circuitry are re-introduced into the
circuit, as shown in Figure 3.7(c). Next, the error cardinality N must be reset to 1. At hierarchy
level 2, the modules F^2_7 and F^2_6 can be abstracted as part of Abs_2. Refinement will re-introduce
module F^2_7 and debugging will find the error source inside it, as shown in Figure 3.7(d).

Figure 3.7: Hierarchical abstraction and refinement example. (a) model C before abstraction;
(b) initial model C′ at level 1; (c) final model C′ at level 1; (d) final model C′ at level 2
The proposed hierarchical abstraction and refinement algorithm is shown in Figure 3.8. Here
the debugging problem is solved iteratively by descending the hierarchy H. At each hierarchy
level i, the procedure absDesign first abstracts all functions at levels > i. This ensures that
sub-functions will not be refined. Next, function abstraction and refinement is performed by
Function debug according to the algorithm of Figure 3.4. The effectiveness of the proposed
technique is demonstrated in the experiments of Section 5.6.
1: Solutions = ∅, level = 0, N = 1, C′ = C
2: while (1) do
3:    level = level + 1
4:    C′ = absDesign(level + 1, C′)
5:    {New sols, C′} = Function debug(C′, level, N)
6:    if New sols = ∅ then
7:       return Solutions
8:    else
9:       Solutions = Solutions ∪ New sols
10:   end if
11: end while
Figure 3.8: Hierarchical debugging algorithm
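The loop of Figure 3.8 can be transcribed almost directly into Python. In this sketch, `abs_design` and `function_debug` are hypothetical stand-ins for the thesis procedures absDesign and Function debug; the stand-ins' signatures are assumptions for illustration.

```python
# Hedged sketch of the hierarchical debugging algorithm (Figure 3.8):
# descend the hierarchy, abstracting everything below the current level,
# and stop when a level yields no new solutions.

def hierarchical_debug(c, abs_design, function_debug, n=1):
    solutions, level = set(), 0
    c_prime = c
    while True:
        level += 1
        c_prime = abs_design(level + 1, c_prime)   # abstract all deeper levels
        new_sols, c_prime = function_debug(c_prime, level, n)
        if not new_sols:
            return solutions                        # no deeper errors remain
        solutions |= new_sols
```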
3.6 Experiments
This section evaluates the effectiveness of the proposed abstraction and refinement debugging
methodology. First, state abstraction is applied to gate-level diagnosis problems. The second
set of experiments is conducted on RTL designs that are developed in a hierarchical manner.
For those circuits, function and hierarchical abstraction/refinement are used.
3.6.1 State Abstraction
To evaluate the effectiveness of state abstraction, hand-made bugs are inserted in circuits
from the ISCAS’89 and ITC’99 benchmarks as well as industrial RTL circuits from OpenCores.org
[77]. The bugs are single gate changes or single RTL signal-assignment changes made at
random. For each erroneous circuit, 10 traces are obtained through pseudo-random simulation
that demonstrate the erroneous behaviour with respect to the reference models. These traces
are used by a sequential SAT-based debugger similar to [91] to locate the error sites. In the
proposed abstraction and refinement procedures of Section 3.4, the design and traces are modi-
fied from C and V to C ′ and V ′, respectively, before the debugging engine is called. Comparing
the performance of the debugger with and without the proposed techniques provides a fair
evaluation.
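For illustration, a single random gate change of the kind used in this setup can be injected as sketched below. The netlist encoding (a map from gate name to gate type) and the gate-type swap table are assumptions made for this example.

```python
# Hedged sketch of the error-injection step: one randomly chosen gate has its
# function changed (e.g. AND -> OR), producing an erroneous circuit whose
# failing traces are then handed to the debugger.

import random

def inject_gate_bug(netlist, rng):
    """Return (changed gate, buggy copy of `netlist`) with one gate type flipped."""
    swaps = {"AND": "OR", "OR": "AND", "NAND": "NOR", "NOR": "NAND"}
    buggy = dict(netlist)
    # pick uniformly among gates whose type has a defined swap
    target = rng.choice(sorted(g for g, t in netlist.items() if t in swaps))
    buggy[target] = swaps[netlist[target]]
    return target, buggy

rng = random.Random(0)                 # seeded for reproducible experiments
target, buggy = inject_gate_bug({"g1": "AND", "g2": "OR", "g3": "NOT"}, rng)
```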
The experiments are conducted on a 2.66GHz Intel Xeon processor with 2 GB of memory and
a timeout of 7200 seconds for each problem. For each problem, a trace compaction procedure
is performed before debugging. This process reduces the length of the counter-example when
possible. This procedure first builds a graph of the visited states; it then adds edges between
repeated states and applies Dijkstra’s shortest-path algorithm from the initial state to the final
circuits     # gates   # FF   # clk   # red. clk   # cls (K)   mem (MB)   time/err (s)   # err   total (s)
b04          711       66     516     335          2422        1132       740.0          9       6660.0
b08          200       21     21      20           274         82         3.8            4       15.2
b12          1140      121    40      19           1492        449        165.9          5       829.5
b14          6028      245    54      54           mem out     > 2000     -              -       -
s1488        693       6      104     5            214         42         1.6            9       14.4
s5378        3222      179    3       3            554         105        13.1           3       39.3
s13207       9442      669    2       2            1415        227        70.1           9       630.9
s35932       21147     1728   75      8            3563        696        431.1          16      6897.6
div su       1528      126    9       6            607         109        12.4           64      793.6
rsdecoder    10629     521    2       2            2043        301        120.1          9       1080.9
spi          2027      90     20      18           2763        582        391.3          3       1173.9
ac97         15166     1452   30      30           mem out     > 2000     -              -       -

Table 3.1: Statistics for problems and the stand-alone SAT-based debugging approach
state [26]. More powerful trace compaction schemes may provide better results [17, 20]. The
resulting traces that do not distinguish the reference and buggy circuits are discarded.
In Section 3.4 the effects of abstraction on logic size and trace length were discussed.
Figure 3.9 summarizes these effects empirically on two designs, b04 and b14. Figure 3.9(a)
demonstrates an apparently linear relationship between the logic size reduction and the number of
abstracted state elements. In Figure 3.9(b), experiments show that significant trace length
reductions are possible only after a certain threshold is reached. This threshold appears to be
over 50% for b04 and over 70% for b14. Thus for large problems where memory is a major
concern, a more aggressive approach, where over 70% of state elements are abstracted, may be
desirable.
Figure 3.9: Logic and trace reduction vs. flip-flops abstracted. (a) % logic size reduction and
(b) % trace length reduction, each plotted against the % of state elements abstracted for b04
and b14
circuits     red. logic(%)   red. FF(%)   red. trace(%)   red. mem(%)   time/err (s)   # err   maxN   prev (s)   refine (s)   total (s)   X impr.
b04          20.5            45.4         0               9.8           530.0          12      3      11.0       0            6371        1.04
b08          26.0            47.6         65.0            60.0          0.2            12      3      0.1        0            3.35        4.53
b12          26.4            41.3         15.7            24.9          85.0           20      3      4.2        0            1704.2      0.48
b14          15.3            40.8         0               > 46.0        3740.2         2       2      42.0       0            7522.4      -
s1488        20.4            50.0         0               11.9          1.1            9       1      0          0            9.9         1.45
s5378        9.7             44.6         0               37.1          11.8           1       1      0          3.4          15.2        2.58
s13207       29.6            44.8         0               31.7          40.3           9       1      0          0            362.7       1.73
s35932       31.9            46.2         0               34.9          251.3          16      2      7.3        0            4028.1      1.71
div su       34.0            39.6         0               9.5           5.9            32      3      2.2        396.8        587.8       1.35
rsdecoder    34.7            43.1         0               22.9          54.8           7       1      0          0            383.6       2.81
spi          37.6            44.4         22.2            46.0          101.2          1       1      0          303.6        404.9       2.89
ac97         41.2            48.2         0               > 37.0        365.6          2       1      0          0            731.2       -

Table 3.2: Performance statistics for abstraction and refinement debugging framework
Table 3.1 presents a summary of the debugging problems used as well as performance
statistics when debugging the concrete circuits. Later, these results are contrasted with those
of the proposed abstraction and refinement framework. Columns 1, 2 and 3 present the circuit
name, number of gates, and number of flip-flops (state elements) in each circuit. Columns #
clk and # red. clk show the average length of the traces before and after the trace compaction,
respectively.
The next five columns summarize the results of the debugger for each problem. In Columns
# cls and mem, the number of clauses (in thousands) generated for each problem and the
debugger’s memory usage is presented. The number of equivalent errors found by the debugger
for the given vectors as well as the average time required to find them are presented in columns #
err and time/err, respectively. Finally, the total time required to find all the errors is presented
in column total.
To cope with the size of the larger problems the CNFs are partitioned into bands and solved
sequentially as described in [91]. For b14 and ac97 where the average reduced traces are 54
time frames and 30 time frames long, the problems still run out of memory. The proposed
abstraction framework is most beneficial for such memory intensive problems.
Table 3.2 presents the results of the proposed abstraction and refinement debugging frame-
work. For each problem, a random abstraction function is used such that between 40% and 50%
of the state elements are abstracted, a conservative amount according to Figure 3.9. To allow a
comparison with the data in Table 3.1, the percentage of reduced logic, reduced flip-flops,
additional trace compaction, and overall reduced memory requirements are presented in columns
2-5, respectively. Looking across one row for the problem b08, by abstracting 47.6% of the
flip-flops, the logic is reduced by 26% and the trace length is reduced by an additional 65%
which leads to an overall memory reduction of 60% versus the stand-alone debugger.
The largest problems in Table 3.1 are for circuits b14 and ac97 and they ran out of memory.
With the new methodology, they both complete successfully. It can be calculated that the
proposed methodology results in up to 60% memory reduction, with average savings of 30%,
under a conservative abstraction approach.
The majority of problems in Table 3.2 do not benefit from additional trace compaction.
This can be attributed to the fact that trace reduction is most effective for long traces since the
probability of matching states is higher. In the experiments, the initial trace compaction process
is able to reduce the traces considerably. For instance, the initial trace of circuit s1488, which
is 104 clock cycles, is reduced to only 5 clock cycles after compaction, so further reductions
are unlikely. For industrial traces of thousands of clock cycles derived from functional
testbenches rather than random stimulus, simple state matching techniques are unlikely to
reduce traces drastically [20]. Therefore, trace reduction via abstraction may be more effective.
A summary of the run-time results of the proposed framework is presented in columns 6-12
of Table 3.2. In columns time/err and # err the average time required to find an error and the
number of errors found are presented, respectively. It should be noted that when the number
of errors is greater than those in Table 3.1, it means that abstracted state variables are found
as errors. In these experiments, if all equivalent error tuples are found (including the inserted
errors), then refinement is not performed. In practice, finding all equivalent errors is not necessary,
as only the actual error must be fixed. If the errors found by the proposed framework do not
include all equivalent error locations (i.e. # err is smaller in Table 3.2 than Table 3.1), then
all spurious solutions must be refined.
In Table 3.2, column maxN shows the maximum number of tuples searched until all equiv-
alent errors are found. The debugging time for all searches prior to maxN is shown in the
column prev. When refinements are necessary, the column refine presents the solve time for all
subsequent refinement searches.
For many problems in Table 3.2, the maximum error tuple found (maxN) is often greater
than 1 but always less than or equal to 3. The time required to determine that no solutions
exist prior to maxN (prev) is always considerably smaller than the average time required to find an
error (time/err). Taking b12 for instance, it takes on average 4.2 seconds to determine that
no errors occur when N < 3 and 85 seconds to find each solution at N = 3. Relating these
times to the algorithm in Figure 3.4, it means that the approach is quite effective since the
majority of the time is spent in the debug function on line 6 when N=maxN and not when
N < maxN.
The total debugging time for the proposed approach is found by adding prev and refine to the
product of time/err and # err. The resulting total run-time is shown in column
total and its improvement over Table 3.1 is shown in column X impr. When abstracting 40-
50% of the state elements, not many refinement steps are necessary as most equivalent error
locations are found in the abstract model. However, even for the cases where refinement is
necessary, substantial run-time improvement is observed. The only problem that demonstrates
a performance decrease is b12 where four times more solutions are found in the abstract model
versus the concrete design. Overall, performance improvements of up to 4.5X are observed with
an average value of 2X across all problems. This increased efficiency can be attributed to the
smaller size of the constraint problems which lead to easier CNFs for the SAT solver.
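The total-time computation can be checked directly against the table entries; the figures below are taken from Table 3.2 for b04 and b12.

```python
# Total debugging time = time/err * # err + prev + refine (Table 3.2).
def total_time(time_per_err, n_err, prev, refine):
    return time_per_err * n_err + prev + refine

# b04: 530.0 s/err * 12 errors + 11.0 s prior search -> 6371 s
assert abs(total_time(530.0, 12, 11.0, 0) - 6371.0) < 1e-6
# b12: 85.0 s/err * 20 errors + 4.2 s prior search -> 1704.2 s
assert abs(total_time(85.0, 20, 4.2, 0) - 1704.2) < 1e-6
```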
As observed in Figure 3.9 smaller problem sizes and shorter traces can be achieved with
more aggressive abstraction than those of Table 3.2. To demonstrate the effectiveness of the
framework under a more aggressive abstraction strategy, the two largest problems, b14 and ac97,
are shown in Tables 3.3 and 3.4 with 80% and 96% of the state elements abstracted, respectively. For
easy comparison, the first row of each table re-presents the problem properties of Table 3.2. The
following rows show the results after each abstraction and refinement step until the specific
injected error is found (not all equivalent errors as in Table 3.1). For each table, column
1 describes whether the data is derived from Table 3.2 (Tbl 3.2), from the initial abstraction
(abs), or from a refinement step (ref). The remaining columns are labeled similarly to Table 3.2.
As expected, when more state variables are abstracted, greater memory savings are attained
and more refinement steps are necessary. However, along with the memory savings, more
step      red. logic(%)   red. FF(%)   red. trace(%)   mem(MB)   time/err(s)   err
Tbl 3.2   15.4            40.8         0               1080      3740.0        2
abs       52.7            81.6         20.3            344       172.0         4
ref 1     50.2            80.8         20.3            378       225.1         3
ref 2     50.1            80.4         20.3            404       242.3         10

Table 3.3: Summary of b14 when abstracting over 80% of flip-flops
step      red. logic(%)   red. FF(%)   red. trace(%)   mem(MB)   time/err(s)   err
Tbl 3.2   41.2            48.2         0               1260      1567.8        2
abs       89.7            96.4         33.3            555       365.6         2
ref 1     89.5            96.3         33.3            765       665.8         10
ref 2     89.4            96.2         33.3            773       664.0         6
ref 3     89.1            96.1         33.3            776       721.8         9

Table 3.4: Summary of ac97 when abstracting over 96% of flip-flops
abstracted variables lead to much faster solve times per error. For instance, b14 requires 3740
seconds per error with 40% state abstraction, while it requires only 172 seconds per error with
82% state abstraction.
It is interesting to notice the relatively small number of iterations necessary to find the
injected error. More precisely, b14 and ac97 require only two and three refinement steps,
respectively, before finding the errors. This small number of steps indicates that the appropriate
variables are selected for refinement and that the debugger is guided efficiently towards the
errors after each step.
Overall, the proposed abstraction and refinement debugging framework demonstrates its
effectiveness for large problems where conventional approaches may fail due to excessive memory
and/or run-time requirements.
3.6.2 Function Abstraction
This section presents the experiments for function abstraction. All the circuits used are from the
OpenCores.org website [77] except for an industrial communication design (comm), with nearly
500,000 synthesized gates. Each circuit contains a functional level error such as an incorrect
statement, incorrect module instantiation, bad wiring between modules, etc. These RTL errors
typically represent tens or hundreds of gate-level errors. The debugger used in all experiments is
the module-aware SAT-based automated debugger of [3]. This set of experiments is conducted
design      size     # DFF   # clk (used)   # literals   time (s)   mem (M)
wb con1     80695    818     19 (19)        518580       58.74      619
wb con3     80695    818     1387 (40)      1273699      205.16     1250
fdct1       264221   5461    189 (40)       1705328      555.37     4400
mem ctrl1   38660    1145    1318 (40)      3887703      55.13      850
vga1        147457   17102   16100 (40)     8679788      1635.78    4700
vga2        147457   17102   141 (40)       212588       236.16     1350
comm1       449927   30339   19 (25)        1912087      1575.67    5080
comm2       453788   26852   88 (25)        Mem out      Mem out    8000
comm3       453576   26852   1387 (25)      277649       809.31     4831

Table 3.5: Summary of problems for function abstraction
design      maxN   # itr   mod refined/total   # literals   time (s)   peak mem (M)   lit reduced (×)   speed up (×)   mem reduced (×)
wb con1     1      3       3 / 8               115547       25.55      253            4.49              2.30           2.45
wb con2     1      4       4 / 8               140713       149.12     469            9.05              1.38           2.67
fdct1       1      6       5 / 5               1705328      638.78     4400           1.00              0.87           1.00
mem ctrl1   1      4       12 / 14             112581       12.02      200            34.53             4.59           4.25
vga1        1      2       5 / 14              13767        6.27       173            630.48            260.89         27.17
vga2        1      5       6 / 14              94066        436.38     1052           2.26              0.54           1.28
comm1       2      8       10 / 129            37960        108.32     772            50.37             13.11          6.58
comm2       1      9       10 / 129            25105        1403.47    640            -                 -              > 12.50
comm3       2      8       8 / 129             80103        63.94      317            3.47              12.66          15.24

Table 3.6: Results of proposed function abstraction and refinement technique
on a 2.66 GHz 64 bit Intel Core 2 Quad processor with 8GB of memory.
Table 3.5 presents a summary of the debugging problems and the corresponding automated
debugger statistics using the SAT-based debugging engine of [3] (called stand-alone debugger).
Columns one, two, and three show the name of the debugging problem based on the design,
and its size in terms of gates and state elements (DFFs), respectively. Column four presents the
length of the erroneous trace in terms of clock cycles required to observe the erroneous behaviour
from an initial state. When the trace is too long for the debugger, the trace is reduced to only
contain the last 25 or 40 transitions in order to make automated debugging feasible. The
number of clock cycle traces used to formulate the debugging problem are presented in the
parentheses in column four. For example, the problem wb con3 contains 1387 clock cycles, but
only the last 40 clock cycles are used. The column # literals presents the total number of
literals generated in the CNF of the debugging problem [3]. Finally, columns time and mem
show the total run-time, in seconds, required to solve the problem and the required memory,
in MB, respectively. Notice that problem comm2 requires more than 8000 MB to formulate the
problem and thus runs out of memory.
Table 3.6 presents the results of the proposed technique on the debugging problems. Column
one shows the name of the problems, while column two shows the maximum error cardinality
(maxN) required to solve the debugging problem. As discussed, the cardinality required to
locate the bug using an abstracted design can be larger than that required to solve the original
problem. Even though the problems shown here have a single functional-level (RTL) error, for
problems comm1 and comm3 a higher cardinality of 2 is used by the overall algorithm to find the
error site. It should be noted that due to the abstraction performed, when the cardinality is
increased, the number of potential solutions does not increase as sharply as in other debugging
techniques [3].
In Table 3.6, the column labelled # itr states the number of refinement and debugging
iterations required to find all the equivalent locations (number of times line 7 of Figure 3.8 is
run). The column mod refined / total presents the number of modules refined out of the total
number of modules in the concrete design. These modules are the only ones required to diagnose
the error. The smaller this number is, the more effective is the abstraction and refinement
technique. The next three columns, # literals, time (s), and peak mem (M) present the benefit
of the proposed technique in terms of the number of literals required in the problem formulation,
the total run-time in seconds and peak memory requirement by the entire algorithm.
The improvement provided by the proposed technique is shown in the last columns of
Table 3.6, where the reduction in the number of literals, the speed-up in run-time, and the
reduction in memory over the debugging technique of [3] without abstraction and refinement are
shown. The effectiveness of the abstraction technique is attributed to reducing the problem size
which is directly related to the number of literals reduced. For example, consider problem
vga1, where 5 of 14 modules are used, leading to a 630.48× reduction in literals, a 260.89×
improvement in run-time, and a 27.17× reduction in overall memory requirement. For
problem comm2 which resulted in memory out without the abstraction technique, only 640MB of
the available 8000MB are required. For all problems, the number of refinement and debugging
iterations performed is larger than one. Therefore, it is clear that each iteration is much easier
and faster when abstraction is used; thus it is more advantageous to run more iterations on
easier problems than fewer iterations on harder problems.

Figure 3.10: Solve time and # literals in problem vs. the # of refinement and debugging
iterations for (a) vga2, (b) fdct1 and (c) comm1
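As a sanity check, the vga1 improvement factors reported in Table 3.6 can be recomputed from the raw columns of Tables 3.5 and 3.6.

```python
# Improvement ratios for vga1, recomputed from Tables 3.5 and 3.6.
lit_concrete, lit_abs = 8679788, 13767   # literals without / with abstraction
t_concrete, t_abs = 1635.78, 6.27        # run-time (s)
m_concrete, m_abs = 4700, 173            # memory (MB)

assert round(lit_concrete / lit_abs, 2) == 630.48   # lit reduced (x)
assert round(t_concrete / t_abs, 2) == 260.89       # speed up (x)
assert round(m_concrete / m_abs, 2) == 27.17        # mem reduced (x)
```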
In Table 3.6 there are two problems that experience a slow-down. It is worthwhile to
analyze the reason for this behaviour. For problem fdct1, six iterations are required to solve
the problem, at which stage all 5 modules are used. Thus in this case, the extra iterations simply
add overhead as the entire circuit is needed in order to solve the problem. The problem vga2
also experiences a slow-down, although a 2.26× reduction in literals and a 1.28× reduction in
memory are still observed. In this case, unlike the overall trend, the simpler and faster debugging
problems cannot compensate for the extra iterations performed.
Figures 3.10(a), 3.10(b) and 3.10(c) provide detail on the numbers of Table 3.6 for vga2,
fdct1 and comm1, respectively. These figures illustrate the relationship between the run-time
shown in solid line and the number of literals shown in dashed line against the refinement and
debugging iterations. Notice the general trend where both run-time and number of literals
appear to increase exponentially with the increase in the number of iterations. For the ma-
jority of cases where the proposed technique is effective, abstraction allows the problem to be
solved with a fraction of its size thus leading to smaller memory requirements and run-times.
Considering problem vga2, notice that for iterations 3, 4 and 5 the solve time is quite high,
thus not providing any run-time benefit.
The proposed techniques allow for different degrees of abstraction to be applied. In general,
aggressive (high degree) abstraction leads to more debugging and refinement steps. However,
due to the simplicity of the design when abstracted aggressively, the initial debugging and
refinement iterations are relatively easy problems and thus quicker to solve. This behaviour
is observed in Figures 3.10(a), 3.10(b) and 3.10(c) where the initial iterations run faster than
later ones. It may be possible to find an abstraction heuristic that can balance the number
of iterations and the functions abstracted, but this is not a trivial task. In these experiments,
however, abstracting all functions and modules is found to be quite effective.
3.7 Summary
This chapter presents state and function abstraction and refinement techniques for design de-
bugging, allowing larger designs to be debugged faster and with less memory. Designs are first
abstracted resulting in smaller debugging problems. To ensure that all the equivalent error
locations are found in the original design, a refinement process is performed. Refinement is
applied in iterations, thus only re-introducing the components necessary for debugging. A
consequence of state abstraction is that the error trace can be further reduced, resulting in a
smaller problem formulation. Function abstraction, employed hierarchically, enables a powerful
debugging framework. The experiments demonstrate run-time improvements of
up to an order of magnitude for state abstraction and up to two orders of magnitude for
function abstraction. Furthermore, both abstraction techniques dramatically reduce the memory
requirements of design debuggers: in some cases as little as 10% of the memory limit is
required. The advantages of abstraction and refinement are clear: larger designs can be tackled by
current debuggers with the given memory resources and consistent performance improvements
can be expected.
Chapter 4
Bounded Model Debugging
4.1 Introduction
Contemporary automated debuggers model sequential problems by employing the Iterative
Logic Array (ILA) technique, also known as time frame expansion [1, 9]. When modeling
sequential behaviour with an ILA, the combinational circuitry is replicated in the computer
memory for as many cycles as the counter-example or error trace requires. The ILA repre-
sentation of a sequential circuit has the advantage of explicitly modeling the circuit such that
existing combinational techniques can be utilized. For example, the ILA is a popular technique
in test (Automated Test Pattern Generation (ATPG)) and in verification (equivalence checking,
bounded model checking, debugging). Nevertheless, replicating the combinational part of a
sequential circuit can lead to overwhelming memory requirements, which in turn can degrade the
performance of the underlying algorithms. In debugging, the length of error traces can easily
exceed thousands of clock cycles in practice. Replicating the transition function for designs
with hundreds of thousands of primitive gate elements and for thousands of clock cycles may
not provide a viable solution.
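The ILA construction can be illustrated with a toy transition function. The mod-4 counter below is a made-up example showing how one combinational copy of the transition function is produced per clock cycle, which is why memory grows linearly with trace length.

```python
# Hedged sketch of time-frame expansion (ILA): replicate the combinational
# next-state/output function once per clock cycle of the input trace, so a
# sequential check becomes a purely combinational one over all frames.

def unroll(step, s0, inputs):
    """Replicate the transition function over len(inputs) time frames."""
    frames, s = [], s0
    for x in inputs:
        s, y = step(s, x)            # one combinational copy per cycle
        frames.append((s, y))
    return frames

def counter_step(state, enable):
    """Toy mod-4 counter; output flags the wrap-around event."""
    nxt = (state + enable) % 4
    return nxt, nxt == 0 and enable == 1

frames = unroll(counter_step, 0, [1, 1, 1, 1, 0])   # five replicated frames
```

A 100,000-gate design unrolled over thousands of cycles multiplies this per-frame cost accordingly, which is the memory pressure BMD is designed to avoid.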
This chapter presents a novel debugging methodology, namely Bounded Model Debugging
(BMD), that is suited for problems with long error traces. The central idea behind BMD is
inspired by Bounded Model Checking (BMC) from the verification domain [9, 21]. Provided
enough resources, BMC and BMD establish a systematic way to cope with intractable problems
in an iterative manner. Hard and large problems are broken into many smaller sub-problems of
incrementally larger size and complexity, which are solved in succession. When completeness
cannot be guaranteed due to the approximate nature of these methodologies, both BMC and
BMD still provide valuable information to the user based on the solved sub-problems.
The key observation in BMD is based on the notion that errors are often excited and
observed within close temporal proximity of each other. For instance, if an error is observed in
clock cycle 100, then it is more likely that the error is excited between clock cycles 51 to 100
than between clock cycles 1 to 50. In practice, test and verification engineers have used the
above observation for decades when they manually “back-trace” a design using the last events
of very long error traces.
Based on the temporal proximity of error excitation and error observation, a BMD debugging
algorithm starts by considering a subset of the error trace. Initially, the subset of the error
trace is from clock cycle k1 to kf , where k1 is a clock cycle greater than one and kf is the
clock cycle where the failure is observed (usually the last cycle of the trace). In this thesis, we
call this interval the suffix of the error trace and it is used to formulate the initial debugging
problem. Intuitively, this portion of the problem is examined first with the expectation that
the error excitation point is within the set of cycles selected in the k1 to kf bound of the trace
suffix. A debugger that operates on the suffix trace will build a much smaller ILA than that of
the original trace, thus it will tackle a much smaller and easier debugging problem.
Clearly, when debugging with a trace suffix, errors excited prior to clock cycle k1 may not
be detected by the debugger. In this case, and to ensure completeness of the methodology,
BMD adds a special type of error suspects for consideration to the debugging problem called
initial state suspects. If these suspects are found as solutions when examining a suffix, it
indicates that errors may be active in clock cycles prior to k1. Such a situation requires a
second BMD iteration where a larger trace suffix is considered. As such, the second iteration
results in a debugging problem with a larger ILA representation, k2 to kf , where k2 < k1, but
still smaller than the original problem (i.e. 1 < k2). This iterative process continues until all
the equivalent error locations are found or resource limitations are reached. Notice that given
enough computational resources, similar to BMC, BMD also degenerates to a conventional
debugging problem formulation when ki reaches the first clock cycle of the trace.
This chapter is organized as follows. In the next section, background information is provided
on the ILA and BMC. Section 4.3 introduces BMD by presenting an analysis of the sequential
debugging problem, the basic problem formulation, its impact on error cardinality and different
performance improvement techniques. Section 4.4 presents the experimental results, while
Section 4.5 summarizes the chapter. In this chapter, the terms clock cycle and time frame are
used interchangeably.
4.2 Preliminaries
This section presents background material pertaining to the ILA representation of sequential
circuits and to BMC as a partial motivation for this work.
The ILA representation models the sequential behaviour of a circuit over k clock cycles
by replicating its transition function for k time frames. A transition function refers to the
combinational logic cones that generate the next state and primary output values of a sequential
circuit given a set of current state and primary input logic values. Alternatively, a transition
function can be viewed as a time frame in which the inputs/outputs of state elements are treated
as pseudo-outputs/inputs of the remaining circuitry. An ILA is built by first replicating the
transition function into k disjoint, but ordered, time frames. Then, for every two consecutive
time frames, the next state variables of the earlier frame are connected to the current state
variables of the later frame. For example, the circuit in Figure 4.1 is modelled as an ILA for
five clock cycles or time frames as shown in Figure 4.2.
When dealing with designs with hundreds of thousands of gates that need to be examined
(for verification, debugging, etc.) over thousands of clock cycles, the ILA model can be limited
by the memory available on the system. As a consequence, many problems in verification, testing
or debugging cannot be formulated using the ILA model, let alone be solved. When a
SAT solver is used to examine a problem, the resulting CNF that corresponds to the combinational
circuitry of the ILA may contain too many variables and clauses to fit in memory and be
examined by the solver [23, 66]. In the remainder of this chapter, we do not distinguish between
an ILA composed of clauses and one composed of gates, since the two representations are
essentially equivalent.
Apart from debugging, the ILA model is popular in many CAD applications such as ATPG,
BMC, and sequential equivalence checking [1, 9, 94]. Similar to debugging, in Bounded Model
Checking (BMC), the memory limitations and run-time performance of the tools are highly
dependent on the length of the ILA. Contemporary BMC formulations typically start by
attempting to disprove properties using a small bound k1 < kdia, where kdia is the circuit diameter.
The circuit diameter, which is the longest of all shortest paths between any two states of the
circuit, is the minimum bound required to completely prove a safety property [9]. An ILA with
length k1 can model the sequential behaviour from the initial state to any state after k1 clock
cycles. If the property is disproved within this bound, then the BMC problem is solved and a
counter-example is returned. Otherwise, if this bound is not enough to disprove a property, the
bound is incremented and k2 > k1 is used in the next iteration of BMC. This process is repeated
with the bound being incremented until a counter-example is found, the complete proof bound
kdia is reached or resource limits are exhausted.
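The iterative BMC flow described above can be summarized in a short sketch. The function `disprove_at_bound` is a hypothetical placeholder for a SAT query on a k-frame unrolling; everything else is illustrative.

```python
# Sketch of the iterative BMC flow: increase the bound until a counter-example
# is found or the completeness bound k_dia (the circuit diameter) is reached.
def bounded_model_check(disprove_at_bound, k_dia, k_start=1, step=1):
    k = k_start
    while k <= k_dia:
        cex = disprove_at_bound(k)   # returns a counter-example or None
        if cex is not None:
            return ("FAIL", k, cex)  # property disproved within bound k
        k += step
    return ("PROVED", k_dia, None)   # diameter reached: complete proof

# Toy property that is first violated at bound 4
result = bounded_model_check(lambda k: "cex" if k >= 4 else None, k_dia=10)
print(result)  # ('FAIL', 4, 'cex')
```

In industrial practice, as noted above, the loop is typically stopped by resource limits well before `k_dia`, which is why BMC is used as a "bug hunting" rather than a full-proof technique.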
The BMC problem has been intensively studied over the last decade and many improvements
have been made [8, 9, 29, 95]. Even though BMC is not a complete model checking technique
unless the bound equals the circuit diameter, it is deemed an effective “bug hunting” tool in the
industry. Today, many commercial organizations successfully use BMC tools in their verification
flows with bounds much smaller than the circuit diameter. The BMD technique
presented here is based on intuition similar to that of BMC. However, as we will
describe, it entails fundamental theoretical and implementation differences.
4.3 Bounded Model Debugging Formulation
4.3.1 Probability Analysis of Error Behaviour
Bounded model debugging is motivated by the empirical observation that functional errors are
usually excited in temporal proximity to observation points such as primary outputs. In this
section, this observation is re-affirmed with a discussion and the respective probability analysis.
Figure 4.1: Sample pipeline circuit with single output
Figure 4.2: Five time frame ILA for circuit in Figure 4.1
In combinational circuits, because there are no memory elements, errors are excited in the
same clock cycles that failing behaviours are observed. In sequential circuits, the situation can
be much more complex. This is because the erroneous behaviour may propagate across many
consecutive clock cycles as values get latched in memory elements. The observation point,
usually a primary output, may not exhibit an erroneous behaviour until many clock cycles
after the excitation point. Furthermore, expected values are typically not available for internal
signals or memory elements. Thus, when debugging simulation traces or counter-examples,
many clock cycles must be considered prior to the observation of the failure.
Consider the sequential circuit in Figure 4.1 and its ILA representation of five cycles shown
in Figure 4.2. Also assume that the first time a functional error is observed is at the primary
outputs of the fifth simulation cycle. We now analyze how errors can be excited in different
time frames to cause the observed failure without any knowledge of the input stimulus.
gate   likelihood of errors excited in different clock cycles
name   clock cycle 1   clock cycle 2   clock cycle 3   clock cycle 4   clock cycle 5
A      Zero            Zero            Low             Low             High
B      Zero            Zero            Zero            Low             High
C      Zero            Zero            Zero            Low             High
D      Zero            Zero            Zero            Zero            High
E      Zero            Zero            Zero            Zero            High

Table 4.1: Likelihood of errors on gates of Figure 4.1 being excited
Notice that if the error is excited in the first two cycles, gate A cannot be the error source
because there is no propagation path from A in cycle one or two to any primary output in cycle
five. If it is the case that an error on gate A is excited in cycle three, this failure is not observed
in time frames three or four since the failure is first observed in time frame five. Similarly,
the error may be excited in time frame four, but a failure is not observed in that time frame.
Finally, the error can be both excited and observed in time frame five. This type of analysis,
performed without knowledge of the input stimulus, can give us a degree of confidence about
where errors may be excited.
Table 4.1 presents the degree of confidence of an error being excited on the various gates of the
circuit from Figure 4.1 in different clock cycles. Although informal, this analysis confirms a
high likelihood that the error is excited in time frame five for all gates. As earlier clock cycles
are considered, the likelihood decreases and eventually goes to zero in clock cycles one and
two. For the purpose of debugging, the above discussion suggests that it is more important for
an algorithm to spend its resources on later clock cycles rather than on earlier ones.
Theorem 3 formally presents the probability of observing the first failure in clock cycle d,
given that the error is excited in clock cycle 1. Figure 4.3 illustrates the setup for the theorem
using a symbolic clock and an ILA representation.
Theorem 3 Assuming the following:

• a single error is excited in clock cycle 1

• no other errors are excited in any other clock cycles

• propi is the probability of the error propagating from cycle i to i + 1

• obsi is the probability of observing a failure in clock cycle i given that an error has propagated to cycle i

• the input vector sequences are temporally independent and stationary random sequences

Given a sequential circuit, the probability of observing the first failure in clock cycle d is

pd = ∏_{i=1}^{d−1} propi × ∏_{i=1}^{d−1} (1 − obsi) × obsd.

Proof:
Let

Wi = {an error propagates from cycle i to cycle i + 1 if it has propagated to cycle i},
Oi = {a failure is observable in cycle i if an error has propagated to cycle i}, and
E1 = {an error is excited in clock cycle 1},

and let ¬Oi denote the complement of Oi (no failure is observed in cycle i). The probability pd can be stated in terms of the events Wi, Oi, and E1:

pd = P( ⋂_{i=1}^{d−1} Wi ∩ ⋂_{i=1}^{d−1} ¬Oi ∩ Od | E1 ).

By applying the identity P(A ∩ B | C) = P(A | C) × P(B | A ∩ C), we get

pd = P( ⋂_{i=1}^{d−1} Wi | E1 ) × P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} ¬Oi ∩ ⋂_{i=1}^{d−1} Wi ∩ E1 ).

Here, the events Od and ⋂_{i=1}^{d−1} ¬Oi are conditionally independent given E1 ∩ ⋂_{i=1}^{d−1} Wi:

P( Od ∩ ⋂_{i=1}^{d−1} ¬Oi | E1 ∩ ⋂_{i=1}^{d−1} Wi ) = P( Od | E1 ∩ ⋂_{i=1}^{d−1} Wi ) × P( ⋂_{i=1}^{d−1} ¬Oi | E1 ∩ ⋂_{i=1}^{d−1} Wi ),

thus

P( Od | ⋂_{i=1}^{d−1} ¬Oi ∩ ⋂_{i=1}^{d−1} Wi ∩ E1 ) = P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

As a result, pd can be simplified:

pd = P( ⋂_{i=1}^{d−1} Wi | E1 ) × P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

One of the assumptions made is that input vectors in successive cycles are all (temporally)
independent. Thus, any Wi is independent of Wj for all cycles i and j:

P( Wi ∩ Wj | E1 ) = P( Wi | E1 ) × P( Wj | E1 ).

As a result,

P( ⋂_{i=1}^{d−1} Wi | E1 ) = ∏_{i=1}^{d−1} P( Wi | E1 ).

Similarly, by the assumption, any ¬Oi is independent of ¬Oj for all cycles i and j:

P( ¬Oi ∩ ¬Oj | ⋂_{k=1}^{d−1} Wk ∩ E1 ) = P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) × P( ¬Oj | ⋂_{k=1}^{d−1} Wk ∩ E1 ).

As a result,

P( ⋂_{i=1}^{d−1} ¬Oi | ⋂_{i=1}^{d−1} Wi ∩ E1 ) = ∏_{i=1}^{d−1} P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ).

Using the above, pd can be simplified to:

pd = ∏_{i=1}^{d−1} P( Wi | E1 ) × ∏_{i=1}^{d−1} P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) × P( Od | ⋂_{i=1}^{d−1} Wi ∩ E1 ).

In the assumptions, propj and obsj are defined for some cycle j as

propj = P( Wj | E1 ), and
obsj = P( Oj | ⋂_{i=1}^{j−1} Wi ∩ E1 ),

so that P( ¬Oi | ⋂_{k=1}^{d−1} Wk ∩ E1 ) = 1 − obsi. Using these definitions, pd can be presented as

pd = ∏_{i=1}^{d−1} propi × ∏_{i=1}^{d−1} (1 − obsi) × obsd. □
Theorem 3 formally confirms the intuition that errors are more likely to be observed temporally
closer to the excitation point. More specifically, pd is found to be a negative exponential
function with respect to the distance d. We can simplify pd by assuming that
Figure 4.3: Illustration of example where error is excited in cycle 1 and observed in cycle d
propi equals the constant prop and obsi equals the constant obs for all cycles i, resulting in
pd = prop^(d−1) × (1 − obs)^(d−1) × obs. This simplified relationship is plotted in the three curves
of Figure 4.4 with values of prop = obs = {0.1, 0.5, 0.9}. At d = 1 we have the special case
where pd = P(O1 | E1) = obs. The negative exponential relationship is clear from these plots,
as the three curves are no longer visible when d is greater than 6.
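The decay of pd can be checked numerically with the simplified constant-probability formula; this small sketch reproduces the behaviour plotted in Figure 4.4.

```python
# Simplified relationship p_d = prop**(d-1) * (1-obs)**(d-1) * obs,
# with prop and obs held constant across cycles as assumed above.
def p_d(d, prop, obs):
    return prop ** (d - 1) * (1 - obs) ** (d - 1) * obs

for prop_obs in (0.1, 0.5, 0.9):
    row = [round(p_d(d, prop_obs, prop_obs), 4) for d in (1, 2, 4, 6)]
    print(prop_obs, row)  # values shrink rapidly as d grows
```

For prop = obs = 0.5, for instance, pd halves by a factor of four per cycle (0.5, 0.125, 0.0078, ...), which is the steep negative slope the analysis relies on.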
Figure 4.4: Plotting pd as a function of d with prop = obs = {0.1, 0.5, 0.9}
We can further analyze pd as a function of prop and obs. Figures 4.5(a), 4.5(b), 4.5(c)
and 4.5(d) present this relationship in three-dimensional graphs with values of d = 1, 2, 3 and 4,
respectively. First, we notice the special case of d = 1, where pd depends only on obs. In the next
three figures, each surface reaches its maximum pd when prop = 1 and obs is between 0.2 and 0.5.
Figure 4.5: Plotting pd as a function of prop and obs with different d = {1, 2, 3, 4} (panels: (a) d = 1, (b) d = 2, (c) d = 3, (d) d = 4)
Intuitively, the higher the probability prop, the more likely the error is to propagate across
clock cycles and be observed. Thus, for any given d, the erroneous behaviour is more likely to
reach clock cycle d. The effect of obs is not as straightforward. A high probability of obs
means that the error is likely to be observed in early clock cycles and may not reach clock cycle d,
leading to a small pd. However, a very small obs reduces the likelihood of observing the
error even in clock cycle d, also reducing pd. In general, as d increases, the value of obs that
maximizes pd decreases.
The above analysis allows us to better understand the general relationship between pd and
d. The analysis with respect to prop and obs also provides insight into how pd can vary. However,
we should emphasize that many of the assumptions made here, including the constant probabilities
prop and obs, are not realistic in industrial settings. In other words, prop and obs may not be
independent of each other, and pd will also depend on the stimulus vectors of the circuit,
the circuit structure and checkers, and other factors not included in the analysis. Furthermore,
errors can be excited multiple times in an error trace, leading to more complex scenarios. As
a result, the analysis should be used primarily to confirm the steep negative slope of pd with
respect to d. This general relationship is validated by the experimental results of Section 4.4.
4.3.2 Problem Formulation
Since sequential debugging problem formulations are modelled using the ILA representation,
the problem size depends linearly on the number of clock cycles in the error trace. As with most
SAT-based CAD techniques, the performance of SAT-based automated debuggers degrades at
a rate faster than linear with the problem size [26]. This is because debugging is
an NP-complete problem, which may require exponential run-time in the worst case [91, 100].
Larger CNF problems not only demand more memory, but they may also have a dramatic
impact on the overall run-time performance of the solver.
Given the analysis presented in Section 4.3.1 and the considerable impact of the trace length
on the overall performance, a practical BMD methodology is devised. BMD is a complete
and systematic debugging methodology that focuses on finding error suspects by considering
suffixes of the error trace. Shorter traces provide the benefit of generating a smaller problem
instance that can fit into the available memory and also have the potential to improve run-time
performance.
Given an error trace of kf clock cycles, the BMD methodology starts by considering a short
suffix of the error trace from clock cycle k1 to kf . Note that kf is the first clock cycle where a
failure is observed. In the remainder of this chapter, we use the notation vBMD to refer to the
diagnosis vector obtained from the suffix of the error trace. As stated in Definition 2 of Section 2.2, a
diagnosis vector contains initial state values as well as stimulus and expected response sequences.
The suffix of the error trace directly provides the input stimulus and response sequences. The
initial state values for vBMD are captured in the state elements of the design when simulated
using the input stimulus sequence from clock cycle 1 to k1−1. Using the diagnosis vector vBMD
and the erroneous design C, an automated debugging problem can be formulated as presented
in Section 2.5.
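The construction of vBMD can be sketched as follows. Here `simulate` is a hypothetical cycle-accurate simulator used to run the discarded prefix and capture the suffix's initial state; the data layout and all names are illustrative, not the thesis tool's API.

```python
# Sketch of building the diagnosis vector v_BMD for a suffix [k1, kf].
def build_v_bmd(circuit, simulate, reset_state, stimulus, response, k1):
    """stimulus/response are per-cycle lists for cycles 1..kf (0-indexed).
    The initial state of the suffix is obtained by simulating the input
    stimulus from clock cycle 1 to k1 - 1."""
    state = reset_state
    for cycle in range(k1 - 1):               # simulate the discarded prefix
        state = simulate(circuit, state, stimulus[cycle])
    return {"initial_state": state,
            "stimulus": stimulus[k1 - 1:],    # cycles k1..kf
            "response": response[k1 - 1:]}

# Toy one-flip-flop circuit: next state = current input
sim = lambda circuit, state, inp: inp
v = build_v_bmd(None, sim, 0, [1, 0, 1, 1], [0, 1, 0, 1], k1=3)
print(v["initial_state"], v["stimulus"])  # 0 [1, 1]
```

The simulated prefix replaces the omitted time frames, so the debugging problem sees a consistent initial state at cycle k1.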
Figure 4.6: ILA of length three with error excited on gate A
As an example, consider a circuit represented as a three time frame ILA in Figure 4.6. Here
the circuit is shown with the correct and erroneous values annotated onto the ILA,
separated by a “slash”. Signal values that are not relevant to the example are
omitted for clarity. Signals that do not have an erroneous value are labelled with only a single
Boolean value. This convention is used for the remaining examples in this chapter.
When debugging the three cycle ILA, a debugger will return the equivalent error gates A
and C and memory elements B and D. When only the last two clock cycles k1 = 2 and kf = 3
are considered, the ILA shown in Figure 4.7 is used to formulate the debugging problem. The
initial state of B = 0 is obtained by simulating the erroneous circuit for one clock cycle. When
the problem is provided to an automated debugger, the suspects corresponding to gates B, C
and D are returned. Since the error on gate A is excited in the omitted first time frame, this
potential error source is missing from the set of solutions.
As illustrated by the above example, considering only a suffix of an error trace can result in an
incomplete set of suspects. To avoid this behaviour, a mechanism is required to determine when
solutions are not complete and a longer ILA must be considered for debugging. Returning to
Figure 4.7, notice that the erroneous behaviour of gate A is captured in the initial state variable
B (i.e. 1/0). Even though gate A is not available in the two-cycle ILA, the fact that the initial
state B is found as a suspect means that its transitive fanin logic may also be erroneous. In
other words, if the method finds some initial states as suspects, this can be seen as a signal
that the returned solution set may be incomplete.
Figure 4.7: Suffix of size two of the ILA shown in Figure 4.6
Theorem 4 Assume that errors on gates G1, G2, ..., GN are excited between clock cycles 1 to
k1 − 1 of a kf cycle long trace. If the first failure observed from these errors is in clock cycle
kf , then the erroneous behaviour of gates G1, G2, ..., GN in clock cycles 1 to k1 − 1 is observed
in state elements of clock cycle k1.
Proof: Since a failure is first observed in clock cycle kf , the error sites excited in clock cycles
1 to k1 − 1 must propagate to observation points in clock cycle kf . Since state elements are
the only components that can propagate signal values across time frames, the erroneous values
must propagate through state elements in clock cycle k1. □
The above theorem can be extended to equivalent error locations, since these locations
cannot be distinguished from one another under a given vector sequence set (see Section 2.5).
According to Theorem 4, initial state elements can capture the erroneous behaviour of their
erroneous transitive fanins. Thus, initial state elements that exhibit erroneous behaviour can
indicate that the solution set is incomplete. More specifically, an automated debugger
can be used to find initial state suspects in addition to error suspects. Initial state suspects
represent possible corrections in the state element functions in the first time frame of the suffix
(i.e. at clock cycle k1). If an initial state suspect is found, an erroneous gate may be excited
in a time frame prior to k1 and its behaviour may have propagated to k1 thus leading to the
failure in kf . As a result, a longer suffix with more clock cycles must be considered so that the
engine can localize all errors.
Recall that in SAT-based debugging, the circuit is enhanced with correction models to
consider specific locations as sources of errors. In Section 2.5, multiplexers are added to the
output of gates to implement correction models at the gate-level. For sequential debugging
problems, since a correction model must be consistent across time frames, the select lines of
corresponding multiplexers in different time frames are tied together. In other words, the
correction models in different time frames are grouped together. This allows an automated
debugger to treat all the different correction models for a single gate across all time frames as
a single suspect during analysis.
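The grouping of correction models across time frames can be sketched as follows. This is an illustrative model only: suspects are identified by name and a simple counter stands in for a CNF variable allocator; the actual debugger's data structures may differ.

```python
# Sketch of gate-level correction models: a mux at each suspect gate's output,
# with the select line shared across all time frames so the per-frame muxes
# are grouped into a single suspect.
class CorrectionModels:
    def __init__(self):
        self._next_var = 1
        self.select_of = {}          # suspect name -> shared select variable

    def new_var(self):
        v = self._next_var
        self._next_var += 1
        return v

    def add_mux(self, suspect, time_frame):
        """Return (select, free_value) for this gate in this time frame.
        The select variable is created once and reused in every frame,
        i.e. the select lines are tied together."""
        if suspect not in self.select_of:
            self.select_of[suspect] = self.new_var()
        return self.select_of[suspect], self.new_var()

cm = CorrectionModels()
sel1, _ = cm.add_mux("A", time_frame=1)
sel2, _ = cm.add_mux("A", time_frame=2)
print(sel1 == sel2)  # True: both frames share one select line
```

Because a single select variable controls every replica of a gate's mux, activating a suspect corrects that gate consistently in all time frames, as required by the formulation above.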
Returning to the previous example, for automated debugging the erroneous circuit must be
enhanced with correction models (see Section 2.5) to represent initial state suspects as sources of
errors. In this case, these correction models are only needed in the first time frame. Note that
they are independent of one another and should not be grouped together. Figure 4.8 illustrates
the debugging problem of Figure 4.7, where correction models are represented with circles of
different patterns. Circles with the same pattern represent grouped correction models.
As an illustration, gate A in time frames one and two has black dots because they represent a
correction for gate A in both time frames. Notice that there are effectively two correction models
on variables corresponding to signals B and D in the first time frame. For gate B, a “\” pattern
represents the gate error suspect while a “#” pattern represents the initial state suspect. In
this case, a debugger will find both suspects of B as solutions. Since one of the solutions is an
initial state suspect, the results may not be complete and a longer suffix must be considered.
In the next iteration, a suffix comprising the entire trace is used and the missed solution (gate
A) is found.
In summary, the BMD methodology starts by formulating the debugging problem with a
suffix smaller than the original trace. If an automated debugger finds any initial state suspects
as part of the solution, the suffix start is moved back to k2, where k2 < k1. This longer suffix
results in a new ILA with which the debugging problem is formulated in the next iteration of
BMD. This process of debugging, followed by an increase in suffix length whenever initial state
suspects are found, is repeated until no initial state suspects are found or the resource limits
are reached. In the worst case, this process degenerates to a conventional debugging technique
when the complete trace is examined. As shown through experiments, this scenario seldom occurs.

Figure 4.8: ILA of Figure 4.7 annotated with error suspects and initial state suspects

Figure 4.9: Simple circuit with single error source on gate A
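The iterative flow just summarized can be condensed into a small sketch (the complete algorithm, including cardinality handling, is given in Figure 4.14). `debug_suffix` is a hypothetical debugger call returning suspect sets for a suffix starting at clock cycle k1, and the string tags are illustrative.

```python
# Simplified BMD outer loop: lengthen the suffix (shrink k1) while initial
# state suspects keep appearing; stop when none are found or the whole
# trace (k1 == 1) has been examined.
def bmd(debug_suffix, kf, incr):
    k1 = max(kf - incr + 1, 1)
    while True:
        solutions = debug_suffix(k1)
        has_initial = any(s.startswith("init:")
                          for sol in solutions for s in sol)
        if not has_initial or k1 == 1:
            return k1, solutions
        k1 = max(k1 - incr, 1)

# Toy debugger: initial state suspects disappear once the suffix reaches cycle 2
toy = lambda k1: [{"init:B"}] if k1 > 2 else [{"gate:A"}]
print(bmd(toy, kf=10, incr=3))  # (2, [{'gate:A'}])
```

In the worst case `k1` reaches 1 and the loop degenerates to conventional debugging over the full trace, mirroring the discussion above.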
4.3.3 Impact on Error Cardinality
The BMD methodology introduced in the previous section can impact the error cardinality
N used by automated debuggers (see Definition 6 of Section 2.3.1). Consider the example
in Figure 4.9, where the error on gate A propagates to state element C and gate B. Figure 4.10
illustrates the resulting ILA for two time frames. In this case, when employing BMD with an
initial suffix of length one and looking for N = 1 errors, only the suspect gate B is found. The
erroneous gate A and the state element C are not returned as solutions since neither one can
fix the failure on its own. As a result, because the state element suspect C is not contained in the
solution set, the BMD length will not be increased and the method will terminate erroneously.
The missed solution (i.e., set of suspects) in the above example is due to the fact that
the error from gate A propagates to two separate locations (state element C and gate B) whose
combined effect results in the observed error. Thus the debugging problem requires a cardinality
of N = 2 with a suffix length of one. For example, if N = 2 with k1 = 2, then the solution {B, C} is
returned. Since C is also an initial state suspect, the suffix length will be increased and the
algorithm will iterate successfully.

Figure 4.10: Example of single error source excited in two clock cycles
For certain problems, the error cardinality used by the debugger must be increased in order
to identify initial state suspects as solutions and increase the suffix length. Given that the
maximum user-defined error cardinality is maxN, we need to find the maximum error cardinality
to use with BMD. The following theorem presents an upper bound on the error cardinality that
will find all initial state suspects, thus guaranteeing completeness of BMD. This upper bound
is later refined in the improvements of Section 4.3.4.
Theorem 5 Given an erroneous circuit with maxN errors, a diagnosis vector vBMD with a
suffix trace from clock cycle k1 to kf , where k1 > 1, the BMD debugging methodology will find
all initial state suspects as solutions if the maximum error cardinality used is
maxNBMD = NDFF + maxN
where NDFF is the total number of state elements in the circuit.
Proof: When debugging with BMD using the diagnosis vector vBMD, the problem formulation
will contain correction models on each erroneous gate G1, G2, ..., GmaxN in time frames k1 to kf
plus all the state elements of time frame k1. The correction models for circuit gates are grouped
together across time frames. In the worst case, maxN errors are excited both before and after
clock cycle k1, thus erroneous behaviour can be latched in the state elements in clock cycle
k1. In this case, a debugger requires all the correction models to be active (i.e. select line of
multiplexers assigned to 1) to find a solution set corresponding to this erroneous behaviour. The
maximum number of active lines is one for each erroneous gate and one for each state element
or maxNBMD = NDFF + maxN. With maxNBMD, all initial state suspects that contain an
erroneous behaviour will be found in the solution set. □

Figure 4.11: Example of pipelined circuit

Figure 4.12: Three time frame ILA of circuit in Figure 4.11
Theorem 5 presents the maximum cardinality required by an automated debugger using
BMD to guarantee completeness of the solutions. Under the suffixes of different BMD iterations,
the error cardinality can increase, as shown above, or decrease. The following example
illustrates a case where the maximum error cardinality is reduced under a given suffix. Consider
Figure 4.11, where two error gates A and B combine to result in the observed failure in time
frame three. The original ILA of size three is shown in Figure 4.12. With a BMD suffix of size
two, the suspects D, E and F are found with N = 1. Since D is also an initial state suspect, the
suffix length must be increased. In summary, the initial BMD iteration is solved with N = 1
while an error cardinality N = 2 is required to solve the problem with the complete trace.
Every time an initial state suspect is found as a solution, the suffix length is increased
and a subsequent debugging problem is formulated. The error cardinality for the subsequent
problems must be reset to N = 1 regardless of its value in previous iterations. Resetting N
ensures that the smallest-cardinality solutions are found in every iteration. Revisiting the
example in Figure 4.9, when the ILA length is increased from one to two clock cycles, the
cardinality must be reset from N = 2 to N = 1 in order to find the single error site A.

Figure 4.13: Example with error source A propagating through three DFFs
4.3.4 Improvements to Basic Methodology
The previous sections introduced the basic BMD methodology and presented a flow to guarantee
solution completeness. This section presents several performance-enhancing techniques.
4.3.4.1 Reducing the Number of Initial Error Suspects
One improvement relates to the set of initial state suspects. As stated by Theorem 5, the
maximum error cardinality for a BMD problem can grow according to the number of state
elements in the circuit. For example, in Figure 4.13 a single erroneous gate propagates through
three state elements before propagating to a primary output. To identify the error
source, the cardinality must be incremented to N = 4 in order to consider a longer suffix. Since
the complexity of the debugging problem grows exponentially with the error cardinality, as
explained in Section 2.3.1, it becomes important to develop techniques to aid BMD so that it
does not require a large error cardinality while remaining complete.
One way to avoid a large increase in the error cardinality is to group all correction models
for initial state suspects together as a single suspect. Recall from Section 4.3.2 that grouping
correction models can be as simple as connecting the select lines of multiplexers together. Since any
solution set with initial state suspects requires increasing the length of the suffix for future
iterations of BMD, there is no need to distinguish which initial state suspects are found. Returning
to the example of Figure 4.13, when grouping all initial state suspects together, the debugger does
not distinguish between solutions containing initial state suspects for DFF B, C, or D, as any one
requires increasing the suffix length. As a result, the suffix can be increased when using N = 2
instead of N = 4.
Theorem 6 Given an erroneous circuit with maxN errors, a diagnosis vector vBMD with a
suffix trace from clock cycle k1 to kf , where k1 > 1, the BMD debugging methodology will find
all initial state suspects as solutions if the maximum error cardinality used is maxNBMD =
maxN + 1 and all initial state suspects are grouped as one suspect.
Proof: Since all logic value propagations across consecutive clock cycles occur through state
elements, by grouping all initial state elements together, all propagation must occur through this
single group. As a result, all erroneous behaviour from clock cycles prior to k1 must propagate
through the initial suspect group to reach the failure in clock cycle kf . When modeled by a
single suspect, the initial suspect group can be selected in combination with other suspects by
increasing the cardinality maxN by one. □
4.3.4.2 Reusing Solutions
Another improvement relates to the iterative nature of the BMD methodology. At every iteration,
the debugging problem with a longer suffix may contain solutions that were already found
in previous iterations with smaller suffixes. For instance, in the example of Figure 4.6,
solutions B, C, and D with k1 = 2 are also found at k2 = 1. This leads us to the conclusion that
solutions found with smaller suffixes can be excluded from the search space of future iterations
when a larger suffix is used for a given error cardinality N .
Theorem 7 Consider two debugging problems formulated using the erroneous circuit C, the
maximum error cardinality N , and two different trace suffixes. One suffix is from clock cycle
ki to kf , while the other is from clock cycle kj to kf , and ki ≥ kj. Every solution s ∈ Si from
1:  exit_condition = 0
2:  Final_Solutions = ∅
3:  k = kf − incr
4:  while (!exit_condition) do
5:      initial_states = get_current_states(C, k − 1)
6:      v = {initial_states, stimulus_{k→kf}, response_{k→kf}}
7:      S = Suspect_locations ∪ initial_state_suspect
8:      Solutions = debug(C, v, N, S)
9:      for all Solution ∈ Solutions do
10:         valid_solution = 1
11:         for all Suspect ∈ Solution do
12:             if (is_initial_state(Suspect)) then
13:                 k = k − incr
14:                 N = 0
15:                 valid_solution = 0
16:             end if
17:         end for
18:         if (valid_solution == 1) then
19:             Final_Solutions = Final_Solutions ∪ Solution
20:         end if
21:     end for
22:     if (N == maxN) then
23:         exit_condition = 1
24:     else
25:         N = N + 1
26:     end if
27: end while
28: return Final_Solutions

Figure 4.14: Complete BMD algorithm
the first debugging problem is also a solution to the second debugging problem, s ∈ Sj, if s does
not contain any initial state suspects.
Proof: Since the solution s with cardinality N does not contain an initial state suspect, the
error is active within the clock cycles ki to kf. Since the interval kj to kf contains the interval
ki to kf , solution s will also be a solution in the larger interval provided the cardinality N is
the same. �
The observation in Theorem 7 allows the BMD framework to skip solutions found in pre-
vious iterations to achieve performance improvements. Practically, this is done by adding
blocking clauses to the CNF of subsequent iterations to prevent rediscovering solutions that
have already been found.
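As an illustration, with each solution represented as a set of positive suspect-select variables (an assumed encoding for this sketch, not the exact CNF of the debugger), the blocking step amounts to:

```python
def blocking_clause(solution):
    """Build a clause forbidding a previously found solution.

    A solution is a set of suspect-select variables (positive integers)
    that were all asserted; the blocking clause demands that at least
    one of them be de-asserted in any future satisfying assignment.
    """
    return [-v for v in solution]

def block_all(cnf, solutions):
    """Append one blocking clause per previously found solution."""
    return cnf + [blocking_clause(s) for s in solutions]
```

Each subsequent BMD iteration then starts from the blocked CNF, so the solver can only return solutions that are new with respect to shorter suffixes.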
4.3.5 Overall Algorithm
The BMD methodology described in this chapter including the improvements of Section 4.3.4
is presented in the algorithm of Figure 4.14.
Initially, BMD uses the suffix from clock cycle kf − incr to clock cycle kf as shown on line
3. The while loop shown from line 4 to line 27 comprises the BMD iterations where successive
debugging problems are constructed with longer suffixes. On line 5 the initial state constraints
are captured by simulating C for k − 1 cycles, while on line 6 the stimulus, response and
initial state values are combined to construct the diagnosis vector v. Grouping the initial state
suspects as presented in Section 4.3.4 and adding all the potential suspects to S is performed
on line 7. On line 8, an automated debugger is called to solve the constructed problem with
error cardinality N .
Once solutions are found by the debugger, the decision to extend the length of the suffix
is made on line 12 based on whether the grouped initial state suspect is found. Lines 13–14
increase the length of the suffix and reset the error cardinality. When a solution does not
contain the initial state suspect, the solutions are added to the final set as shown on line 19.
Finally, the BMD process terminates when the maximum user-defined cardinality maxN is
reached on line 22. Not shown here are termination conditions based on resource limits such
as time-out and memory-out.
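The control flow of Figure 4.14 can be sketched in Python. This is a simplified illustration, not the thesis implementation: debug is a stand-in for the SAT-based debugger call of line 8, solutions are plain sets of suspect names, and all initial-state suspects appear as the single grouped suspect of Section 4.3.4.

```python
# Simplified sketch of the BMD loop of Figure 4.14. "debug" takes the
# suffix start cycle k and error cardinality N, and returns a list of
# solutions (sets of suspect names).
INITIAL = "initial_state_suspect"   # all initial-state suspects, grouped

def bmd(kf, incr, maxN, debug):
    final_solutions = []
    k = kf - incr                       # line 3: shortest suffix first
    N = 1
    while True:                         # lines 4-27
        extend = False
        for solution in debug(k, N):    # lines 8-9
            if INITIAL in solution:
                extend = True           # lines 12-15: suffix too short
            elif solution not in final_solutions:
                final_solutions.append(solution)   # line 19
        if extend:
            k -= incr                   # line 13: lengthen the suffix
            N = 1                       # lines 14 and 25: restart cardinality
        elif N == maxN:                 # line 22: exit condition
            return final_solutions      # line 28
        else:
            N += 1                      # line 25
```

A canned debug function suffices to exercise the loop: any solution containing the grouped initial-state suspect triggers a suffix extension, while the remaining solutions are accumulated across iterations.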
4.4 Experiments
In this section, we present experimental results of the BMD methodology presented in this
chapter. All experiments are conducted on a single core of a Core 2 Quad 2.66GHz machine
with 8GB of memory. The debugger used is a hierarchical sequential debugger developed in
C++ based on the concepts of [3] with a Verilog frontend to allow for RTL-based debugging.
The SAT solver used by the debugger is MiniSAT [74]. In the following this tool is referred to
as the stand-alone debugger.
The circuits selected for experiments are Verilog RTL designs from OpenCores [77] as well as
three industrial designs (fxu, rx comm, s comm) provided to the research group by semiconduc-
Problem        # gates   # DFFs   # cycles (kf)   run-time (s)   # solutions   error found
ac97_ctrl-1     25310     2346        978           2613.62          49           yes
ac97_ctrl-2     25288     2345        670           1245.19          34           yes
div64bits-1     74846     5512        108            713.01          21           yes
fdct-1         377801     5717        182                MO         N/A            no
fdct-2         377801     5717        186                MO         N/A            no
fpu-1           82371     1083        316           2108.97           6           yes
fpu-2           22953      515        640                TO          10            no
fxu-1          602673    29080         28           1958.15          32           yes
fxu-2          267423    12016        154                TO           3            no
mem_ctrl-1      46168     1145        681           2190.29           5           yes
mem_ctrl-2      46168     1145        757                TO           5            no
rx_comm-1      585641    30339        675                MO         N/A            no
rx_comm-2      585641    30339        253                MO         N/A            no
rx_comm-3      585632    30339        573                MO         N/A            no
rx_comm-4      220456    18333        180           2240.73          85           yes
rx_comm-5      585265    30339         99                TO          54            no
rx_comm-6      585641    30339        560                MO         N/A            no
s_comm-1       779607    29967        212                MO         N/A            no
s_comm-2       779607    29967        212                MO         N/A            no
s_comm-3       779575    29967        212                MO         N/A            no
s_comm-4       779607    29967        132                MO         N/A            no
s_comm-5       790407    29967        132                MO         N/A            no
spi-1            2942      185        251            973.18          65           yes
spi-2            2954      185        648                MO         N/A            no
vga-1          153837    17102        863                MO         N/A            no
vga-2          153837    17102        902                MO         N/A            no
vga-3          155370    17206        175           1626.64          63           yes
vga-4          154137    17138        209           1531.70          33           yes
vga-5          154609    17146        381                MO         N/A            no
vga-6          153837    17102        849                MO         N/A            no
wb-1             4479      251        269            466.03          14           yes
wb_conmax-1     85049      818        651                MO         N/A            no

Table 4.2: Circuit and performance statistics without BMD
tor firms. In each of these designs, one or more errors are added at the RTL level. For example,
these errors may be wrong state transitions, incorrect RTL operations, or even wrong module
instantiations. It is important to emphasize that these errors at the RTL often translate into
dozens of error locations at the gate-level. Every instance of the designs with an inserted error
is a debugging problem used in the experiments. Each debugging problem has a corresponding
diagnosis trace which includes stimulus vectors and expected response vectors provided by the
testbench.
Table 4.2 provides a summary of the debugging problems as well as the performance of
circuit        run-time (s)   BMD iters   # solns found   found in iter   improv (×)
ac97_ctrl-1       204.57          10            7                0            6.09
ac97_ctrl-2       747.24          10           13                1            1.67
div64bits-1      1264.49          10           20                2            0.56
fdct-1                TO           5           38                0             N/A
fdct-2                TO           4           48                2             N/A
fpu-1             201.01           4            6                1           10.49
fpu-2             333.00          10           24                1           10.81
fxu-1             479.14           1           24                1            7.51
fxu-2             174.36           1           28                1            4.09
mem_ctrl-1         22.43           1            5                1           97.65
mem_ctrl-2         28.35           1           11                1          126.98
rx_comm-1         452.97           1           30                1            7.95
rx_comm-2         331.19           1           18                1           10.87
rx_comm-3         369.09           1            5                1            9.75
rx_comm-4             TO           3           81                7            0.62
rx_comm-5         275.79           1           15                1           13.05
rx_comm-6         393.01           1           17                1            9.16
s_comm-1              TO           4           21                1             N/A
s_comm-2              TO           4           20                3             N/A
s_comm-3              TO           4           14                1             N/A
s_comm-4              TO           3           71                1             N/A
s_comm-5              TO           3           39                2             N/A
spi-1             151.07          10           63                1            3.53
spi-2             106.47          10           57                1           33.81
vga-1             553.35           3           63                1            6.51
vga-2            1336.67           3           33                1            2.69
vga-3             685.95           3           83                1            2.37
vga-4             163.03           1            6                1            9.40
vga-5            2982.43           5           29                3            1.21
vga-6             166.52           1            8                1           21.62
wb-1              553.35           3           63                1            0.84
wb_conmax-1        41.56           1           12                1           86.62

Table 4.3: Performance with BMD on increment size of 10 clock cycles
the stand-alone debugger on each instance. Columns one, two and three label the debugging
problem and show its gate and DFF counts, respectively. Column four shows the number of clock
cycles in the entire stimulus trace provided by the testbench. This number also corresponds to
the first clock cycle kf where a failure is observed. The problems used are specifically chosen
because of their large circuit size (over 100K gates), long error trace (hundreds of clock cycles)
or both. This combination results in hard problems that push the capabilities of state-of-the-art
debuggers.
The next three columns of Table 4.2 present debugging statistics when using the stand-alone
debugger. Column five shows the run-time in seconds required to solve each problem. Column
six enumerates the number of solutions found, or the total number of equivalent error locations
found with maxN = 1. Column seven states whether the actual inserted RTL error is found
as one of the solutions. In cases where more than one hour of CPU time is used, a time-out (TO)
is declared, and where more than 8GB of memory is required, a memory-out (MO) is declared.
In summary, of the 32 debugging problems, three time out, 17 memory out, and the inserted
error is found in only 11, or 34%, of all cases.
The BMD methodology introduced in this chapter is implemented according to the algorithm
of Figure 4.14. Here an initial suffix length of 10 clock cycles is used as well as an increment
of 10 clock cycles each time the suffix is increased. A hard limit of 100 clock cycles is
set, at which point the BMD methodology terminates. The performance of BMD is
presented in Table 4.3, with the problem instance shown in column one. Column two presents the
run-time in seconds required by BMD to solve each problem. Column three shows the number
of BMD iterations performed until the process terminates, or the number of debugging problems
solved with different suffixes. The corresponding total number of solutions found by all BMD
iterations are shown in column four. When the inserted error is found, the iteration in which
the error is found is listed in column five. If the inserted error is not found, a zero (0) is
listed in the column. The final column presents the performance improvement achieved by the
BMD methodology over the stand-alone debugger.
The benefit of the BMD methodology is apparent based on multiple criteria. First notice
that none of the problems solved with BMD exceed the 8GB memory limit, while 17 instances
result in a memory-out with the stand-alone debugger. Instead, with BMD, eight problems
run over the one hour time limit. It is clear that BMD provides a trade-off between time and
memory resources. This trade-off is seen favourably because the overall number of problems
where the inserted error is found increases from 11 to 30 when using BMD. In practice, the
complete problem need not be solved in order to find the error source or to provide vital
debugging information to the user.
When using BMD, as shown in column five, the inserted RTL error is not found for only two
problems. These are ac97_ctrl-1, where the maximum suffix length of 100 clock cycles is
[Figure: four plots of memory usage (KB) versus CPU run time (s)]

Figure 4.15: Memory usage versus CPU run-time for four selected problems: (a) ac97_ctrl-2, (b) fpu-2, (c) rx_comm-4, (d) vga-5
reached, and fdct-1, where the time-out limit of one hour is reached. Similarly, in column four
of Table 4.3 at least some solutions are found for all 32 problems. In contrast, in Table 4.2,
for 17 of 32 cases no solutions are found at all due to exceeding memory resources. Again, this
data favours the memory versus time trade-off provided by BMD.
The data in Table 4.3 also reaffirm the probabilistic analysis performed in Section 4.3.1
that errors are excited in temporal proximity to the failure point. In column three, 12 of 32
problems only require one BMD iteration or a suffix of length 10 clock cycles in order to debug
the problem completely. On average, less than 15% of the original trace length is used. Without
considering cases that time-out, only 6 of 24 problems or 25% of cases require more than 100
clock cycles to provide complete solutions.
Finally, notice the run-time improvement of the BMD methodology over the stand-alone
debugger shown in column six of Table 4.3. Here improvements are achieved from 1.21× to
[Figure: number of debugger solutions found (0–90) versus BMD iteration (1–10) for ac97_ctrl-2, fpu-2, rx_comm-4, vga-5 and spi-1]

Figure 4.16: Debugger solutions found versus BMD iterations
126.98×, or two orders of magnitude. In only three cases, div64bits-1, rx_comm-4 and wb-1,
is a performance hit observed, because the multiple iterations of BMD result in a longer run-time
than running the stand-alone debugger. However, it is clear that BMD is very effective for the vast
majority of problems.
Figures 4.15(a), 4.15(b), 4.15(c) and 4.15(d), plot the memory requirement as a function of
CPU time for problems ac97 ctrl-2, fpu-2, rx comm-4, and vga-5. The memory requirement
graph follows a rising step pattern each time the suffix length is increased. For example, in
Figure 4.15(d), there are five distinct plateaus corresponding to the debugger solving problems
with suffixes of length 10, 20, 30, 40 and 50. As the suffix length increases, the incremental
memory required appears constant. However, notice that the solve time increases at a faster
rate than the suffix length. For example, the first iteration, which requires approximately 1.5
GB, takes under 100 seconds to solve, while the last iteration, which requires approximately 6.5
GB, takes approximately 600 seconds to solve. These graphs confirm the fact that with larger
suffixes, debugging problems also become considerably harder to solve.
The final analysis of the BMD methodology is with respect to the number of solutions found
as a function of BMD iterations. As shown in Figure 4.16, for the sample problems selected, the
number of solutions found by BMD increases initially and plateaus in later iterations. Notice
that the number of solutions does not always increase, since solutions that contain initial state
suspects in earlier iterations may be removed as solutions in later iterations. This
graph portrays the BMD methodology favourably as it indicates that increasing the suffix length
after a certain point does not result in any more new solutions. As a result, the BMD approach
of starting with a small suffix and systematically increasing the suffix length appears to be
effective for debugging.
4.5 Summary
This chapter introduces the bounded model debugging methodology to efficiently and system-
atically tackle problems with long error traces. The contribution is based on the empirical
observation that errors are excited and failures are observed in temporal proximity. This ob-
servation is reaffirmed through probability analysis as well as through empirical evidence. The
BMD methodology proposed, inspired by bounded model checking, is found to be faster than a
state-of-the-art debugger in 90% of cases. Furthermore, it is more robust, as the error is found
in over 93% of problems, compared to 34% without BMD. Overall, BMD allows large problems
with very long traces to be handled in an efficient manner by existing debuggers.
Chapter 5
Debugging using Max-SAT
5.1 Introduction
The contributions presented in Chapters 3 and 4 build on the SAT-based debugging formulation
of Smith et al. as presented in Section 2.5. In this formulation, the SAT problem is constructed
using an erroneous circuit and corresponding diagnosis vector. Since the resulting union of
the CNF for these two elements is inherently unsatisfiable, mechanisms are added to the CNF
to formulate a satisfiable problem whose satisfying assignments correspond to error suspects.
These mechanisms are the correction models and the error cardinality constraints described in
Section 2.5. In part, the debugging problem is formulated as a satisfiability problem to take
advantage of the great improvements achieved in SAT solvers over the past decade.
There are other types of analysis tools and solvers that fit the unsatisfiable nature of the
debugging problem well. For instance, unsat cores, which are derived from the proof of
unsatisfiability of a SAT instance can provide insight about the conflicting behaviour of circuit
components [68, 96]. Maximum satisfiability (max-sat) solvers are engines that reason about an
unsatisfiable CNF to find the largest subset of the CNF clauses whose union is satisfiable. The
complement of the max-sat solution, i.e., the CNF clauses not included in the subset, can help identify
error sites. Thus unsat core analysis and max-sat solvers can help solve debugging problems
with little or no additional mechanisms.
One of the contributions presented in this chapter is the first automated debugging frame-
work using maximum satisfiability. The formulation is constructed from the union of the con-
straints corresponding to the erroneous design and the diagnosis vector. Since the incorrect
design cannot produce the correct response under the given stimulus, the CNF can only be
satisfied if some of the CNF clauses are removed. A max-sat solver identifies these constraints
by finding the largest CNF clause subset that is satisfiable. The remaining constraints in turn
correspond to circuit components whose rectification will remove the observed functional failure.
In order to guarantee completeness, max-sat solvers can be called iteratively until all maximum
satisfiable subsets of a given cardinality are found.
The proposed technique is an alternative to gate-level SAT-based debugging which can be
easily enhanced to over-approximate solutions. Over-approximation using max-sat is a second
major contribution of this work, as it allows the debugging problem to be tackled in a divide
and conquer approach by trading off the tool’s performance against solution resolution. More
specifically, approximation can reduce the problem complexity and thus require less run-time
at the cost of finding larger, less precise solutions. Although not exact, this approach is proposed
as a pre-processing step that filters solutions for a second stage exact debugger. The second
stage conventional debugger will benefit from having fewer potential suspects which translates
to faster run-times. The combined two-step debugging reduces the complexity of both stages,
resulting in a more computationally efficient overall solution.
A suite of experiments on combinational and sequential circuits, for single and multiple
vectors, is conducted to demonstrate the benefit of the proposed framework. On average,
the over-approximation technique quickly eliminates 92% of the suspects. The second stage
debugger uses the filtered suspects to find the exact error sources in a fraction of the time
it would take otherwise. Overall, performance improvements of 200 times or two orders of
magnitude over a state-of-the-art debugger are observed consistently.
In the next section, background is provided on max-sat solving. Section 5.3 presents the
proposed max-sat approach for combinational circuits and Section 5.4 extends this for sequential
circuits and for multiple vectors. Section 5.5 presents the over-approximation technique and the
overall framework developed for optimal performance. Experiments are presented
in Section 5.6, followed by the chapter summary in Section 5.7.
5.2 Background
5.2.1 Maximum Satisfiability
While max-sat is concerned with finding a satisfiable set of clauses with maximum cardinality,
this can be generalized to find Maximal Satisfiable Subsets (MSSes). An MSS is a satisfiable
subset of a formula’s clauses that is maximal in the sense that adding any one of the remaining
clauses would make it UNSAT. Any max-sat solution is of course an MSS, but MSSes can
have different (smaller) sizes as well. In this work, the complements of MSSes, sets of clauses
whose removal makes the instance satisfiable, are of interest. Just as an MSS is maximal, its
complement is minimal, and we refer to such a set as a Minimal Correction Set (MCS). This
work makes use of two following techniques developed as extensions to the algorithm from [60]:
• Finding all MCSes up to size k
• Grouping clauses to produce “approximate” MCSes
Finding all MCSes up to size k is performed by the algorithm AllMCSes from [60], which
was developed as the first phase of an approach for finding all Minimal Unsatisfiable Subsets
(MUSes). This procedure solves consecutive optimization problems, finding MCSes in order of
increasing size (equivalent to finding their complementary MSSes in order of decreasing size).
MCSes are returned as they are found, and execution can be stopped when a size limit is
reached.
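The behaviour of AllMCSes can be illustrated with a small, self-contained brute-force sketch. This is for intuition only and is exponential in the number of clauses and variables; the actual algorithm of [60] performs the search inside a SAT solver using clause-selector variables and cardinality constraints.

```python
from itertools import combinations, product

def satisfiable(clauses, nvars):
    """Naive SAT check over DIMACS-style clauses (non-zero integers,
    negative literal = complemented variable)."""
    for assign in product([False, True], repeat=nvars):
        if all(any(assign[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses):
            return True
    return False

def all_mcses(clauses, nvars, k):
    """Return all MCSes of size <= k, in order of increasing size.

    An MCS is reported as a set of clause indices whose removal makes
    the formula satisfiable; supersets of already-found MCSes are
    skipped, which guarantees minimality.
    """
    mcses = []
    for size in range(1, k + 1):
        for cand in combinations(range(len(clauses)), size):
            cand = set(cand)
            if any(m <= cand for m in mcses):
                continue  # contains a known MCS, hence not minimal
            rest = [c for i, c in enumerate(clauses) if i not in cand]
            if satisfiable(rest, nvars):
                mcses.append(cand)
    return mcses
```

For the tiny instance (x1)·(¬x1)·(x2), the two singleton MCSes are the conflicting unit clauses; x2 never participates in a correction set.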
The second ability, grouping clauses, depends on the way the algorithm uses clause-selector
variables. Every clause Ci is augmented with a new variable yi, producing C′i = (ȳi + Ci) =
(yi → Ci). When yi is assigned TRUE, the original clause Ci must be satisfied, while when
yi is FALSE, C′i is satisfied, essentially disabling the original clause. This gives a
standard SAT solver the ability to enable and disable constraints implicitly within the normal
backtracking search. By assigning the same y variable to multiple clauses, a set of clauses can
be treated as a single higher-level constraint (the conjunction of all clauses given the same
y variable) that can be enabled and disabled at once. Using this approach, each MCS is a
minimal set of groups of constraints whose removal makes the instance satisfiable. This leads to
an over-approximation of an MCS of the original clauses, because extra clauses will be included
in groups even though they may not be necessary. The benefit of the over-approximation is
that it can greatly increase the performance of the algorithm as the search space is reduced
exponentially.
This work uses the MCS techniques outlined above for debugging. Although not precise in
the general case, the term max-sat is used throughout to refer collectively to the techniques
above for simplicity of the presentation.
5.3 Debugging Combinational Circuits with Max-sat
Given an erroneous circuit C, an input stimulus I, and the corresponding correct output
response O, a CNF formula can be produced as follows.
Φ = I · O · CNF(C)
This CNF problem is naturally unsatisfiable because the erroneous circuit cannot produce the
correct output response under the given input vector. Since the inconsistency between a circuit’s
actual and correct response is due to some gate-level error sources, the unsatisfiability of the
problem is due to the clauses derived from these error sources. In other words, the clauses
that are at conflict in the CNF correspond to the circuit-level error sources from which they
are derived. Therefore, the circuit-level errors can be identified by finding the CNF-level error
clauses.
The max-sat approach in Section 5.2.1 can identify Maximal Satisfiable Subsets (MSSes)
whose complements are Minimal Correction Sets (MCSes). These MCSes represent sets of
clauses whose removal from the CNF makes the problem satisfiable. In the formula Φ constructed
using the constraints I, O, and CNF(C), the MCSes map directly to error clauses. Once the
error clauses are identified through MCSes, the gate-level suspects are found by mapping each
clause to the gate it is originally derived from as described in Section 5.2.
For example, consider the correct and erroneous circuit in Figure 5.1 (a) and (b) where gate
A is mistakenly implemented as an AND gate instead of an OR gate. Under the input stimulus
{a = 0, b = 1, d = 1} the circuit has a response of {e = 0} instead of the correct response of
[Figure: (a) correct and (b) erroneous circuit; gate A (inputs a, b; output c) feeds gate B (inputs c, d; output e), with A mistakenly implemented as an AND instead of an OR gate]

Figure 5.1: Correct and erroneous circuit
{e = 1}. The corresponding erroneous CNF for the circuit and the input/output vectors are
shown below.
(ā) · (b) · (d) · (e)
(a + c̄) · (b + c̄) · (ā + b̄ + c)
(c + ē) · (d + ē) · (c̄ + d̄ + e)
Here, the max-sat approach described in Section 5.2.1 can return the MCS (a + c̄) as a solution
because removing this clause from the CNF makes the formula satisfiable. Notice that this
clause is derived from the erroneous gate A.
The above example illustrates how the removal of an error clause can help identify the error
source. Further analysis of the example demonstrates that there are other clauses, such as (c + ē),
whose removal can satisfy the problem. Indeed, more than one error clause may exist in a given
problem, corresponding to the many potential error sources at the gate-level. These are more
commonly known as equivalent errors or faults in the diagnosis literature [1]. Note that the
removal of the clause (ā) also satisfies the problem; however, since this constraint is not part of
the circuit component of the CNF (i.e., C), it is not considered an error clause.
For the debugging technique to be complete, all equivalent errors must be found. Each of
these is known as a suspect error source because fixing it may make the erroneous
circuit produce the correct response for the given input vector. As a result, the AllMCSes
algorithm of Section 5.2.1 is used to find all error clauses and consequently all gate-level error
suspects.
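The example can be checked mechanically. The sketch below encodes the constraints of Figure 5.1 as DIMACS-style integers (a..e mapped to 1..5, a negative literal denoting a complemented variable; the clause polarities follow the standard AND-gate CNF encoding, with gate B inferred to be an AND gate). A brute-force check confirms both the unsatisfiability of the instance and the equivalent-error clauses discussed above:

```python
from itertools import product

# Figure 5.1(b) as DIMACS-style clauses: a..e -> 1..5.
CNF = [
    [-1], [2], [4], [5],             # stimulus a=0, b=1, d=1; response e=1
    [1, -3], [2, -3], [-1, -2, 3],   # gate A, erroneously an AND gate
    [3, -5], [4, -5], [-3, -4, 5],   # gate B (an AND gate, inferred)
]

def satisfiable(clauses, nvars=5):
    """Brute-force satisfiability check, adequate for this toy instance."""
    for assign in product([False, True], repeat=nvars):
        if all(any(assign[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses):
            return True
    return False
```

Dropping the gate-A clause (a + c̄), or the equivalent-error clause (c + ē) of gate B, restores satisfiability, while dropping an unrelated gate clause does not.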
5.3.1 Error Clause Cardinality
Since the solution space for the AllMCSes algorithm is exponential, an explicit limit for the
maximum cardinality of the MCSes is advised to prevent memory explosion. In practice, this
limit, called the error clause cardinality, must be relatively small due to memory and perfor-
mance considerations. The error clause cardinality determines the completeness and efficiency
of the proposed technique.
Since this work is primarily concerned with gate-level debugging, the limit used must cor-
respond to the gate-level cardinality of conventional debuggers. In Section 5.2 the error
cardinality Ng is defined as the maximum size of the gate tuples that may be responsible for the erro-
neous behaviour. At the level of the CNF encoding, the error clause cardinality Nc must be set
to a value such that all the gate-level errors at Ng can be found using the proposed max-sat
approach. Thus completeness in this context is with respect to the gate-level debuggers such
as [33]. The following theorem proves that the proposed approach is complete for a given value
of Ng.
Theorem: The algorithm AllMCSes called on the problem Φ = I · O · CNF(C) with a limit
of Nc is complete if Nc is equal to the maximum number of clauses derived for any single gate
in the CNF, multiplied by Ng.
Proof: Proof by contradiction. Suppose there is a gate-level error not identified by the
proposed approach using the error cardinality limit Nc. Since AllMCSes iteratively finds sets
of clauses with cardinality 1 up to Nc, the gate-level error must be caused by more than Nc
clauses. However, Nc is equal to the maximum number of clauses derived from any one gate
times Ng, so the error must be caused by more than Ng gate-level sources. Therefore the error
is not found using conventional debuggers with Ng either. �
In many circuit-based SAT problems, the circuit is first converted to a 2-input AND-
INVERTER graph and then translated into CNF [10, 54]. In such a CNF formula, the maximum
number of clauses from any gate is 3, thus Nc = 3 × Ng. Using this value for Nc results in
finding all the solutions found using conventional debuggers with Ng. In CNF formulas derived
from arbitrary circuits where the number of clauses generated can greatly vary from one gate
to another, the proposed max-sat debugging technique may return more solutions than the
gate-level debugger for a given Ng. As discussed further in Section 5.5 this scenario does not
pose a problem under the proposed framework.
5.3.2 Error Group Cardinality
The previous section presented a limit for the error clause cardinality to guarantee completeness
for the proposed approach. Although complete, increasing the error clause cardinality is not
always desired as the complexity of the debugging problem is exponentially related to the error
cardinality [91]. Here, the grouping ability described in Section 5.2.1 is used to reduce the
complexity of the problem while maintaining completeness.
Grouping all clauses derived from the same gate together allows the max-sat solver to
“enable” or “disable” all of those clauses simultaneously. In effect, this gives the solver the
ability to treat each gate as a single high-level constraint, leading to solutions (MCSes) found
directly in terms of the gates. Under this problem restriction, the error clause-group cardinality
Ncg required to find gate-level errors can effectively be set to Ng.
Theorem: By grouping all clauses derived from the same gate together, the proposed
technique is complete if the error clause-group cardinality Ncg = Ng.
Proof: Since each group has a one-to-one correspondence with a circuit gate, when a group
is found as part of an MCS, all clauses corresponding to the original gate are “disabled” by
the AllMCSes algorithm. Thus every solution found by AllMCSes maps to a set of the original
gates. Hence, limiting the group cardinality is equivalent to limiting the gate cardinality. �
Revisiting the example of Figure 5.1, grouping the clauses of gate A together with the
clause-selector variable yA and the clauses of gate B together with the clause-selector variable
yB results in the following CNF.
(ā) · (b) · (d) · (e)
(a + c̄ + ȳA) · (b + c̄ + ȳA) · (ā + b̄ + c + ȳA)
(c + ē + ȳB) · (d + ē + ȳB) · (c̄ + d̄ + e + ȳB)
5.4 Extension to Sequential Circuits and
Multiple Vectors
Debugging sequential circuits is similar to debugging combinational circuits, except that the
circuit behaviour must be modeled for a finite number of clock cycles. These clock cycles are necessary
to excite and observe the errors. A popular approach for modeling sequential circuits is to
use the time frame expansion technique or the Iterative Logic Array (ILA) representation.
This technique replicates a circuit’s transition relation, called a time frame, and connects the
current-state and the next-state of adjacent time frames together. In effect, the sequential
circuit is transformed into an “unfolded” combinational circuit that can be debugged like any
other combinational circuit. Although not required for this section, further detail on the ILA
technique can be found in Section 4.2.
Since the complexity of debugging increases exponentially with the number of error sources,
debuggers must be careful not to consider the “replicated” gates across time frames as unique
error sources. For example, a single gate-level error in an ILA with 3 time frames may appear
to have 3 distinct error locations, however, replacing the functionality of a single gate in the
original sequential circuit will fix the problem in all time frames.
The proposed max-sat debugging technique can be extended to handle sequential designs
efficiently. First, the sequential circuit is converted to an ILA and then translated into CNF.
Similar to the previous formulation the CNF is then constrained with input stimulus and output
response, I and O resulting in
Φ = I · O · CNF(ILA(C)).
Here, we assume that the initial state constraints are also contained in I. The second step
is to account for the replication due to the ILA by grouping all clauses derived from the same
gate but from any time frame. As a result, clauses from a particular gate will be “enabled” and
“disabled” at once irrespective of the time frames they represent.
For example, consider the erroneous sequential circuit shown in Figure 5.2(a) and its ILA in
Figure 5.2(b). Here, the gate A has been erroneously implemented as an AND gate instead of an
OR gate. As a result, the output of A in the first and second time frames should be 1 instead
[Figure: (a) erroneous sequential circuit: gate A (inputs a, b; output c), a flip-flop feeding c back to a, and gate B (inputs a, d; output e); (b) its three-time-frame ILA, annotated with the stimulus and correct response values]

Figure 5.2: Erroneous sequential circuit and its ILA representation
of 0. Note that the input stimulus and correct response are also shown in Figure 5.2(b). The
corresponding CNF for the constrained ILA is shown below.
(a1) · (b̄1) · (d1) · (e1)
(a1 + c̄1) · (b1 + c̄1) · (ā1 + b̄1 + c1)
(a1 + ē1) · (d1 + ē1) · (ā1 + d̄1 + e1)
(c1 + ā2) · (c̄1 + a2)
(b̄2) · (d2) · (e2)
(a2 + c̄2) · (b2 + c̄2) · (ā2 + b̄2 + c2)
(a2 + ē2) · (d2 + ē2) · (ā2 + d̄2 + e2)
(c2 + ā3) · (c̄2 + a3)
(b3) · (d3) · (e3)
(a3 + c̄3) · (b3 + c̄3) · (ā3 + b̄3 + c3)
(a3 + ē3) · (d3 + ē3) · (ā3 + d̄3 + e3)
In the above example, the clauses corresponding to gate A in time frames 1 and 2 are responsible for the discrepancy between the actual and correct responses. Specifically, these are (b1 + ¬c1) and (b2 + ¬c2). However, by grouping all clauses derived from gate A together and those from gate B together, irrespective of the time frames, the single group solution is returned. Below is the modified CNF based on grouping clauses from gate A (B) together with the clause-selector variable yA (yB).
(a1) · (¬b1) · (d1) · (e1)
(a1 + ¬c1 + yA) · (b1 + ¬c1 + yA) · (¬a1 + ¬b1 + c1 + yA)
(a1 + ¬e1 + yB) · (d1 + ¬e1 + yB) · (¬a1 + ¬d1 + e1 + yB)
(¬c1 + a2) · (c1 + ¬a2)
(¬b2) · (d2) · (e2)
(a2 + ¬c2 + yA) · (b2 + ¬c2 + yA) · (¬a2 + ¬b2 + c2 + yA)
(a2 + ¬e2 + yB) · (d2 + ¬e2 + yB) · (¬a2 + ¬d2 + e2 + yB)
(¬c2 + a3) · (c2 + ¬a3)
(b3) · (d3) · (e3)
(a3 + ¬c3 + yA) · (b3 + ¬c3 + yA) · (¬a3 + ¬b3 + c3 + yA)
(a3 + ¬e3 + yB) · (d3 + ¬e3 + yB) · (¬a3 + ¬d3 + e3 + yB)
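To make the grouping concrete, the following Python sketch rebuilds the example formula and brute-forces satisfiability. It is purely illustrative: a real implementation would call a max-sat/AllMCSes engine rather than enumerate assignments, and the unit-clause polarities below are assumptions inferred from the surrounding discussion.

```python
from itertools import product

def unit(v, val):
    # hard unit clause fixing variable v to val
    return [(v, val)]

def and_gate(a, b, c):
    # Tseitin CNF for c = a AND b
    return [[(a, True), (c, False)],
            [(b, True), (c, False)],
            [(a, False), (b, False), (c, True)]]

def flip_flop(c, a_next):
    # next-frame state variable equals current-frame flip-flop input
    return [[(c, False), (a_next, True)], [(c, True), (a_next, False)]]

# Stimulus/response units (an assumed vector) and transitions are hard clauses.
stim = {"a1": True, "b1": False, "b2": False, "b3": True,
        "d1": True, "d2": True, "d3": True,
        "e1": True, "e2": True, "e3": True}
hard = [unit(v, val) for v, val in stim.items()]
groups = {"A": [], "B": []}              # one soft-clause group per gate
for t in (1, 2, 3):
    groups["A"] += and_gate(f"a{t}", f"b{t}", f"c{t}")  # the buggy AND gate
    groups["B"] += and_gate(f"a{t}", f"d{t}", f"e{t}")
    if t < 3:
        hard += flip_flop(f"c{t}", f"a{t + 1}")

def satisfiable(clauses):
    # brute-force SAT check; fine for a 15-variable toy instance
    vs = sorted({v for cl in clauses for v, _ in cl})
    for bits in product([False, True], repeat=len(vs)):
        asg = dict(zip(vs, bits))
        if all(any(asg[v] == want for v, want in cl) for cl in clauses):
            return True
    return False

assert not satisfiable(hard + groups["A"] + groups["B"])  # constrained ILA is UNSAT
for g in groups:
    rest = [cl for n, cls in groups.items() if n != g for cl in cls]
    print(g, "is a suspect:", satisfiable(hard + rest))
```

Dropping all clauses of gate A, across every time frame at once, restores satisfiability, so group A is reported as one suspect rather than three per-frame suspects; in this toy encoding group B also qualifies, since re-implementing the output gate could likewise mask the failure.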
For debugging problems with multiple vectors, ~I = {I1, I2, ...}, ~O = {O1, O2, ...}, the union of the CNF problems for each vector results in a single constraint system. In other words, the CNF corresponding to the circuit C is replicated once per vector. As with sequential circuits, all clauses derived from the same gate, regardless of which replica of C they occur in, must be grouped together and treated as a single error source. It should be noted that the groupings for multiple vectors and sequential circuits are in addition to the gate groupings discussed in Section 5.3.
5.5 Debugging with Approximate Max-sat
In practice, debugging via an exact max-sat formulation may not be feasible, as the number of groups and clauses under consideration can be quite high, resulting in a “hard” max-sat problem. The proposed max-sat strategy can be easily modified to perform an over-approximation instead of finding exact solutions. The benefit of the over-approximation is that the speed and resolution trade-off can be adjusted per problem: reducing the resolution or granularity of the solutions found yields improved run-time performance.
The over-approximation is achieved by grouping clauses together as described in Section 5.2.1
and finding the MCSes in terms of the groups. Note that the groupings discussed here are in
addition to those presented in Sections 5.3 and 5.4. Different grouping strategies can be easily
formulated ranging from random groupings to those based on a circuit’s topology or structure.
Similarly, groups can differ in cardinality from a single clause to thousands of clauses. For instance, a set of clauses can be grouped together if they are in the same fanout-free cone, which is similar to the dominator debugging technique introduced in [91]. Another example is grouping based on high-level modules derived from the RTL, similar to the technique of [3]. Intuitively,
generating groups based on the circuit’s structure or modularity may be advantageous as fewer
solutions/suspects may be returned compared to arbitrary grouping schemes.
Grouping clauses may increase the effect of error masking, in which some error sources
may not be detected as they are masked by others [3]. This also occurs in traditional diagnosis
techniques when error-free models are used. For instance, consider the gates shown in Figure 5.1
and a pair of errors on gates A and B. In this scenario, the single-error solution, A, masks the pair solution of A and B.
Similar scenarios can occur when grouping clauses together, especially if the groups are
made arbitrarily. For instance, consider the CNF illustrated in Figure 5.3, where some clauses are grouped in A and others are grouped in B. Further consider a pair of error clauses illustrated
by the “X”. Here, the single solution identifying group A masks the pair solution A and B. It
should be emphasized that error masking is not unique to the proposed technique as it occurs
in gate-level and hierarchical debugging as well [3]. Generally, in all debugging approaches the user must be aware of the possibility of error masking.
Figure 5.3: Error masking in clause groupings
Figure 5.4: Max-sat debugging framework
5.5.1 Efficient Max-sat Framework
This section presents a performance-optimized debugging framework using the discussed max-sat technique. The complexity of conventional debugging techniques, such as SAT-based tools, depends to a large extent on the number of suspects that must be considered. In the past, divide-and-conquer schemes based on the problem hierarchy have proven beneficial [3]. Here, the approximate max-sat approach can be used as a filter that removes the majority of the suspects by quickly finding over-approximate solutions. Subsequently, any exact debugging approach can be used and will benefit greatly by not having to consider all the original suspects during its analysis.
Any type of grouping can be used; however, in the remainder, clauses are grouped in sets of
size G according to their corresponding circuit-level topology. Every group contains G clauses
(except for one group that contains the remainder of the clauses in the CNF) from gates in
close proximity to one another. For sequential circuits and multiple vectors, the group size is
G× [the total number of replications] as described in Section 5.4. Figure 5.4 illustrates the flow
of the proposed framework where the suspects are first filtered by the max-sat engine and then
processed by the exact debugger. The optimal value of G, found experimentally, determines
how the debugging effort is divided between the two stages.
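The fixed-size chunking described above can be sketched in a few lines of Python, assuming the clause list is ordered so that neighbouring clauses come from topologically close gates (`group_clauses` is an illustrative helper, not the thesis implementation):

```python
def group_clauses(clauses, G):
    # chunk an ordered clause list into groups of G clauses;
    # the final group holds the remainder
    return [clauses[i:i + G] for i in range(0, len(clauses), G)]

# 47 stand-in clauses with G = 20 yield two full groups and one remainder
sizes = [len(g) for g in group_clauses(list(range(47)), 20)]
print(sizes)   # -> [20, 20, 7]
```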
5.6 Experiments
The proposed framework is implemented in C++ using the max-sat algorithm (AllMCSes)
in [60] and the SAT-based debugging engine in [33] as a second stage debugger. Six combina-
tional and ten sequential circuits from ISCAS85, ISCAS89 and ITC99 benchmarks as well as
Figure 5.5: Run-time versus clause grouping size for (a) c6288 and (b) mot-comb3
OpenCores.org [77] are used to construct several design debugging problems. The erroneous
circuits are obtained by manually changing the functionality of a single gate at random. The
failing test vectors are generated by running pseudo-random simulations until an erroneous re-
sponse is observed. Experiments are conducted using both single and four failing test vectors.
The performance of the proposed framework utilizing the max-sat pre-processing is compared
against the efficiency of the SAT-based debugging engine in [33] without pre-processing. In all
experiments, the clause group error cardinality Ncg is set to one in order to find single error sources. In addition to the groups created for the over-approximation, clauses are also
grouped together based on the circuit replicas as discussed in Section 5.4. Experiments are con-
ducted on a Pentium IV 2.8 GHz Linux platform with a 1GB memory limit and 3600 seconds
time-out.
In order to determine the effectiveness of the overall debugging framework of Section 5.5.1
as a function of the group size G, experiments are conducted on several representative circuits.
Figures 5.5 (a) and (b) show two such experiments, using circuits c6288 and mot-comb3, where
three curves representing the run-times of the over-approximate max-sat stage, the exact de-
bugging stage, and the combined run-times are presented for several group sizes. The run-time
of max-sat increases abruptly as the group size becomes very small, and it reaches a maximum
when the exact method is used (single-clause groups). However, as the group size increases, the
run-time of the second stage debugger increases as it must consider many more suspects due to
the over-approximation. The combined curve shows the total run-time of the overall framework
is minimized with group sizes of roughly 10 to 20 clauses.
In the remainder, “max-sat20+debug” refers to the proposed framework with a grouping
size of G = 20. For sequential designs and multiple vectors the actual number of clauses per
group is 20 times the number of circuit replicas. Table 5.1 compares max-sat20+debug to the
stand-alone debugger of [33]. Rows 1–6 report experiments with combinational circuits given a single failing test vector, and rows 7–16 (17–26) report experiments with sequential circuits
given one (four) failing test vector(s). The first four columns contain the circuit’s name, its
size in gates, the number of test vectors used, and the total number of circuit replicas needed.
The fifth column (# error locs) gives the total number of potential error locations that could
explain the faulty behaviour of the circuit (the complete set). These are the locations expected
to be returned by both approaches when available. The sixth column gives the run-time of the
stand-alone debugger. An entry of [TO] denotes a time-out, and [MO] denotes a memory-out.
name | # gates | # vecs | # repl. | # error locs | debug time (s) | # grps | # suspects | % susp red | max-sat time (s) | debug time (s) | total time (s) | X improv.
mot-comb1 | 2,162 | 1 | 1 | 4 | 4.79 | 3 | 49 | 97.73% | 0.03 | 0.05 | 0.08 | 59.88
mot-comb2 | 5,487 | 1 | 1 | 13 | 54.50 | 13 | 178 | 96.76% | 0.13 | 0.24 | 0.37 | 147.30
mot-comb3 | 11,268 | 1 | 1 | 16 | 357.67 | 14 | 189 | 98.32% | 0.27 | 0.47 | 0.74 | 483.34
c6288 | 3,466 | 1 | 1 | 75 | 67.96 | 48 | 536 | 84.54% | 0.45 | 1.23 | 1.68 | 40.45
c7552 | 2,644 | 1 | 1 | 248 | 25.66 | 74 | 789 | 70.16% | 0.11 | 3.11 | 3.22 | 7.97
c5315 | 1,884 | 1 | 1 | 11 | 4.83 | 7 | 99 | 94.75% | 0.04 | 0.07 | 0.11 | 43.91
rsdecoder | 12,041 | 1 | 2 | 11 | 572.68 | 7 | 126 | 98.95% | 0.67 | 0.65 | 1.32 | 433.85
spi | 2,012 | 1 | 21 | 19 | 80.54 | 12 | 194 | 90.36% | 1.15 | 2.99 | 4.14 | 19.45
erp | 2,449 | 1 | 3 | 13 | 36.09 | 11 | 179 | 92.69% | 0.20 | 0.25 | 0.45 | 80.20
ac97 | 15,599 | 1 | 6 | 4 | [TO] | 3 | 58 | 99.63% | 2.22 | 1.45 | 3.67 | > 980.93
reactimer | 265 | 1 | 512 | 7 | 51.81 | 6 | 89 | 66.42% | 47.58 | 6.15 | 53.73 | 0.96
divider | 5,248 | 1 | 15 | 4 | 1,160.39 | 3 | 52 | 99.01% | 14.58 | 1.32 | 15.90 | 72.98
b14 | 5,695 | 1 | 22 | 45 | 1,377.86 | 36 | 627 | 88.99% | 11.17 | 50.75 | 61.92 | 22.25
b15 | 8,938 | 1 | 13 | 32 | [TO] | 40 | 645 | 92.78% | 96.99 | 65.82 | 162.81 | > 22.11
s15850 | 10,481 | 1 | 2 | 19 | 747.36 | 12 | 183 | 98.25% | 0.53 | 0.71 | 1.24 | 602.71
s38584 | 21,006 | 1 | 14 | 58 | [TO] | 34 | 566 | 97.31% | 28.02 | 36.00 | 64.02 | > 56.23
rsdecoder | 12,041 | 4 | 8 | 11 | [TO] | 7 | 126 | 98.95% | 2.88 | 2.01 | 4.89 | > 736.20
spi | 2,012 | 4 | 81 | 4 | 264.07 | 6 | 107 | 94.68% | 4.95 | 4.39 | 9.34 | 28.27
erp | 2,449 | 4 | 12 | 4 | 73.71 | 5 | 101 | 95.88% | 0.82 | 0.52 | 1.34 | 55.01
ac97 | 15,599 | 4 | 23 | 4 | [TO] | 3 | 58 | 99.63% | 9.95 | 5.05 | 15.00 | > 240.00
reactimer | 265 | 4 | 1,745 | 6 | 172.30 | 6 | 89 | 66.42% | 2,845.80 | 21.48 | 2,867.28 | 0.06
divider | 5,248 | 4 | 71 | 4 | [TO] | 3 | 52 | 99.01% | 54.74 | 5.44 | 60.18 | > 59.82
b14 | 10,114 | 4 | 1,216 | − | [MO] | − | − | − | [MO] | − | − | −
b15 | 8,938 | 4 | 62 | − | [TO] | − | − | − | [TO] | − | − | −
s15850 | 10,481 | 4 | 8 | 19 | [TO] | 12 | 183 | 98.25% | 2.21 | 3.64 | 5.85 | > 615.38
s38584 | 21,006 | 4 | 178 | 35 | [MO] | 20 | 365 | 98.26% | 626.45 | 376.62 | 1,003.07 | > 3.59
Table 5.1: Max-sat+debug versus stand-alone debugger
Figure 5.6: Number of solved instances for max-sat20+debug and debug
The remaining columns present the results of our proposed framework. The first four
(# grps, # suspects, % susp red, and time (sec)) report the number of groups (of 20× # repl.
clauses) returned by the AllMCSes algorithm in any MCS; the number of suspect variables
identified by those groups, each corresponding to a potential gate-level error source; the percent
reduction in the number of suspect gates; and the run-time of this first stage. The true benefit
of the proposed technique is evident when considering the number of suspects that are filtered
by the first stage with relatively small run-time. For instance, consider the circuit ac97 with a single vector: the approximation technique rules out 99.63% of the suspects in just 2.22 seconds. On average, the number of suspects is reduced by over 92%.
The run-time in seconds of the second stage debugger using the suspects of the first stage is
shown in column debug time (sec). Finally, the total time (sec) column shows the combined run-
time of the proposed framework. This number is compared with the run-time of the stand-alone
debugger in column six to get the improvements shown in the final column (X improv.).
These results demonstrate the overwhelming advantage of the proposed method over the
stand-alone debugging engine, as the run-times are reduced by an average of 200 times. Overall, the number of solved instances is increased from 16 to 24 out of 26, a 50% improvement; for sequential circuits with one (four) test vector(s), the number of solved instances is increased from 7 (3) to 10 (8), a 43% (167%) improvement.
Figure 5.7: Run-time comparison for max-sat20+debug and debug
Figure 5.6 plots the number of solved instances as a function of run-time on a logarithmic scale for max-sat20+debug and stand-alone debug. It can be seen that max-sat20+debug outperforms the stand-alone approach by roughly two orders of magnitude across all problems. Figure 5.7 plots the total run-time of max-sat20+debug for each instance against the corresponding run-time of the stand-alone debugger on a logarithmic scale. Clearly, most points lie above the 45° line, which indicates the better performance of the proposed framework. Points on the upper border indicate the instances solved by max-sat20+debug but unsolved by the stand-alone approach. The single point where the proposed framework fares worse is caused by the large run-time of the first stage. Such cases can be addressed by increasing the group size G, thus reducing the difficulty for the AllMCSes algorithm.
5.7 Summary
This work presents an efficient two-stage debugging framework which uses a novel max-sat problem formulation. First, it is shown that the debugging problem can be solved exactly
with a max-sat formulation. The approach is extended for sequential circuits and for problems
with multiple vectors. An over-approximation technique is developed to take advantage of
the strengths of the max-sat techniques. This technique considers groups of clauses together
and can thus make decisions based on the groups instead of the individual clauses. The over-
approximation technique is used as a pre-processing step that filters the majority of suspects
and reduces the problem complexity drastically for any debugger used in the second stage.
Experiments demonstrate overwhelming run-time improvements of two orders of magnitude on
average.
Chapter 6
Trace Reduction
6.1 Introduction
Whether debugging is performed manually or with an automated debugger, two major factors affect its efficiency: the circuit size and the error trace length. Techniques such as abstraction/refinement and BMD, presented in Chapters 3 and 4, reduce the challenges imposed by the design size and trace length, respectively. However, for debugging problems with very long traces, additional techniques are required to assist BMD.
For instance, consider the case where an error is excited and its effect is stored in a RAM, only to be read after a few hundred clock cycles. Once the RAM data is read and its erroneous
behaviour propagates to a primary output a failure is observed. The length of the error trace
can be hundreds, thousands, or even millions of clock cycles long. In such situations the problem
must be reduced to a manageable size for an automated debugging tool to be able to operate
effectively.
Trace reduction (also known as trace compaction) is a technique that generates a new simu-
lation trace from an original one returned by a verification tool. This new trace allows a circuit
to transition from an initial state to a final state in fewer clock cycles than the original trace while exhibiting a similar erroneous behaviour. In debugging, the initial state
is often a reset or a reachable state, the final state is where the failure is observed, and the
original trace is provided from a simulation testbench or a formal verification tool. Since the
majority of verification performed in industry is simulation-based with a heavy usage of ran-
dom or constrained-random stimulus patterns, many error traces are found to be unnecessarily
long. In other words, a shorter error trace may be able to reproduce the failure in fewer clock
cycles. Returning to the previous RAM example, the minimal trace length must excite the error
source, write it to the RAM, read from the RAM, and propagate the effects to the primary
outputs. Thus, the transitions between writing to the RAM and reading from the RAM may be superfluous and can be omitted.
Trace reduction can often reduce the size of a trace by orders of magnitude. With a shorter
trace, the debugging task of the verification engineer can be considerably easier as fewer signals
and clock cycles must be analyzed. Similarly, an automated debugger may be able to solve
problems much faster and with fewer resources. Although powerful for debugging, trace reduc-
tion can benefit applications such as stimulus pattern generation, property checking and silicon
debug as well [12, 101].
Previous work shows that for random and constrained-random based simulations, error
traces can often be reduced to a fraction of their initial size [17, 20, 78, 90]. One such technique
uses forward image computation using Binary Decision Diagrams (BDDs) to reduce the trace
length [20]. In [90], techniques are presented to remove variables from counter-examples in
order to simplify them, but their length is not reduced. Another recent work uses several
techniques based on performing further simulations and Bounded Model Checking (BMC) to
achieve smaller traces [17]. The technique of [78] is the closest to ours as they utilize a sequential
Boolean Satisfiability (SAT) solver to find short-cuts in the original trace. More specifically,
[78] seeks to find the shortest path from the initial state to some candidate intermediate state
similar to BMC but using a sequential SAT solver.
In this chapter, we propose a trace length compaction technique where the shortest path
from the initial state to a final state is sought. This approach is based on reachability analysis
where an all-solution SAT solver is used as the pre-image computation engine [51, 69, 71].
The benefits over the existing BDD [20] and BMC techniques [17] are that the BDD memory
explosion problem can be averted and that compactions exceeding the finite bound of BMC
approaches may be applied. Our technique appears to share many of the advantages of the
sequential SAT approach proposed in [78]. The main difference is that ours relies on reachability
analysis and pre-image computation that results in large state sets. We develop a novel data
structure to quickly determine possible state containment relationships, or which states are
contained within the found state sets, to reduce the trace lengths.
More specifically, the contributions in this chapter are the following:
• A trace compaction technique based purely on pre-image computation and reachability
analysis using an all-solution SAT solver.
• A set of containment rules that help draw relationships between existing states and states
found through pre-image computation which may result in shorter traces.
• A state selection procedure within the reachability analysis engine and a set of heuristics
that improve the performance of the overall approach in practice.
• A novel data structure for storing visited states that allows for quick identification of state
containment relationships.
This chapter is organized as follows. In the next section, some background information
is provided on finite state machines, pre-image computation, and reachability analysis. Sec-
tion 6.3 presents the proposed trace compaction approach and discusses its central procedures.
Section 6.4 introduces a novel data structure critical for the efficient performance of the pro-
posed approach. Sections 6.5 and 6.6 demonstrate the experimental results and conclude the
chapter, respectively.
6.2 Preliminaries
In this section we provide some background on finite state machines, traces, image and pre-
image computation, and reachability analysis.
6.2.1 Finite State Machines
A sequential digital circuit can be modeled by a Finite State Machine (FSM) represented by a 6-tuple M := (Q, Σ, ∆, δ, λ, q0), where Q is the finite set of states, Σ and ∆ are the input and output alphabets respectively, δ : Q × Σ → Q is the state transition function, λ : Q × Σ → ∆ is the output function, and q0 is the initial state [53]. Figure 6.1 illustrates a simple FSM where
the states are represented by nodes and the transitions are represented by edges.
Figure 6.1: Finite State Machine with 7 states
A trace of length k for an FSM is an input sequence < a1, a2, ..., ak > that leads the FSM through a sequence of states < q0, q1, ..., qk−1, qk >. Note that some states may be repeated in the state sequence. Figure 6.2 represents one possible trace for the FSM of Figure 6.1.
q0 −a1→ q1 −a2→ q2 −a3→ q3 −a4→ q1 −a5→ q6
Figure 6.2: A sample trace for the above FSM
6.2.2 Image and Pre-image Computation
Given a sequential circuit with current state variables V and next state variables V ′, a set
of current states and a set of next states are labeled by Q(V ) and Q(V ′) respectively. The
transition relation from a set of states Q(V ) to Q(V ′), denoted by T (Q(V ), Q(V ′)), is true for
each pair of Q(V ) and Q(V ′) if δ(Q(V )) = Q(V ′) for a set of input assignments [53]. Given the
above, the image and pre-image of a circuit can be defined as follows.
image: Q(V ′) = ∃V.(T (Q(V ), Q(V ′)) ∧ Q(V ))
pre-image: Q(V ) = ∃V ′.(T (Q(V ), Q(V ′)) ∧ Q(V ′))
Intuitively, the image of a state qi is all the states that can be reached from qi under all possible input combinations in a single clock cycle. Similarly, the pre-image of qi comprises all the states that can lead to qi under all possible input combinations in one clock cycle. In
the FSM of Figure 6.1, the image of state q1 is {q2, q6} while its pre-image is {q0, q3}.
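For an explicitly enumerated FSM, image and pre-image computation reduce to simple set operations over the transition function. In the sketch below the transition pairs are hypothetical, except that they reproduce the stated facts about q1:

```python
# delta maps (state, input) -> next state; the edges are illustrative
delta = {("q0", "a1"): "q1", ("q3", "a4"): "q1",
         ("q1", "a2"): "q2", ("q1", "a3"): "q6",
         ("q2", "a5"): "q3"}

def image(states):
    # all states reachable from `states` in one clock cycle
    return {nxt for (q, _), nxt in delta.items() if q in states}

def pre_image(states):
    # all states that can reach `states` in one clock cycle
    return {q for (q, _), nxt in delta.items() if nxt in states}

print(sorted(image({"q1"})))      # -> ['q2', 'q6']
print(sorted(pre_image({"q1"})))  # -> ['q0', 'q3']
```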
Although the image and pre-image of circuits are traditionally computed using BDDs [53],
some techniques based on all-solution Boolean Satisfiability (SAT) solvers can also be used [51,
59, 71, 82]. All-solution SAT solvers can compute the pre-image set Q(V ) by constraining the
circuit CNF to Q(V ′) and iteratively finding all the solutions that satisfy the CNF in terms
of the current state variables V [82]. Recent work on SAT-based Unbounded Model Checking (UMC) and pre-image computation techniques has demonstrated considerable advancements [51, 59, 71, 82].
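The iterate-and-block loop behind all-solution pre-image computation can be sketched with a toy solver. Everything here is illustrative: a brute-force search stands in for a real all-solution SAT engine, and the one-register relation next = cur XOR inp is a made-up transition relation.

```python
from itertools import product

def solve(clauses, n):
    # toy SAT: return a satisfying assignment over variables 0..n-1, or None
    for bits in product([False, True], repeat=n):
        if all(any(bits[v] == want for v, want in cl) for cl in clauses):
            return bits
    return None

def pre_image_all_sat(clauses, n, state_vars):
    # enumerate solutions projected onto the current-state variables,
    # blocking each projection once it has been found
    cubes = set()
    while (sol := solve(clauses, n)) is not None:
        proj = tuple((v, sol[v]) for v in state_vars)
        cubes.add(proj)
        clauses = clauses + [[(v, not val) for v, val in proj]]  # blocking clause
    return cubes

# transition relation next = cur XOR inp, constrained to next = 1
CUR, INP, NXT = 0, 1, 2
T = [[(CUR, False), (INP, False), (NXT, False)],
     [(CUR, True),  (INP, True),  (NXT, False)],
     [(CUR, True),  (INP, False), (NXT, True)],
     [(CUR, False), (INP, True),  (NXT, True)]]
pre = pre_image_all_sat(T + [[(NXT, True)]], 3, [CUR])
print(len(pre))   # -> 2  (both cur = 0 and cur = 1 can reach next = 1)
```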
In this work, we are mainly concerned with SAT-based pre-image computation. Since this technique finds states one at a time, we use the term pre-image loosely to also refer to a single state qj that belongs to the pre-image of qi. Furthermore, we use the term state to refer to a state cube, which is a state encoding that may contain unassigned or don't-care variables. As such, a state may be a superset (cover) of other states. For instance, the state cube {v1, v2, v3} = 1X1 covers the states {v1, v2, v3} = 101 and {v1, v2, v3} = 111. For brevity, in the remainder of this chapter we drop the variable names (i.e., v1, v2, v3) when describing state values.
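The cover relation between state cubes can be captured directly (a sketch; `covers` is an illustrative helper):

```python
def covers(cube, other):
    # position by position, an X in `cube` matches anything;
    # a fixed value only matches the same fixed value
    return all(c == "X" or c == o for c, o in zip(cube, other))

print(covers("1X1", "101"), covers("1X1", "111"))  # -> True True
print(covers("1X1", "100"))                        # -> False
print(covers("101", "1X1"))                        # -> False (a subset cannot cover its superset)
```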
6.2.3 Reachability Analysis
Reachability analysis is the process of determining whether a state qk is reachable from another
state q0. In the realm of UMC, reachability analysis can be used to check CTL properties of
type EFqk where qk is a bad state and q0 is a legal or initial state [53].
Intuitively, reachability analysis traverses the state space backwards from state qk until a
state q0 is found or a fix-point, where no new states are found, is reached [53]. Pre-image
computation is a central procedure of reachability analysis as it performs the single backward
steps. The manner in which the state space is traversed depends on which of the visited states
is selected for each pre-image computation step. If the visited states are stored in a stack-like
data structure, a depth-first traversal is performed, while a queue-like data structure results in
a breadth-first traversal. Figure 6.3 illustrates a breadth-first reachability analysis process that
Figure 6.3: Illustration of reachability analysis
eventually finds the initial state q0. In this figure, the black nodes represent states while each
cone represents a set of states found by one pre-image computation step.
6.3 Proposed Trace Compaction Approach
In this section we present our proposed trace length compaction approach. First we introduce
the central concept followed by details of the state selection procedure and the all-solution SAT
solver.
6.3.1 Reachability Based Trace Compaction
A trace can be represented by a directed graph G = (N, E) where the nodes N represent states
and the edges E represent transitions between states. An edge from state qi to qj denotes that
qi belongs to the pre-image of qj and qj belongs to the image of qi. Our objective is to reduce
the length of the path from the initial state q0 to the final state qk by applying pre-image
computation and reachability analysis techniques.
Our proposed approach performs reachability analysis on all the states belonging to the
original trace. The manner in which states are selected for reachability analysis is described in
Section 6.3.3. All the states (or state cubes) found by the pre-image computation steps of the
reachability engine are added to the graph G. Graph G is updated with edges denoting that
each newly found state qi is a pre-image of some state qj selected for pre-image computation.
Figure 6.4: Updating the graph G with new nodes and edges
When states found by pre-image computation already exist in the graph G, extra edges may
be drawn in G to illustrate new legal transitions. These transitions may provide a shorter path
(or short-cut) from the initial state to the final state thus reducing the overall trace length.
For example, consider the situation described in Figure 6.4 where the original trace is shown as
the sequence < q0, q1, q2, q3, q4 > and the dashed nodes are states found through reachability
analysis. Since q2 is found as a pre-image of q4, and q1 is the pre-image of q2 in the original
trace, a new edge shown as dashed line can be drawn directly from the original (non-dashed)
q2 to q4 and the dashed q2 can be removed. The overall result is a shorter path from q0 to q4
which skips node q3.
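The effect of such a short-cut on the trace length can be observed with a plain breadth-first search over G (a minimal sketch of the Figure 6.4 scenario; `bfs_length` is an illustrative helper):

```python
from collections import deque

def bfs_length(edges, src, dst):
    # fewest transitions from src to dst in the trace graph G
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        q, d = frontier.popleft()
        if q == dst:
            return d
        for nxt in edges.get(q, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

G = {"q0": ["q1"], "q1": ["q2"], "q2": ["q3"], "q3": ["q4"]}
print(bfs_length(G, "q0", "q4"))   # -> 4 (original trace)
G["q2"].append("q4")               # pre-image step found q2 in pre-image(q4)
print(bfs_length(G, "q0", "q4"))   # -> 3 (node q3 is skipped)
```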
As motivated by the above example, finding state equivalences in the graph G can lead to
more “short-cuts” which can reduce the overall trace size. Along with the state equivalence
relation discussed, there are other state containment relationships that can lead to further
short-cuts in the graph. The following rules determine how the graph G is updated after each pre-image computation step.
Consider state qi found as a pre-image of state qi+1, and the sequence < qj−1, qj , qj+1 >
existing in the graph G.
• Rule 1. If qi = qj : State qi is not added to G, but an edge is drawn from qj to qi+1.
• Rule 2. If qi ⊃ qj : State qi is added to G, an edge is drawn from qi to qi+1, and another
edge is drawn from qj to qi+1.
• Rule 3. If qi ⊂ qj : State qi is added to G, an edge is drawn from qi to qi+1, another edge
is drawn from qj−1 to qi, and another edge is drawn from qi to qj+1.
The correctness of rule 1 is evident as the images of equivalent states are also equivalent.
Rule 2 can be explained by expanding the state cube qi into two components qi = {qj} ∪ {qi − qj}. From here we use the fact that any image of qi is also an image of qj. Similarly, rule 3 can be explained by expanding qj into two components qj = {qi} ∪ {qj − qi}.
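A sketch of how the three rules might update an adjacency-list graph, representing cubes as strings over {0, 1, X}; the helper names and node labels are illustrative inventions, not the thesis implementation:

```python
def covers(cube, other):
    # cube covers other iff every fixed position agrees (X matches anything)
    return all(c == "X" or c == o for c, o in zip(cube, other))

def apply_rules(G, qi, qi_next, qj_prev, qj, qj_next):
    """Insert a new pre-image qi of qi_next into G, given an existing
    sequence <qj_prev, qj, qj_next> (Rules 1-3 from the text)."""
    if qi == qj:                    # Rule 1: reuse qj, no new node
        G.setdefault(qj, []).append(qi_next)
    elif covers(qi, qj):            # Rule 2: qi ⊃ qj
        G.setdefault(qi, []).append(qi_next)
        G.setdefault(qj, []).append(qi_next)
    elif covers(qj, qi):            # Rule 3: qi ⊂ qj
        G.setdefault(qi, []).extend([qi_next, qj_next])
        G.setdefault(qj_prev, []).append(qi)
    else:                           # unrelated cube: just record the new edge
        G.setdefault(qi, []).append(qi_next)
    return G

print(apply_rules({}, "1X1", "s4", "s0", "101", "s2"))  # rule 2 fires
print(apply_rules({}, "101", "s4", "s0", "1X1", "s2"))  # rule 3 fires
```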
Figure 6.5: Illustrating rules 2 and 3
The following example helps clarify rules 2 and 3. Consider state qi found as a pre-image of state qi+1, and the sequence < qj−1, qj , qj+1 >, where state qi = 1X1 and state qj = 101. By rule 2, an edge is first drawn from qi to qi+1 to indicate that qi is a pre-image of state qi+1. Since 1X1 ⊃ 101 and qi+1 is an image of qi = 1X1 = {101} ∪ {111}, then qi+1 must also be an image of qj = 101. This scenario is illustrated in Figure 6.5 (a) with the new edges drawn as dashed lines. Similarly, by rule 3 an edge is first drawn to indicate that qi is a pre-image of state qi+1. Since state qi = 101 is a subset of state qj = 1X1 = {101} ∪ {111}, the states qj−1 and qj+1 must also be a pre-image and an image of qi, respectively. The three edges added in this scenario are drawn as dashed lines in Figure 6.5 (b).
Our overall trace compaction technique using reachability analysis is shown in Figure 6.6. Lines 1-7 set up the problem, build the initial graph G, and determine the initial trace length. The remaining lines perform reachability analysis by selecting a state for pre-image computation (line 10), computing the pre-images (line 12), and applying the state containment rules (line 14). The reachability analysis is terminated after all states have been selected for pre-image computation or after a maximum number of steps, max, tracked by the counter, have been performed.
6.3.2 Creating More Short-cuts
As discussed in the previous section, the containment rules are critical for creating short-
cuts in the graph G. To increase the likelihood of applying these rules, the reachability engine is
slightly modified from its typical UMC application. Traditionally in UMC, reachability engines
focus on finding only new states and “block” previously visited states [71]. This allows them
to quickly identify when a fixed-point is reached, or when all legal states are visited [51]. In
contrast, this work encourages finding previous states or states that cover or are covered by
others. These containment relationships allow us to draw additional edges between nodes, increasing the likelihood of reducing the trace. It should be noted that precautions are taken
to avoid repeatedly visiting the same set of states.
A second technique used to increase the likelihood of applying the containment rules is
to populate the graph with more states than those provided in the original trace. Since the
original trace only has as many states as its trace length, there may not be enough unique states
to create many short-cuts. We propose populating the graph initially by computing a single
pre-image for the states in the original trace. This approach allows us to quickly add state
cubes to the graph which leads to more applications of the containment rules. The practical
advantage of this technique is highlighted in the experiments of Section 6.5.
6.3.3 State Selection Procedure
During reachability analysis, which state is selected for pre-image computation determines the
manner in which the state space is traversed. For instance, if the most recently visited (found)
state is always selected, then the state space is traversed in a depth-first manner. Here, we
develop state selection criteria that help guide the reachability engine towards finding short-
cuts from the initial state to the final state. It should be noted that these criteria are heuristics
which may not always be advantageous.
The first criterion is to select a candidate state from the set of visited states with the
smallest hamming distance to the initial state q0. The hamming distance between two states
is the number of state variables with different values (0 or 1). For states with don’t cares (X),
1: G = ∅
2: Visited = ∅
3: counter = 0
4: for all (states qi from q0 to qk (inclusive)) do
5:   Visited.add(qi)
6:   G = add_to_graph(qi)
7: end for
8: length = BFS(G, qk, q0)
9: while (counter ≤ max && !Visited.empty()) do
10:   qj = select_state(Visited)
11:   Visited = Visited − qj
12:   PreImages = pre-image(qj)
13:   for all (states qi ∈ PreImages) do
14:     apply_rules_1_2_3(G, qi, qj)
15:   end for
16:   Visited = Visited ∪ PreImages
17:   counter = counter + 1
18:   length = BFS(G, qk, q0)
19:   Print("Trace is of size", length)
20: end while
21: return length

Figure 6.6: Trace compaction procedure using reachability analysis
every X matches both the 0 and 1 value. For instance, if states {1100, 1011, 110X, XX01} are
visited and q0 = 0000, then state XX01 is selected since it has a hamming distance of 1 with
respect to q0. The intuition behind the above criterion is that states with a smaller hamming
distance to q0 require fewer state variables to change to reach q0 as a pre-image. Therefore, the
likelihood of finding q0 at the next step may be higher.
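The hamming-distance computation over three-valued states can be sketched as follows (a minimal illustration; the function name and the string encoding of states are assumptions, not the thesis implementation):

```python
def hamming_to_q0(state, q0):
    # A don't-care (X) matches both 0 and 1, so it never adds to the distance.
    return sum(1 for s, r in zip(state, q0) if s != 'X' and s != r)

# Example from the text: with q0 = 0000, state XX01 has distance 1 and is selected.
visited = ["1100", "1011", "110X", "XX01"]
q0 = "0000"
selected = min(visited, key=lambda s: hamming_to_q0(s, q0))  # "XX01"
```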
A second factor that influences the state selection procedure is the path length from a
candidate state to the last state qk. If this length is greater than 50% of the current shortest
path from q0 to qk, then the state is not considered for selection. This criterion encourages
finding many pre-images near the end of the trace (closer to qk) and fewer near the initial
state. Together, both criteria increase the probability of creating large short-cuts between states
at the two ends of the original trace.
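A sketch of the second criterion, assuming each visited state carries its current path length to qk in the graph G (all names here are hypothetical):

```python
def filter_candidates(visited, path_len_to_qk, shortest_q0_to_qk):
    # Discard states whose path to qk exceeds 50% of the current shortest
    # q0-to-qk path, concentrating pre-image work near the qk end of the trace.
    return [q for q in visited
            if path_len_to_qk[q] <= 0.5 * shortest_q0_to_qk]

# Hypothetical example: with a shortest q0-to-qk path of 10, only the
# state 3 steps from qk survives the filter.
kept = filter_candidates(["a", "b"], {"a": 3, "b": 8}, 10)  # ["a"]
```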
6.3.4 All-Solution SAT Solver
The reachability engine is highly dependent on the performance of the pre-image computation
engine, which is based on an all-solution SAT solver. This SAT solver uses circuit don’t cares
to determine whether variables may remain unassigned while satisfying the problem [82, 102].
Since the don’t cares are propagated backwards through a gate (from output to input) they
are ideal for pre-image computation where current state variables V can be viewed as pseudo
inputs to the circuit. The all-solution SAT solver contains many solution reduction techniques to
ensure that small solutions are returned in an efficient manner [51, 71, 82]. For our application,
achieving small state cubes is critical to traversing the state space efficiently.
Each pre-image computation step corresponds to a call to the all-solution SAT solver. Since
it may not be practical to find all of the pre-image states due to the exponential nature of the
problem, the all-solution SAT solver is also equipped with a limit t. If all the pre-image state
cubes are not found in a time and memory efficient manner, the all-solution SAT solver will
return the first t state cubes it finds. This allows us to perform reachability analysis by finding
partial pre-images.
6.4 Storing Visited States
The success of the reachability analysis approach described in Section 6.3 depends on the ability
to quickly apply the rules of Section 6.3.1. More specifically, the situations where a newly found
state qi 1) is equal to existing states, 2) is a superset of existing states, or 3) is a subset of
existing states must be rapidly identified. In this section we introduce a data structure that
stores all the states belonging to G while identifying the state containment relationships quickly.
Note that this data structure is not only viable for trace compaction, but can also be used for
reachability analysis within a UMC framework [51, 59, 71].
6.4.1 Determining State Containment Relationships
The data structure described here is composed of two components: 1) a binary tree T and 2)
a hash table. The binary tree is used to detect the state containment relationships, while the
hash table is used to locate the exact state.
The state containment relationship depends on the number of don’t cares in each state. A
state with more don’t cares may cover one with fewer, while the converse is not true irrespective
Figure 6.7: Illustrating the state storage data structure for the states 1101X, 001X1, XX001, X00X1, and X11XX (a binary tree over ordered cubes with a hash table at each node)
of the actual position of the don’t cares. To take advantage of the above, we allocate an ordered
cube for each state. The ordered cube is defined as the state value with all the zeros in the most
significant positions, followed by all ones, followed by the don’t cares (X) in the least significant
positions. For example, five states and their corresponding ordered cubes are shown below.
state:        1101X  001X1  XX001  X00X1  X11XX
ordered cube: 0111X  0011X  001XX  001XX  11XXX
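Since the ordered cube depends only on the counts of zeros, ones, and don't cares, the mapping can be sketched in a few lines (the string encoding and function name are assumptions):

```python
def ordered_cube(state):
    # All zeros in the most significant positions, then ones, then don't cares.
    return ('0' * state.count('0')
            + '1' * state.count('1')
            + 'X' * state.count('X'))

# Reproducing the table above: XX001 and X00X1 map to the same cube 001XX.
cubes = [ordered_cube(q) for q in ["1101X", "001X1", "XX001", "X00X1", "X11XX"]]
# cubes == ["0111X", "0011X", "001XX", "001XX", "11XXX"]
```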
When states are added to the graph G, they are also stored according to their ordered cube
in the binary tree T . Each node of a given depth in the binary tree corresponds to a position in
the ordered cube. The top-most node at depth zero of the tree represents the most significant
position, the nodes at depth 1 represent the second most significant position, the nodes at depth
2 represent the third most significant position, etc. The left (right) edge of a node denotes a
zero (one) in the ordered cube at the position corresponding to the parent node. There are
no edges corresponding to a don’t care in the ordered cube. By scanning over the values of
an ordered cube from the most significant to the least significant, the binary tree is traversed
for that cube. Traversal ends when the ordered cube is fully scanned or when a don’t care is
encountered. By the end of the traversal, the final visited node points to a hash table where
the state value is stored.
The hash table contains all states that map to the same ordered cube. For instance, at the
node corresponding to the ordered cube 001XX in Figure 6.7, there can be two unique state
cubes XX001 and X00X1. Figure 6.7 illustrates how the states 1101X, 001X1, XX001, X00X1,
X11XX are stored in the described data structure.
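Because the tree traversal of an ordered cube is determined entirely by its number of zeros and ones, the structure can be sketched as a map from that pair to a hash set of states (a simplification of the tree-plus-hash-table design; class and method names are assumptions):

```python
from collections import defaultdict

class StateStore:
    def __init__(self):
        # One hash set of states per tree node; a node is identified by the
        # (zero-count, one-count) of the ordered cube that reaches it.
        self._nodes = defaultdict(set)

    @staticmethod
    def _key(state):
        return (state.count('0'), state.count('1'))

    def add(self, state):
        self._nodes[self._key(state)].add(state)

    def exists(self, state):
        # Equality check (rule 1): locate the node, then probe its hash set.
        return state in self._nodes[self._key(state)]

store = StateStore()
for q in ["1101X", "001X1", "XX001", "X00X1", "X11XX"]:
    store.add(q)
# XX001 and X00X1 land in the same node's hash table, as in Figure 6.7.
```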
Given a state qi, this data structure can efficiently determine whether qi already exists in
G, whether qi is a subset of other states in G, and whether qi is a superset of other states in
G. For all three tasks, first the node ni corresponding to ordered cube of qi must be located in
the binary tree. If qi exists in the hash table pointed by node ni, then qi already exists in G.
To find whether qi is a proper subset of other states, all the nodes with at least as many
don’t cares (X) as ni have to be visited. At each node, the states within the hash tables must
be tested to determine if qi is a subset. Within the tree T , the nodes with at least as many
don’t cares as ni are found inside an r+1 by s+1 rectangle, where r is the number of zeros and
s is the number of ones in qi. Therefore, there are (r + 1) × (s + 1) nodes that can potentially
contain supersets of qi (including node ni). These nodes are illustrated in the dashed rectangle
above node ni in Figure 6.8.
Similarly, to find whether qi is a proper superset of other states, all the nodes with at least
as many zeros and ones must be visited and the states within the hash tables must be tested
to determine if qi is a superset. Within the tree T, these nodes are found inside an isosceles
triangle with sides of length n − r − s. Therefore, there are (n − r − s)(n − r − s + 1)/2 nodes that can
potentially be subsets of qi (including node ni). These nodes are illustrated in the dashed
triangle under the node ni in Figure 6.8.
Figure 6.8: Finding supersets and subsets in the tree T (the rectangle of width r and height s above node ni holds candidate supersets; the triangle of side n − r − s below it holds candidate subsets)
1: Covers = ∅
2: ordered_cube = Order(qi)
3: ni = Get_tree_node(ordered_cube)
4: Supset = get_rectangle(ni)
5: for all (nodes nj in Supset) do
6:   for all (states qj in hash table of nj) do
7:     if (qj ⊇ qi) then
8:       Covers = Covers ∪ qj
9:     end if
10:   end for
11: end for
12: return Covers

Figure 6.9: Determining the states that are supersets of a given state
As demonstrated through Figure 6.8, only the white nodes must be considered when search-
ing for subsets and supersets. Therefore, the number of comparisons required may be only a
fraction of the total number of existing states. In practice, this data structure is found to be
very efficient since the tree T is often not fully populated and the number of items in each hash
table is relatively small.
The procedure for finding the supersets (covers) of a given state qi is presented in Figure 6.9.
Lines 2-3 generate the ordered cube and find its location in the tree T . Line 4 gets all the
potential superset nodes by finding the nodes contained in the rectangle. The remaining lines
iterate through these nodes and test the states inside the hash tables to determine whether
they are supersets of qi. Note that testing whether a particular node is a superset or a subset of
another is a simple comparison procedure where the states must be identical over all positions
except where the superset is a don’t care. A procedure similar to that of Figure 6.9 is used to
find the subsets of qi, where the get_rectangle procedure is replaced with get_triangle as described
previously.
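The superset search of Figure 6.9 can be sketched as follows, with the tree represented as a map from (zero-count, one-count) node keys to hash sets of states (a simplification; all names are assumptions):

```python
def is_superset(qj, qi):
    # qj covers qi iff, at every position, qj equals qi or qj is a don't care.
    return all(b == a or b == 'X' for a, b in zip(qi, qj))

def covers_of(nodes, qi):
    # Scan the (r+1)-by-(s+1) rectangle of candidate nodes above qi's node,
    # testing every state stored there (lines 4-11 of Figure 6.9).
    r, s = qi.count('0'), qi.count('1')
    covers = set()
    for r2 in range(r + 1):
        for s2 in range(s + 1):
            for qj in nodes.get((r2, s2), set()):
                if is_superset(qj, qi):
                    covers.add(qj)
    return covers

# Hypothetical store: 0X1X1 covers 00101, while 11XXX does not.
nodes = {}
for q in ["0X1X1", "00101", "11XXX"]:
    nodes.setdefault((q.count('0'), q.count('1')), set()).add(q)
found = covers_of(nodes, "00101")  # {"0X1X1", "00101"}
```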
6.5 Experiments
In this section we demonstrate the effectiveness and performance benefits of the proposed trace
compaction approach. All experiments are conducted on a Sun Blade 1000 with a 750MHz Sparc
processor and 2.5GB of memory. Traces of length 50, 100, and 1000 are obtained via random
simulation for the circuits in the ISCAS’89 and ITC’99 benchmark suites. The reachability
analysis engine is developed using the all-solutions SAT solver of [82] which is a circuit variant
of zChaff [69] and Grasp [73]. To evaluate the overall proposed approach we limit the number
of stored states to at most 10,000 state cubes and do not use an explicit timeout. Since the
compaction techniques of previous works [20, 78, 90] are not publicly available, and since the
assertions and errors they used are unknown, we cannot compare with them either directly or
indirectly.
Figure 6.10: Comparison of state selection methods (total run-time in seconds over all benchmarks for BFS, DFS, random, and the proposed selection)
We first evaluate the effectiveness of the state selection procedure described in Section 6.3.3.
We compare this heuristic against three other selection approaches, Depth-First Search (DFS),
Breadth-First Search (BFS), and random selection. The above techniques are used to perform
reachability analysis from a random state to the initial state given a timeout of 200 seconds.
The run-times over all the benchmarks are collected and presented in Figure 6.10. Both the
DFS and BFS methods result in run-times of over 4000 seconds, while the random method
fares better at over 3500 seconds. The proposed state selection strategy based on the smallest
hamming distance relative to the initial state and the position of the state in the graph G results
in run-times of just over 3000 seconds. This performance demonstrates that the proposed state
selection heuristic yields an efficient overall reachability analysis procedure.
Next, we demonstrate the effectiveness of the overall proposed trace compaction approach.
Table 6.1 illustrates the results of the experiments on all ISCAS’89 and ITC’99 circuits for
traces of length 50, 100 and 1000. The first column shows the circuit names while the remaining
columns are organized into three sections based on their original trace length. The first column
of each section labeled org describes the original length of each trace (50, 100, or 1000). The
second column of each section labeled pre describes the length of the traces after performing the
single step pre-image process described in Section 6.3.2. We chose to find single step pre-images
circuits org pre reach cpu pre cpu reach org pre reach cpu pre cpu reach org pre reach cpu pre cpu reach
s208.1 50 25 25 0.00 0.56 100 51 51 0.07 0.60 1000 244 244 0.08 9.26
s298 50 1 1 0.00 0.00 100 3 1 0.59 0.86 1000 1 1 0.34 0.01
s344 50 33 1 0.00 0.00 100 55 1 0.31 0.00 1000 10 5 0.42 0.08
s349 50 33 1 0.00 0.00 100 55 1 0.32 0.00 1000 10 5 0.39 0.08
s382 50 3 1 0.00 0.17 100 4 2 0.75 0.00 1000 1 1 0.89 0.00
s386 50 1 1 0.00 0.00 100 2 2 0.09 0.00 1000 2 2 0.06 0.00
s400 50 3 1 0.00 0.01 100 2 1 0.69 0.01 1000 2 1 0.74 0.05
s420.1 50 21 21 0.01 1.20 100 44 44 0.13 0.97 1000 505 505 0.14 25.85
s444 50 2 1 0.01 0.01 100 3 1 0.98 0.93 1000 1 1 0.67 0.01
s510 50 24 24 0.00 0.87 100 10 10 0.13 0.66 1000 25 25 0.12 0.56
s526 50 2 1 0.00 0.03 100 3 1 1.27 0.86 1000 1 1 1.09 0.03
s526n 50 2 1 0.00 0.03 100 3 1 1.26 0.86 1000 1 1 1.17 0.02
s641 50 3 3 0.00 1.65 100 4 4 1.81 2.10 1000 2 2 1.72 5.86
s713 50 3 3 0.00 1.65 100 4 4 1.80 2.01 1000 2 2 1.76 2.88
s820 50 1 1 0.00 0.00 100 1 1 0.00 0.00 1000 1 1 0.38 0.00
s832 50 1 1 0.00 0.00 100 1 1 0.00 0.00 1000 1 1 0.4 0.00
s838.1 50 26 26 0.00 1.87 100 45 45 0.26 2.07 1000 510 510 0.27 48.48
s953 50 6 5 0.00 1.38 100 1 1 2.52 0.00 1000 1 1 3.25 0.01
s1196 50 8 1 0.00 0.05 100 14 1 0.89 0.12 1000 5 1 1.11 0.03
s1238 50 8 1 0.01 0.05 100 14 1 0.84 0.11 1000 5 1 0.96 0.02
s1423 50 50 2 0.01 3.41 100 57 2 6.19 3.55 1000 15 3 6.24 67.61
s5378 50 50 50 0.04 0.89 100 100 100 23.76 1.03 1000 1000 1000 26.18 5.86
s9234.1 50 50 50 0.04 22.67 100 100 100 50.26 1.76 1000 1000 1000 49.89 11.55
s9234 50 34 34 0.02 1.67 100 36 36 46.99 1.66 1000 35 35 47.41 10.76
s13207.1 50 50 50 0.28 3.52 100 100 100 96.76 4.20 1000 1000 1000 105.92 7.61
s13207 50 50 50 0.23 3.29 100 100 100 91.57 4.17 1000 1000 1000 98.79 7.74
s15850.1 50 50 50 0.12 5.82 100 100 100 145.67 87.18 1000 1000 1000 140.31 9.01
s15850 50 50 50 0.07 3.45 100 100 100 96.18 4.19 1000 1000 1000 222.94 8.09
s38417 50 50 50 1.07 40.58 100 100 100 311.05 154.30 1000 1000 1000 340.83 25.74
s38584.1 50 50 50 1.27 11.83 100 100 100 336.97 12.37 1000 1000 1000 375.70 25.68
s38584 50 50 50 1.26 59.15 100 100 100 315.11 185.30 1000 1000 1000 344.44 23.85
b01 50 6 2 0.09 0.00 100 4 4 0.10 0.04 1000 4 4 0.9 0.04
b02 50 2 2 0.04 0.00 100 4 4 0.04 0.01 1000 4 4 0.4 0.01
b03 50 14 2 1.17 0.07 100 26 2 1.20 0.05 1000 8 8 1.32 13.54
b04 50 50 50 6.57 3.79 100 100 100 6.26 4.54 1000 1000 1000 6.89 27.88
b06 50 3 1 0.65 0.00 100 3 3 0.63 0.04 1000 2 2 0.62 0.03
b07 50 43 43 0.35 1.81 100 51 51 0.34 1.70 1000 56 56 0.28 14.56
b08 50 43 7 0.11 1.17 100 92 2 0.16 0.00 1000 329 5 0.16 0.02
b09 50 50 17 0.16 0.96 100 97 97 0.17 1.56 1000 82 82 0.18 18.70
b10 50 22 22 0.36 1.19 100 45 21 0.31 1.56 1000 32 32 0.59 9.56
b11 50 35 25 1.44 3.06 100 98 88 2.50 4.07 1000 550 550 1.92 27.68
b12 50 14 14 8.03 5.97 100 20 20 7.51 2.50 1000 36 36 7.89 34.85
b13 50 45 45 2.10 2.22 100 99 98 2.05 2.80 1000 1000 1000 2.57 19.54
b14 50 50 50 42.48 1.84 100 100 100 47.68 24.18 1000 1000 1000 52.92 3.93
b15 50 49 49 76.63 6.19 100 100 100 56.65 43.78 1000 87 87 52.11 230.41
Table 6.1: Results of proposed trace length compaction for traces of length 50, 100, 1000.
for no more than 50 states to achieve a balance between the number of pre-images found and
the time required to find them. The third column of each section labeled reach, presents the
length of the traces after applying the proposed reachability analysis method. As described
in section 6.3.2, it is most beneficial to first find the single step pre-images followed by the
                original size 50              original size 100             original size 1000
approach   avg. reduced affected reduced   avg. reduced affected reduced   avg. reduced affected reduced
pre            10.08X     70%    13.77X       16.88X     72%    22.66X      266.35X     71%    362.84X
reach           3.81X     37%     8.54X        6.10X     35%    15.36X        2.77X     15%     12.40X
combined       19.67X     74%    25.72X       36.21X     72%    49.01X      327.76X     72%    446.59X

Table 6.2: Summary of the results for the proposed trace length compaction approach
reachability analysis (reach) method. The fourth and fifth columns of each section, labeled cpu
pre and cpu reach respectively, present the run-times in seconds associated to the pre and reach
techniques.
Table 6.1 shows that the pre-image computation techniques help reduce the traces consid-
erably. For many circuits, the original trace length is first reduced greatly by the single step
pre-image (pre) technique and further reduced by the reachability analysis (reach). For exam-
ple, the trace for circuit s344 is first reduced from 50 to 33 using pre, and then again from 33
to 1 using reach.
Analyzing the results of Table 6.1, we notice that many traces are reduced to having a
single clock cycle (length of 1) or a very small trace size after applying reachability analysis.
This result can be partially attributed to the state selection heuristics of Section 6.3.3 and
the performance improvement techniques of Section 6.3.2. These techniques can increase the
number of “short-cuts” created through the graph G and likelihood that they will lead to the
initial state.
Table 6.2 summarizes the results in Table 6.1 by providing the average length compactions
(reductions) achieved by the different components of the proposed approach for traces of size 50,
100, and 1000. Similar to Table 6.1, the summaries are provided for each original trace length
separately. Column one presents the name of the compaction method: single step pre-image
computation (pre), reachability analysis (reach), or combined. For each trace length, the overall
average reduction is presented under the label avg. reduced. This field is calculated by summing
the reduction factors over all circuits and dividing by the number of circuits. Since not all circuit
traces are reduced by the proposed method, this number may not provide a good representation
of the average factor of reduction achieved. Instead, the columns labeled affected and reduced
show the percentage of traces that are affected by each approach and the amount by which
they are reduced, respectively. For example, for traces of length 50, the proposed approaches
separately achieve 10.08 times and 3.81 times reductions while the combined approach reaches
19.67 times reductions. Furthermore, approximately 70% of the circuits are affected by the pre
techniques which results in an average reduction of 13.77 times. Similarly, the reach technique
and the combined approach affect 37% and 74% of traces for a reduction of 8.54 times and
25.72 times, respectively.
The experimental results demonstrate that not only is the proposed approach effective for
reducing traces, but it is also very efficient. For the majority of circuits in Table 6.1, compacted
traces are found within a few minutes. This performance reaffirms the practicality of the data
structure introduced in Section 6.4. The memory requirements of the overall approach are also
manageable since memory usage never exceeds 300MB when storing up to 10,000 state cubes.
The ability to quickly reduce traces in a memory efficient manner is crucial for making this
approach viable in real-life debugging environments.
6.6 Summary
This work proposed a novel trace reduction technique using SAT-based reachability analysis and
a set of state containment relationships. The components of the reachability analysis engine are
fine-tuned to increase the likelihood of generating short-cuts in the original trace. Furthermore,
a novel data structure is presented which stores visited states such that the state containment
relationships can be quickly applied. Experiments demonstrate the effectiveness of the proposed
techniques as approximately 75% of the traces are reduced by one or two orders of magnitude.
Chapter 7
Conclusion and Future Work
7.1 Summary of Contributions
As VLSI designs continue to increase in size and complexity, the verification and debugging
bottlenecks become more prominent. Since debugging is almost exclusively performed manually
in the industry today, the debugging burden will continue to increase as the complexity of
designs increase. To alleviate this overwhelming manual effort, automated debugging solutions
are required.
Research in automated debugging has shown great promise since the work in SAT-based
methodologies [92] was introduced almost six years ago. These powerful techniques, based on
formal technology, outperform traditional BDD and simulation-based diagnosis approaches by
orders of magnitude. Armed with such impressive achievements, researchers today are in a
quest for automated debugging solutions to industrial problems.
To achieve adoption by the VLSI industry, current debugging techniques must first overcome
the complexities introduced by large designs and their long error traces. These factors influence
the run-time performance and memory requirement of debuggers, which can be excessive for
real-life problems. For example, a relatively small design block of 100 thousand gates with a
corresponding error trace of 1000 clock cycles takes over 32GB of memory and may take weeks
to solve [81]. Practically, requiring over 32GB of memory may not be possible and requiring
more than a few hours dramatically reduces the value provided by the debugger.
Fortunately, the field of automated debugging using formal techniques is still in its infancy
and many improvements are possible. This dissertation presents contributions that aim to
bridge the gap between current industrial demands and capabilities of debugging technologies.
• In Chapter 3, a debugging methodology using abstraction and refinement is introduced.
This methodology allows existing debuggers to cope with the complexity of large designs
by systematically partitioning the problem into successively larger and harder problems.
Abstraction is first applied by removing state elements or complex components from the
design under consideration. Next a debugger locates error sources within the abstracted
design. Under certain conditions, error sources can be missed because certain components
are removed due to abstraction. In this case, refinement re-introduces the required ab-
stracted elements back into the circuit. The iterative nature of abstraction and refinement
allows for large problems to be solved incrementally using less memory and faster run-
time. Experiments demonstrate that abstraction and refinement can improve debugging
performance by as much as two orders of magnitude while reducing memory requirements
to 10%, compared to a state-of-the-art debugger. The abstraction and refinement based
debugging methodology is published in [86] and [81].
• In Chapter 4, Bounded Model Debugging (BMD) is introduced to reduce the impact of
long error traces on the debugging problem. BMD is a methodology based on the insight
that errors are typically excited in close temporal proximity to the failure observation
point. Theory is developed and confirmed through statistical and empirical means. The
BMD methodology considers a subset of the error trace in order to debug the problem.
The debugging problem is enhanced with mechanisms to determine whether the solutions
are complete or whether some error sources may be missing. Such situations result in
increasing the length of the error trace subset under consideration and performing a
subsequent debugging process. The BMD methodology proposed is complete as it will
locate all error locations. Empirical evidence demonstrates the power of BMD as without
it only 34% of errors are found, while with BMD 93% of errors are found. Furthermore,
run-time improvements of up to two orders of magnitude are achieved with BMD.
• In Chapter 5, the first debugging formulation based on maximum satisfiability (max-
sat) is introduced. Here debugging is formulated as an unsatisfiable problem where the
erroneous circuit is incorrectly expected to implement the correct behaviour. A max-
sat solver reasons about the cause of the unsatisfiability and identifies clauses whose
removal make the problem become satisfiable. These clauses in turn correspond to error
sources in the erroneous circuit. Apart from providing an alternative formal debugging
formulation, max-sat solvers are effective at solving over-approximations of the debugging
problem. More specifically, they can quickly identify sets of unsatisfiable clauses. Using
this strength, a two step debugging framework is developed where a max-sat solver finds
coarse-grained solutions which are refined using a fine-grained state-of-the-art SAT-based
debugger. Experiments demonstrate that the two step approach provides an average
performance improvement of approximately 200 times. This work is published in [85].
• In Chapter 6, techniques are developed to reduce the length of the error traces independent
of the debugging approach used. Trace reduction techniques address one of the major
scaling challenges faced by debuggers today, the large size of error traces. Since the
number of clock cycles contained in a trace determines the size of the debugging problem,
reducing the error trace length can require much less memory by the debugger. The trace
reduction techniques presented in Chapter 6 use a novel reachability analysis that can
find pre-image states in a single iteration. In turn the pre-image states are used to deduce
relationships between visited states to establish a shorter trace. Along with efficient data
structures, the proposed algorithm is very effective as 75% of problems are reduced by
one to two orders of magnitude. This work is published in part in [84] and in [83].
7.2 Future Work
Much of the work presented in this dissertation addresses major challenges in automated debug-
ging through novel techniques. In general this thesis provides the basis, theory and empirical
evidence confirming the effectiveness of its contributions. More specifically, the work on
abstraction and refinement, bounded model debugging, and max-sat debugging is in its infancy,
and considerable improvements are promised in the near future. Section 7.2.1 presents future
work related to the contributions of this thesis, while Section 7.2.2 discusses future research
directions in automated debugging.
7.2.1 Extension of Contributions
Experiments demonstrate that abstraction and refinement may be one of the most effective
divide-and-conquer approaches for automated debugging. Since error excitation and observa-
tion points can be present across module boundaries or across countless circuit elements, basic
problem partitioning traditionally used in equivalence checking [30], model checking [15], syn-
thesis [89], and other CAD techniques cannot be applied. Some important remaining questions
pertaining to abstraction and refinement are how much abstraction to perform and which com-
ponents to abstract. We addressed one part of this question for function-based abstraction in
Chapter 3, however, we did not investigate this topic for state-based abstraction. For example
which state elements to abstract and what level of abstraction is optimal, remain open ques-
tions. Another direction of future research is the type of abstractions to perform apart from
state-based and function-based. For example, in model checking there exist many abstraction
techniques such as predicate abstraction, existential abstraction, universal abstraction, and
more [24, 40, 49] each with their strengths and weaknesses. Similarly, effective abstraction
techniques can be developed for debugging based on the circuit structure, its datapath and its
state machine. Such approaches should be more powerful than the basic techniques presented
in this thesis.
Bounded model debugging showed great promise in Chapter 4 as a systematic way of coping
with very long error traces. Experiments and statistical analysis demonstrate that for the
majority of problems, a short error trace suffices to find the error source. However, there
are particular situations where a short trace suffix is not adequate. For example, consider an
erroneous value written into memory but not read from memory until thousands of clock cycles
later. BMD cannot find the error source under this scenario until a trace including both the
writing and reading events is considered. Although, trace reduction may help with the above
scenario, improvements can be made to BMD to diagnose a trace window instead of a suffix.
One approach is to partition the trace in windows that can each be solved independently.
However, each window must also contain constraints corresponding to the expected/golden
values of the observed error signals. One challenge with the above proposal is that the size of
the constraints containing the correct signal values is exponential in nature. As clock cycles
prior to the observed failure are analyzed the constraint size can grow to be much larger than
the original problem. Efficient pre-image analysis techniques and approximations may be able
to reduce the severity of the exponential increase in size.
The max-sat debugging formulation proposed in this thesis is a natural alternative to SAT-
based debugging. Experiments demonstrate that max-sat fares well on the over-approximate
debugging problem, while SAT is more efficient for solving the exact problem. One reason for the
difference in performance may be due to the relative immaturity of max-sat solvers. While SAT
solving algorithms have been actively improved over the last decade [27, 28, 70, 73, 74], relatively
little attention has been paid to improving max-sat algorithms. One reason is the lack of
industrial interest in max-sat, which results in a scarcity of industrial max-sat applications.
The debugging problems presented in this thesis are the first industrial applications of max-
sat from the CAD for VLSI community and were included in the third max-sat competition
in 2008 [4]. With new interest and problems developed using max-sat, improvements will be
made to the max-sat solvers. Following the trend set by SAT over the past decade, max-sat
algorithms may experience order of magnitude improvements within the next few years. The
future directions of research related to debugging exist both in formulating different types of
max-sat problems as well as developing efficient max-sat algorithms.
7.2.2 Future Directions
Formal techniques have given new life to automated debugging in the past decade [92]. Whereas
industrial applications were few and limited ten years ago, there is now promise for broad
applications across the VLSI design world. To achieve such an ambitious goal, further research
is needed in the following areas:
• Encompassing debugging techniques. Incorporating existing individually effective debugging
techniques into an efficient and robust debugging tool.
• High level reasoning engines. Migrating from Boolean level reasoning engines such as SAT
and max-sat solvers to high level reasoning engines such as SMT solvers.
• Verification environment debugging. Broadening the scope of debugging to include local-
izing errors in the stimulus generators, checkers and assertions.
Firstly, it is clear that the techniques presented in Chapters 3 and 4, as well as others, must be
combined to create an effective and robust debugging tool. One challenge at the integration
level is how to combine techniques such as abstraction and refinement, bounded model debug-
ging, hierarchical debugging [3], and memory debugging [52]. An all encompassing debugging
framework is needed to intelligently determine which techniques to apply and in what manner
to apply them to different problems. Since many of the individual techniques are iterative
by nature, heuristics are needed to identify appropriate situations to dispatch each. This re-
search mirrors advances made in the synthesis domain where effective individual optimization
techniques are combined to generate powerful and robust synthesis tools such as SIS [89], MV-
SIS [37] and Synopsys design complier [7]. Once an all encompassing debugging framework is
developed, more targeted approaches can be created for specific corner cases.
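Such a dispatch layer could be sketched as a simple feature-based heuristic. The problem features, thresholds, and ordering below are illustrative assumptions, not the policy of any actual tool:

```python
from dataclasses import dataclass

@dataclass
class DebugProblem:
    num_gates: int        # size of the flattened netlist
    trace_cycles: int     # length of the failing counterexample
    has_hierarchy: bool   # RTL module hierarchy available?
    has_memories: bool    # large embedded RAM blocks present?

def choose_techniques(p: DebugProblem) -> list:
    """Hypothetical dispatch heuristic: apply cheap problem reductions
    before the core engine. All thresholds are invented for illustration."""
    plan = []
    if p.has_memories:
        plan.append("memory-model abstraction")    # shrink RAM blocks first
    if p.has_hierarchy and p.num_gates > 100_000:
        plan.append("hierarchical debugging")      # localize to modules
    if p.num_gates > 50_000:
        plan.append("abstraction and refinement")  # drop irrelevant logic
    if p.trace_cycles > 1_000:
        plan.append("bounded model debugging")     # analyze trace suffixes
    plan.append("SAT/max-sat debugging engine")    # final suspect extraction
    return plan

print(choose_techniques(DebugProblem(250_000, 5_000, True, False)))
```

Since several of the techniques are iterative, a real framework would likely re-invoke this decision after each reduction rather than fixing the plan up front.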
A second direction for future research in automated debugging is the migration from Boolean
level to higher level reasoning engines. Most advanced debugging techniques rely on SAT, QBF,
and max-sat solvers, where high level problems are first converted to Boolean level problems
and then solved. The above approach is also referred to as bit-blasting. Since most problems
come from the RTL or higher level models, there is bus, module, instantiation, and structural
information that is lost during bit-blasting. Furthermore, simple arithmetic functions such
as addition and multiplication can be very hard to reason about at the Boolean level. High
level decision engines such as Satisfiability Modulo Theories (SMT) solvers [14, 64] can reason
about many theories, such as linear real arithmetic, uninterpreted functions, arrays, lists, and
bit vectors. For each theory the most effective solver is used until the entire problem is solved.
For example, for Boolean problems, a SAT solver is often employed. As SMT solvers gain
momentum in the research domain, their improved performance will generate interest from
other CAD for VLSI application domains. Debugging is an application that can greatly benefit
from improvements in SMT. Research dedicated to SMT debugging formulations or to the
development of specific theories for debugging promise to be a fruitful area.
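To illustrate what bit-blasting discards, the toy sketch below (an illustration, not an actual solver front-end; all names are invented) expands the single word-level constraint x + y = s into per-bit ripple-carry Boolean constraints. The word-level view is one equation over bit vectors; the Boolean view is a constraint set whose size grows with the bit width, and the bus structure is gone:

```python
def bit_blast_adder(width):
    """Expand x + y == s (mod 2**width) into per-bit Boolean constraints
    in ripple-carry form: a toy stand-in for a bit-blasting front-end.
    Returns (name, predicate) pairs over a dict of named bit values."""
    cons = []
    for i in range(width):
        # sum bit: s_i = x_i XOR y_i XOR c_i
        cons.append((f"sum{i}",
                     lambda a, i=i: a[f"s{i}"] == a[f"x{i}"] ^ a[f"y{i}"] ^ a[f"c{i}"]))
        # carry out: c_{i+1} = majority(x_i, y_i, c_i)
        cons.append((f"carry{i}",
                     lambda a, i=i: a[f"c{i+1}"] ==
                     (a[f"x{i}"] & a[f"y{i}"]) | (a[f"c{i}"] & (a[f"x{i}"] ^ a[f"y{i}"]))))
    return cons

def check(width, x, y):
    """Evaluate the bit-blasted constraints against concrete integers."""
    s = (x + y) % (1 << width)
    a = {"c0": 0}
    for i in range(width):
        a[f"x{i}"], a[f"y{i}"], a[f"s{i}"] = (x >> i) & 1, (y >> i) & 1, (s >> i) & 1
        a[f"c{i+1}"] = (a[f"x{i}"] & a[f"y{i}"]) | (a[f"c{i}"] & (a[f"x{i}"] ^ a[f"y{i}"]))
    cons = bit_blast_adder(width)
    return all(fn(a) for _, fn in cons), len(cons)

ok, n = check(8, 173, 92)
print(ok, n)  # one word-level equation became 16 Boolean constraints for 8 bits
```

An SMT solver with a bit-vector theory can keep the single word-level constraint intact and dispatch it to an arithmetic-aware decision procedure instead.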
In this thesis, automated debuggers exclusively analyze the design for error sources. This
analysis domain assumes that the verification environment, composed of test pattern generators,
assertions, and testbench checkers, is entirely correct (error free). In reality, errors are as likely
to stem from the verification environment as from the design. Although design debugging
techniques can also help identify bugs in the testbench by providing hints about primary inputs
or outputs that are identified as suspects, the vast majority of testbench error sources cannot
be localized. A promising research direction is to debug not only the design, but the overall
verification environment as well. As a simple case, consider a failing assertion. Here, the error
may lie in a combination of three general locations: (i) the testbench module generating the
stimulus patterns, (ii) the design, and (iii) the assertion (checker). Debugging techniques can
focus on any of these locations or on combinations of them. In practice, there may be
parsing and integration challenges in debugging the verification environment, as some components
may be written in procedural code (C/C++), while others are implemented in behavioural or
structural RTL (Verilog). In general, any debugging information reported about the verification
environment will provide much value to the verification engineer.
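One simple way to triage the three error locations, when trusted reference components happen to exist, is substitution: re-run the failing scenario with each component replaced by its reference and see whether the failure disappears. The sketch below is a hypothetical illustration of that idea on a toy environment; all component names and behaviours are invented:

```python
def triage_failure(stimulus, design, checker,
                   ref_stimulus, ref_design, ref_checker):
    """Hypothetical triage by substitution: replace one component at a
    time with a trusted reference. A component whose replacement makes
    the run pass is a suspect error source."""
    def run(stim, dut, chk):
        # the run "passes" if the checker accepts every stimulus/response pair
        return all(chk(x, dut(x)) for x in stim())
    suspects = []
    if run(ref_stimulus, design, checker):
        suspects.append("stimulus")
    if run(stimulus, ref_design, checker):
        suspects.append("design")
    if run(stimulus, design, ref_checker):
        suspects.append("checker")
    return suspects

# Toy environment: the design should compute 2*x and the checker verifies that.
ref_stim   = lambda: range(4)
ref_design = lambda x: 2 * x
ref_check  = lambda x, y: y == 2 * x
buggy_design = lambda x: 2 * x + (1 if x == 3 else 0)  # wrong output at x == 3

print(triage_failure(ref_stim, buggy_design, ref_check,
                     ref_stim, ref_design, ref_check))  # -> ['design']
```

References are rarely available for the design itself, of course; the point of the research direction above is to localize verification-environment errors without them, by extending the suspect analysis across testbench, design, and checker simultaneously.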
7.3 Closing Remarks
The verification and debugging efforts are eclipsing most other VLSI design tasks today. The
bottleneck due to debugging can be partially alleviated by the use of efficient, practical and
robust automated debugging techniques. Such techniques are still in their infancy, and their
adoption by industry hinges on their ability to handle large problem instances. The work
presented in this dissertation represents some of the most powerful techniques developed to
date to help bridge the gap between current debugging capabilities and demands by industrial
applications. With techniques such as abstraction and refinement, bounded model debugging,
max-sat formulations, and trace reduction, some of the major obstacles are significantly
reduced. Additionally, the contributions presented here hold great promise for the future of
automated debugging. As outlined in this chapter there are many areas within debugging that
have the potential to provide significant improvements. With research and industry support,
automated debugging tools can become as common as today’s popular functional simulators to
verification engineers.
Bibliography
[1] M. Abramovici, M. Breuer, and A. Friedman, Digital Systems Testing and Testable De-
sign. Computer Science Press, 1990.
[2] M. Abramovici, P. R. Menon, and D. T. Miller, “Critical path tracing - an alternative to
fault simulation,” in DAC ’83: Proceedings of the 20th conference on Design automation,
1983, pp. 214–220.
[3] M. F. Ali, S. Safarpour, A. Veneris, M. Abadir, and R. Drechsler, “Post-verification
debugging of hierarchical designs,” in Int’l Conf. on CAD, 2005, pp. 871–876.
[4] J. Argelich, C. Li, F. Manya, and J. Planes, “Max-SAT 2008 - Third Max-SAT Evalua-
tion,” 2008, http://www.maxsat.udl.cat/08.
[5] L. Bening and H. Foster, Principles of Verifiable RTL Design. Kluwer Academic Pub-
lishers, 2001.
[6] J. Bergeron, Writing Testbenches: Functional Verification of HDL Models. Kluwer Aca-
demic Publishers, 2003.
[7] H. Bhatnagar, Advanced ASIC Chip Synthesis Using Synopsys Design Compiler Physical
Compiler and PrimeTime. Kluwer Academic Publishers, 2002.
[8] A. Biere, A. Cimatti, E. Clarke, O. Strichman, and Y. Zhu, “Bounded model checking,”
in Advances In Computers, 2003.
[9] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu, “Symbolic model checking without BDDs,”
in Tools and Algorithms for the Construction and Analysis of Systems, ser. LNCS, vol.
1579. Springer Verlag, 1999, pp. 193–207.
[10] P. Bjesse and A. Boralv, “DAG-aware circuit compression for formal verification,” in Int’l
Conf. on CAD, 2004, pp. 42–49.
[11] P. Bjesse and J. Kukula, “Using counter example guided abstraction refinement to find
complex bugs,” in Design, Automation and Test in Europe, 2004, pp. 156–161.
[12] V. Boppana and W. K. Fuchs, “Dynamic faults collapsing and diagnostic test pattern
generation for sequential circuits,” in Int’l Conf. on CAD, 1998, pp. 147–154.
[13] R. Brayton, G. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli, Logic Minimiza-
tion Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.
[14] R. Bruttomesso, A. Cimatti, A. Franzen, A. Griggio, and R. Sebastiani, “The mathsat 4
SMT solver,” in Computer Aided Verification, 2008, pp. 299–303.
[15] J. Burch, E. Clarke, and D. Long, “Symbolic model checking with partitioned transition
relations,” in Int’l Conference on Very Large Scale Integration, 1991.
[16] J. Burch, E. Clarke, K. McMillan, and D. Dill, “Sequential circuit verification using
symbolic model checking,” in Design Automation Conf., 1990, pp. 46–51.
[17] K. Chang, V. Bertacco, and I. Markov, “Simulation-based bug trace minimization with
BMC-based refinement,” in Int’l Conf. on CAD, 2005, pp. 1045–1051.
[18] K.-H. Chang, I. Markov, and V. Bertacco, “Automating post-silicon debugging and repair,” IEEE Trans. on Comp., 2008, to appear.
[19] P. Chauhan, E. M. Clarke, J. H. Kukula, S. Sapra, H. Veith, and D. Wang, “Automated
abstraction refinement for model checking large state spaces using sat based conflict anal-
ysis,” in Int’l Conf. on Formal Methods in CAD, 2002, pp. 33–51.
[20] Y. Chen and F. Chen, “Algorithms for compacting error traces,” in ASP Design Automa-
tion Conf., 2003, pp. 99–103.
[21] E. Clarke, A. Biere, R. Raimi, and Y. Zhu, “Bounded model checking using satisfiability
solving,” Formal Methods in System Design: An International Journal, vol. 19, no. 1, pp.
7–34, 2001.
[22] E. Clarke, O. Grumberg, and D. Long, “Model checking and abstraction,” in Symposium
on Principles of Programming Languages, 1992, pp. 342–354.
[23] E. Clarke, A. Gupta, and O. Strichman, “SAT-based counterexample-guided abstraction
refinement,” IEEE Trans. on CAD, vol. 22, no. 7, pp. 1113–1123, 2004.
[24] E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-guided abstrac-
tion refinement for symbolic model checking,” Journal of the ACM, vol. 50, no. 5, pp.
752–794, 2003.
[25] S. Cook, “The complexity of theorem proving procedures,” in 3rd Annual ACM Sympo-
sium on Theory of Computing, 1971, pp. 151–158.
[26] T. Cormen, C. Leierson, and R. Rivest, Introduction to Algorithms. MIT Press, McGraw-
Hill Book Company, 1990.
[27] M. Davis, G. Logemann, and D. Loveland, “A machine program for theorem proving,”
Comm. of the ACM, vol. 5, pp. 394–397, 1962.
[28] M. Davis and H. Putnam, “A computing procedure for quantification theory,” Journal of
the ACM, vol. 7, pp. 506–521, 1960.
[29] N. Dershowitz, Z. Hanna, and J. Katz, “Bounded model checking with QBF,” in Int’l
Conf. on Theory and Applications of Satisfiability Testing, 2005, pp. 408–414.
[30] R. Drechsler and S. Horeth, “Gatecomp: Equivalence checking of digital circuits in an
industrial environment,” in Int’l Workshop on Boolean Problems, 2002, pp. 195–200.
[31] EETimes.com, “Faster Verification is the goal at ST,” 2007, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=197700622&pgno=3.
[32] ElectronicsWeekly.com, “Leakage and verification costs both continue to rise, says Cadence,” 2008, http://www.electronicsweekly.com/Articles/2008/10/23/44769/leakage-and-verification-costs-both-continue-to-rise-says-cadence.htm.
[33] M. Fahim Ali, A. Veneris, S. Safarpour, R. Drechsler, A. Smith, and M. S. Abadir, “Debugging sequential circuits using Boolean satisfiability,” in Int’l Conf. on CAD, 2004, pp. 204–209.
[34] F. Fallah, “Coverage directed validation of hardware models,” Ph.D. dissertation, MIT,
1999.
[35] G. Fey, S. Safarpour, A. Veneris, and R. Drechsler, “On the relation between simulation-
based and SAT-based diagnosis,” in Design, Automation and Test in Europe, 2006, pp.
1139–1144.
[36] H. Foster, A. Krolnik, and D. Lacey, Assertion-Based Design. Kluwer Academic Pub-
lishers, 2003.
[37] M. Gao, J. Jiang, Y. Jiang, Y. Li, S. Sinha, and R. Brayton, “MVSIS,” in Int’l Workshop
on Logic Synth., 2001.
[38] E. Goldberg, M. Prasad, and R. Brayton, “Using SAT for combinational equivalence
checking,” in Int’l Workshop on Logic Synth., 2000, pp. 185–191.
[39] ——, “Using SAT for combinational equivalence checking,” in Design, Automation and
Test in Europe, 2001, pp. 114–121.
[40] S. Graf and H. Saidi, “Construction of abstract state graphs with PVS,” in Computer
Aided Verification. Springer-Verlag, 1997, pp. 72–83.
[41] F. Heras, J. Larrosa, and A. Oliveras, “MiniMaxSat: A new weighted max-sat solver,” in
Int’l Conf. on Theory and Applications of Satisfiability Testing, 2007, pp. 41–55.
[42] E. A. Hirsch, “Sat local search algorithms: Worst-case study,” Journal of Automated
Reasoning, vol. 24, no. 1-2, pp. 127–143, 2000.
[43] S. Huang and K. Cheng, Formal Equivalence Checking and Design Debugging. Kluwer
Academic Publisher, 1998.
[44] S.-Y. Huang and K.-T. Cheng, “Errortracer: Design error diagnosis based on fault simu-
lation techniques,” IEEE Trans. on CAD, vol. 18, no. 9, pp. 1341–1352, 1999.
[45] S.-Y. Huang, “A fading algorithm for sequential fault diagnosis,” in DFT ’04: Proceedings
of the Defect and Fault Tolerance in VLSI Systems, 19th IEEE International Symposium
on (DFT’04), 2004, pp. 139–147.
[46] L. Huisman, “Diagnosing arbitrary defects in logic designs using single location at a time
(SLAT),” IEEE Trans. on CAD, vol. 23, no. 1, pp. 91–101, 2004.
[47] Intel Corp., “The Evolution of a Revolution,” 2008,
http://download.intel.com/pressroom/kits/IntelProcessorHistory.pdf.
[48] International Technology Roadmap for Semiconductors, “ITRS 2006 Update,” 2008,
http://www.itrs.net/Links/2006Update/2006UpdateFinal.htm.
[49] H. Jain, D. Kroening, N. Sharygina, and E. Clarke, “Word level predicate abstraction
and refinement for verifying rtl verilog,” in Design Automation Conf. ACM, 2005, pp.
445–450.
[50] N. Jha and S. Gupta, Testing of Digital Systems. Cambridge University Press, 2003.
[51] H.-J. Kang and I.-C. Park, “SAT-based unbounded symbolic model checking,” IEEE
Trans. on CAD, vol. 24, no. 2, pp. 129–140, 2005.
[52] B. Keng, H. Mangassarian, and A. Veneris, “A succinct memory model for automated
design debugging,” in Int’l Conf. on CAD, 2008, pp. 137–142.
[53] T. Kropf, Introduction to Formal Hardware Verification. Springer, 1999.
[54] A. Kuehlmann, V. Paruthi, F. Krohm, and M. Ganai, “Robust Boolean reasoning for
equivalence checking and functional property verification,” IEEE Trans. on CAD, vol. 21,
no. 12, pp. 1377–1394, 2002.
[55] L. Lavagno, G. Martin, and L. Scheffer, EDA for IC Implementation, Circuit Design, and Process Technology (Electronic Design Automation for Integrated Circuits Handbook). CRC Press, 2006.
[56] T. Larrabee, “Test pattern generation using Boolean satisfiability,” IEEE Trans. on CAD,
vol. 11, pp. 4–15, 1992.
[57] C. Lee, “Representation of switching circuits by binary decision diagrams,” Bell System
Technical Jour., vol. 38, pp. 985–999, 1959.
[58] T. Lee, W. Chuang, I. Hajj, and W. Fuchs, “Circuit-level dictionaries of CMOS bridging
faults,” in VLSI Test Symp., 1994, pp. 386–391.
[59] B. Li, M. Hsiao, and S. Sheng, “A novel SAT all-solutions solver for efficient preimage
computation,” in Design, Automation and Test in Europe, 2004, pp. 272–277.
[60] M. Liffiton and K. A. Sakallah, “On Finding All Minimally Unsatisfiable Subformulas,”
in Int’l Conf. on Theory and Applications of Satisfiability Testing, 2005, pp. 32–43.
[61] M. Liffiton and K. Sakallah, “Algorithms for computing minimal unsatisfiable subsets of
constraints,” Journal of Automated Reasoning, vol. 40, no. 1, pp. 1–33, 2008.
[62] J. Liu and A. Veneris, “Incremental fault diagnosis,” IEEE Trans. on CAD, vol. 24,
no. 2, pp. 240–251, 2005.
[63] F. Lu, L.-C. Wang, K.-T. Cheng, and R. Huang, “A circuit SAT solver with signal
correlation guided learning,” in Design, Automation and Test in Europe, 2003, pp. 892–
897.
[64] C. Lynch and Y. Tang, “Interpolants for linear arithmetic in SMT,” in Automated Tech-
nology for Verification and Analysis, 2008, pp. 156–170.
[65] F. Y. Mang and P.-H. Ho, “Abstraction refinement by controllability and cooperativeness
analysis,” in Design Automation Conf. ACM, 2004, pp. 224–229.
[66] H. Mangassarian, A. Veneris, S. Safarpour, M. Benedetti, and D. Smith, “A performance-
driven qbf-based iterative logic array representation with applications to verification, de-
bug and test,” in Int’l Conf. on CAD, 2007, pp. 240–245.
[67] H. Mangassarian, A. Veneris, S. Safarpour, F. N. Najm, and M. S. Abadir, “Maximum
circuit activity estimation using pseudo-boolean satisfiability,” in Design, Automation
and Test in Europe, 2007, pp. 1538–1543.
[68] J. Marques-Silva and J. Planes, “Algorithms for maximum satisfiability using unsatisfiable
cores,” in Design, Automation and Test in Europe, 2008, pp. 6–10.
[69] J. Marques-Silva and K. Sakallah, “GRASP – a new search algorithm for satisfiability,”
in Int’l Conf. on CAD, 1996, pp. 220–227.
[70] J. Marques-Silva and K. Sakallah, “GRASP: A search algorithm for propositional satisfi-
ability,” IEEE Trans. on Comp., vol. 48, no. 5, pp. 506–521, 1999.
[71] K. McMillan, “Applying SAT methods in unbounded symbolic model checking.” in Com-
puter Aided Verification, 2002, pp. 250–264.
[72] G. Moore, “Cramming more components onto integrated circuits,” electronics, vol. 38,
no. 8, pp. 1–4, 1965.
[73] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, “Chaff: Engineering an
efficient SAT solver,” in Design Automation Conf., 2001, pp. 530–535.
[74] N. Eén and N. Sörensson, “An Extensible SAT-solver,” in Int’l Conf. on Theory and Applications of Satisfiability Testing, 2003, pp. 333–336.
[75] G.-J. Nam, K. Sakallah, and R. Rutenbar, “A new fpga detailed routing approach via
search-based boolean satisfiability,” IEEE Trans. on CAD, vol. 21, no. 6, pp. 674–684,
2002.
[76] ——, “A new FPGA detailed routing approach via search-based Boolean satisfiability,”
IEEE Trans. on CAD, vol. 21, no. 6, pp. 674–684, 2002.
[77] OpenCores.org, 2008, http://www.opencores.org.
[78] S.-J. Pan, K.-T. Cheng, J. Moondanos, and Z. Hanna, “Generation of shorter sequences
for high resolution error diagnosis using sequential sat,” in ASP Design Automation Conf.,
2006, pp. 25–29.
[79] D. Plaisted and S. Greenbaum, “A structure-preserving clause form translation,” J. Symb.
Comput., vol. 2, no. 3, pp. 293–304, 1986.
[80] P. Rashinkar, P. Paterson, and L. Singh, System-on-a-chip Verification: Methodology and
Techniques. Kluwer Academic Publisher, 2000.
[81] S. Safarpour and A. Veneris, “Automated design debugging with abstraction and refine-
ment,” IEEE Trans. on CAD, 2009, under review.
[82] S. Safarpour, A. Veneris, and R. Drechsler, “Integrating observability don’t cares in all-
solution SAT solvers,” in IEEE International Symposium on Circuits and Systems, 2006,
pp. 1587–1590.
[83] ——, “Improved SAT-based reachability analysis with observability don’t cares,” Journal
on Satisfiability, Boolean Modeling and Computation, vol. 5, pp. 1–25, 2008.
[84] S. Safarpour, A. Veneris, and H. Mangassarian, “Trace compaction using SAT-based
reachability analysis,” in ASP Design Automation Conf., 2007, pp. 932–937.
[85] S. Safarpour, M. H. Liffiton, H. Mangassarian, A. Veneris, and K. A. Sakallah, “Improved
design debugging using maximum satisfiability,” in Int’l Conf. on Formal Methods in
CAD, 2007, pp. 13–19.
[86] S. Safarpour and A. Veneris, “Abstraction and refinement techniques in automated design
debugging,” in Design, Automation and Test in Europe, 2007, pp. 1182–1187.
[87] S. Safarpour, A. Veneris, G. Baeckler, and R. Yuan, “Efficient SAT-based Boolean match-
ing for FPGA technology mapping,” in Design Automation Conf., 2006, pp. 466–471.
[88] S. Sahni and A. Bhatt, “The complexity of design automation problems,” in Design
Automation Conf., 1980, pp. 402–411.
[89] E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj,
P. Stephan, R. Brayton, and A. Sangiovanni-Vincentelli, “SIS: A system for sequential
circuit synthesis,” University of Berkeley, Tech. Rep., 1992.
[90] S. Shen, Y. Qin, and S. Li, “A faster counterexample minimization algorithm based on
refutation analysis,” in Design, Automation and Test in Europe, 2005, pp. 672–677.
[91] A. Smith, A. Veneris, M. F. Ali, and A. Viglas, “Fault diagnosis and logic debugging
using Boolean satisfiability,” IEEE Trans. on CAD, vol. 24, no. 10, pp. 1606–1621, 2005.
[92] A. Smith, A. Veneris, and A. Viglas, “Design diagnosis using Boolean satisfiability,” in
ASP Design Automation Conf., 2004, pp. 218–223.
[93] F. Somenzi, “Efficient manipulation of decision diagrams,” Software Tools for Technology
Transfer, vol. 3, no. 2, pp. 171–181, 2001.
[94] D. Stoffel and W. Kunz, “Record & play: a structural fixed point iteration for sequential
circuit verification,” in Int’l Conf. on CAD, 1997, pp. 394–399.
[95] O. Strichman, “Pruning techniques for the sat-based bounded model checking problem.”
in CHARME, 2001, pp. 58–70.
[96] A. Suelflow, G. Fey, R. Bloem, and R. Drechsler, “Using unsatisfiable cores to debug
multiple design errors,” in Great Lakes Symp. VLSI, 2008, pp. 77–82.
[97] G. S. Tseitin, “On the complexity of derivation in propositional calculus,” in Studies in
Constructive Mathematics and Mathematical Logic, Part II, 1968, pp. 115–125.
[98] A. Veneris and M. Abadir, “Design rewiring using ATPG,” IEEE Trans. on CAD, vol. 21,
no. 12, pp. 1469–1479, 2002.
[99] A. Veneris and I. N. Hajj, “Design error diagnosis and correction via test vector simula-
tion,” IEEE Trans. on CAD, vol. 18, no. 12, pp. 1803–1816, 1999.
[100] F. Wotawa and M. Nica, “Record & play: a structural fixed point iteration for sequential
circuit verification,” in International Symposium on Intelligent and Distributed Comput-
ing, 2007, pp. 1–10.
[101] Y.-S. Yang, A. Veneris, and N. Nicolici, “Automated data analysis solutions to silicon
debug,” in Design, Automation and Test in Europe, 2009, to appear.
[102] Q. Zhu, N. Kitchen, A. Kuehlmann, and A. Sangiovanni-Vincentelli, “Sat sweeping with
local observability don’t-cares,” in Design Automation Conf., 2006, pp. 229–234.