estimation of power dissipation in cmos combinational circuits using boolean function manipulation

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN. VOL. I I . NO. 3. MARCH 1992 373

Estimation of Power Dissipation in CMOS Combinational Circuits Using Boolean

Function Manipulation Srinivas Devadas, Member, IEEE, Kurt Keutzer, Member, IEEE, and Jacob White, Associate Member, IEEE

Abstract-Estimating maximum power dissipation for a CMOS logic network is difficult because the power dissipated by the network is typically a strong function of the network’s inputs. This implies that the number of simulations which must be performed in order to find the maximum power dissipation is exponential in the number of inputs to the network. In this paper we show that a simplified model of power dissipation re- lates maximizing dissipation to maximizing gate output activity, appropriately weighted to account for differing load capacitances. To find the input or input sequence that maximizes the weighted activity, we give algorithms for transforming the problem to a weighted mux-satisfiability problem, and then present exact and approximate algorithms for solving weighted max-satisfiability. Algorithms for constructing the max-satisfiability problem for both dynamic and static CMOS, where for the latter dissipation caused by glitching is considered, are presented. Also, we present efficient exact and approximate methods for solving weighted max-satisfiability and show that these methods are viable for large-scale problems through examina- tion of experimental results.

1. INTRODUCTION HE high transistor density now possible with CMOS T integrated circuits, together with growing importance

of reliability as a design issue, has made estimating worst- case power dissipation in logic circuits an important problem (a similar problem, peak current estimation, has received considerable recent attention [3]). In general, power dissipation in a logic circuit is a complex function of propagation delays, clock frequency, technology pa- rameters, circuit topology, and, in the case of CMOS, the input vector or vector sequence applied. This last aspect makes accurate estimation of worst-case power dissipation in CMOS networks extremely difficult, since the number of input sequences that have to be simulated in order to find the sequence that produces the maximum power dissipation is exponential in the number of inputs to the circuit.

In this paper, we address the problem of estimation of

Manuscript received June 8, 1990; revised September 28, 1990. This work was supported by the Defense Advanced Research Projects Agency under Contract N00014-87-K-0825 and by grants from the Digital Equip- ment Corporation and Analog Devices. This paper was recommended by Associate Editor R. E. Bryant.

S . Devadas and J . White are with the Department of Electrical Engi- neering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139.

K. Keutzer is with Synopsys, Inc., Mountain View, CA 94043. IEEE Log Number 9104228.

worst-case power dissipation in dynamic and static CMOS combinational circuits. Our general approach is to use a simple, but reasonable, model of how CMOS gates dissipate power to relate maximizing dissipation to maximizing gate output activity, appropriately weighted by differing load capacitances. Then to find the input vector or input sequence that maximizes the weighted activity, we first give algorithms for transforming the problem to a weighted max-satisjability problem. That is, algorithms are presented which convert a logic description into a multiple-output Boolean function of the input vector or vector sequence, where each output of the Boolean function is associated with a logic gate output transition. It then follows that an assignment to the input vector or vector sequence which results in a maximum weighted number of these function’s outputs becoming 1 corresponds to the input vector or vector sequence causing maximum weighted activity. Once this input vector or vector sequence has been determined, circuit-level simulation can be used to accurately determine the associated power dissipation. Finally, we give exact and approximate methods for solving the weighted max-satisfiability problem thereby generated.

Algorithms for constructing the multiple-output Bool- ean function for transitions in dynamic CMOS gates are simplest, because dynamic gates reset at the beginning of every clock cycle. Determining the input pattern that maximizes power dissipation in static CMOS circuits is considerably more difficult, stemming from the fact that activity caused by an input vector is dependent on the circuit node values just before the input is applied, and is therefore also a function of the previous input vector. Thus, it is necessary to search for a two-vector input sequence that produces maximum power dissipation. Fur- ther, in contrast to dynamic circuits, static logic gates can glitch and make multiple transitions. The number of transitions a gate can make is a function of the arrival times of transitions on its inputs, which in turn is dependent on the propagation delays of gates on the different paths from the primary inputs. We give transformational algorithms, under various delay models, for static CMOS circuits that produce a multiple-output Boolean function with the property mentioned above.

Although the max-satisfiability problem has been shown

0278-0070/92$03.00 0 1992 IEEE

~

314 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. I I . NO. 3, MARCH 1992

to be NP-complete [ 5 ] , for the Boolean equations associated with peak power estimation, exact solutions can be found in a reasonable period of time for networks with up to a few hundred gates. We compare two approaches to finding exact solutions, one based on a branch-and-bound algorithm with associated pruning strategies and the other based on the observation that max-satisfiability can be transformed into the problem of generating a disjoint cover for a multiple-output logic function. The latter approach can make use of efficient disjoint cover enumeration and graph manipulation algorithms proposed in the past. For circuits with more than a few hundred gates, both the branch-and-bound and the disjoint cover enumeration approach to exactly solving max-satisfiability are too com- putationally expensive. We give approximate transformation algorithms which simplify the Boolean function in such a way that it is easier to find a max-satisfying assignment but for which power dissipation is not under- estimated.

A recent paper on maximum current estimation [4] pre- sents some similar ideas. Simplified models are used, and a branch-and-bound strategy is used to determine the input vector that produces maximum current flow. Our construction of the optimization problem for maximum power is, however, quite different. In addition, for static CMOS circuits, we consider the problem of glitching, which can cause a gate to make multiple transitions (and therefore dissipate more power) on the application of an input vector pair. Another important difference is that the heuristics in [4] can underestimate the maximum current, whereas our approximation strategies provide upper bounds on the maximum power dissipation.

We begin in the next section by briefly describing the physical model used for power dissipation. Transforma- tions used for dynamic CMOS circuits are presented in Section 111. Algorithms for two-level and multilevel static CMOS circuits under a unit-delay and a more general delay model are described in Section IV. Solving max-satisfiability is the subject of Section V, where we also show how the transformations can be approximated. Prelimi- nary experimental results are given in Section VI. In Sec- tion VI1 we discuss the limitations for our current techniques and provide directions for future work.

11. A POWER DISSIPATION MODEL As mentioned above, CMOS logic circuits only dissi-

pate energy when their node voltages are changing, and this suggests that computing power dissipation involves finding the circuit’s transient response, The straightfor- ward approach to determining the maximum power dissipation of a CMOS combinational network is to simulate the network for all possible sets of input vectors and then to multiply the highest energy dissipation so derived by the maximum rate, typically the clock frequency, at which the input vectors to the network can change.

An exhaustive simulation of all input patterns is com- putationally infeasible for a network with more than a few

inputs. Hence, we derive a simplified model of the energy dissipation that can be used, along with Boolean manipulation techniques we discuss in later sections, to efficiently determine the dissipation-maximizing input pat- tem(s). Then, detailed simulation can be used to more accurately determine the actual dissipation. A very simple relation between the logical behavior of a CMOS combinational network and the energy that circuits dissipate can be derived based on three simplifying assumptions:

The only capacitance in a CMOS logic gate is at the output node of the gate. Either current is flowing through some path from VDD to the output capacitor, or current is flowing from the output capacitor to ground. Any change in a logic-gate output voltage is a change from VDD to ground or vice versa.

All of these are reasonably accurate assumptions for well- designed CMOS gates [6], and when combined imply that the energy dissipated by a CMOS gate is given by

where C is the output capacitance for the gate and NG is the total number of gate output transitions, from either low to high or high to low. When a logic gate glitches, it is not necessarily the case that the output falls to ground or rises to VDD, so ( 1 ) represents the worst-case dissipation.

As can be seen from ( l ) , with our simplifying assumptions, maximizing the energy dissipated by a CMOS combinational network over the possible input vector sets involves maximizing the weighted sum of logic-gate output transitions, where the weights are given by the gate output capacitances. We will use this result in later sections to show that finding the input pattern that maximizes dissipation can be derived by examining an associated system of Boolean equations.

111. DYNAMIC CMOS CIRCUITS Finding the inputs to a combinational logic network that

maximize the dissipation in one clock cycle is easiest for networks of dynamic logic gates. This is because at the beginning of a clock cycle all dynamic gate outputs are forced to either a logic 1 or 0, depending only on the type of gate, not on the inputs. In addition, the gate outputs can change at most once during the remainder of the clock cycle, based on the logical function being performed and the input values only during that clock cycle.

Interconnections of N-type dynamic gates may suffer from race conditions that cause erroneous logical behavior, but correctly designed circuits will have the property that in one clock cycle all the gate outputs will be precharged high, and depending on the input vector, a subset of the gates in the network will undergo a single falling transition. This property implies that to find the input patterns which maximize dissipation we need only find a single input vector to the logic network that maximizes the

DEVADAS et al. : ESTIMATION OF POWER DISSIPATION 375

weighted sum of falling transitions, where, as mentioned in Section 11, the weights are determined by the load capacitances. An analogous statement holds for P-type dynamic gates.

If all the gates are N-type and all have the same load capacitances, the problem simplifies to finding the input vector which maximizes the number of falling transitions. To determine this vector, we simply find the OFF set’ for each gate output in the network, including internal gate outputs, and find the input vector which is in the maximum number of gate output OFF sets. Problems of this form are usually referred to as max-satisfiability problems. If weights are introduced, to model the varying load capacitances, we will refer to the resulting problem as a weighted max-satisfiability problem. Exact and approximate methods of finding such an input vector are the subject of Section V.

IV. STATIC CMOS CIRCUITS The problem of determining the maximum dissipation

in static CMOS circuits is more complicated than for dynamic circuits, mostly because static gate output nodes are not “reset.” This means that the transitions a static gate undergoes depend not only on the present input vector, but also on the present circuit state, which is determined by the previous input vector. Thus, we must search for a two-vector sequence that maximizes dissipation or, equivalently, the weighted sum of transitions. Note that this is a simplification that overestimates the dissipation. Ignored is the fact that, in order to repeat the same two- vector sequence, the network’s inputs must be returned from the second vector in the sequence to the first vector, and this reversed sequence may not provide as much dissipation.

Allowing for this overestimation, we wish to construct a multiple-output Boolean function which depends on the two-vector input sequence such that a maximum weighted satisfying assignment for the function causes maximum weighted activity in the circuit. The outputs of this function will correspond to gates in the original network. Con- structing this Boolean function is complicated for general static CMOS logical networks because gates in these networks can glitch. That is, changing the input vector once may cause multiple transitions to occur on a particular gate’s output node, and this can contribute substantially to dissipation. As this glitching phenomenon must be related to differing signal arrival times at a gate’s inputs, glitching occurs only in two-level networks with varied gate delays or in multilevel networks, both of which are examined below.

A . Two-Level Networks Two-level, or sum-of-product, networks are logic net-

works in which the primary inputs feed AND gates whose

‘An OFF set is the subset of all possible input vectors to the logic network that produces a logic 0 at the node of interest.

outputs are inputs to OR gates. The commonly used PLA’s are logic networks of this form; therefore two-level networks are an important special case. We denote the N , primary inputs of the two-level network as in,, in2,

, in,, the M AND gates as gl , g2, . . . , gw, and the N O R gates as h , , h2 , - - , hN. In order to determine the two-vector sequence that maximizes dissipation in the network, we consider that the first vector, u0, is applied to the primary inputs of the circuits and the circuit is allowed to stabilize. Then, at time r , we change the inputs to a second vector, ut. In addition, we assume that the changes in the inj’s and their complements from u0 to ut happen simultaneously-this would correspond to the often-encountered case of combinational logic being embedded between sets of latches. (The latches would produce the inj’s and their complements.)

In Fig. 1, we have a single-output two-level network. The AND gates g,, g 2 , and g3 each have associated cubes that correspond to conjunctions of input signals in l , in2, and in3 or their complements. For instance, g, receives inputs from inl and G; therefore its cube c1 = inl . in2.

We now derive transformational algorithms, under several gate delay models, for computing Boolean equations for the transitions made by each gate output in the network.

I ) Zero-Delay Model: Under the zero-delay model, transitions are assumed to happen instantaneously. There- fore, glitches cannot occur at the outputs of any of the gates, and each gate can make at most one transition: 0 -+ 1 or 1 -+ 0. For an AND gate gi in a two-level network to make a 0 -+ 1 transition,

. . .

f ; = (ci n vo = 4) A (c, n ut + 4) (2)

must to be satisfied (evaluate to a logic 1). Here, the superscript r denotes that this is the Boolean equation for a rising transition, and ci denotes the cube corresponding to the AND gate g j . As mentioned earlier, ci will contain lit- erals corresponding to the variables inj or their complements. If, for instance, ci = inl - in2, the Boolean equation corresponding to the above condition would be

-

f r = (1 O VOl V O O d 2 ) A (1 O Ut1 A O O Dt2)

where 8 corresponds to the exclusive-or (XOR) operator, and 0 corresponds to the exclusive-nor (XNOR) operator. The above equation represents the fact that in order for ci to evaluate to a 1, inl = 1 and in2 = 0. The equation simplifies to

f ; = (s V d2) A (ut, A z). Similarly, in order for an AND gate g j to make a 1 -+ 0

(3)

must be satisfied, where here the superscript f stands for falling transition. For an OR gate hi to make a 0 + 1 transition, we require that the previous values of inputs all be

transition,

f { = (c; n vo z 4) A (ci n ut = 4)

376 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. 11. NO. 3, MARCH 1992

Fig. 1 . Single-output two-level network.

0 and that at least one input to the OR gate make a 0 -+ 1

feed the OR gate hi, the Boolean equation for the rising transition is

transition. Assuming that the AND gates g,, g2, - 9 gM

er = ((ci n VO = 4) A (c2 n VO = 4) . - A (c,,,, n vo = 4)) A (f; v f; . v fh). (4)

Similarly, for an OR gate hi to make a 1 -+ 0 transition, we require that at least one previous value of input be a 1 and that all inputs that were 1 make a 1 -+ 0 transition. The condition is encapsulated by

e{ = ((cl n vo + 4) v (c2 n vo + 4) - . v (c,,,, n vo + 4) ) A ( (c , n ut = 4) A . - A (c,,, n ut = 4)) . (5)

Given a network with M AND gates and N OR gates, the (2M + 2N)-output Boolean function in which we are in- terested has M equations of the form of (2), M equations of the form of (3), N equations of the form of (4), and N equations of the form of (5 ) . If each of these equations has an associated weight related to the size of the load capacitance on the gate output involved, as described in Section 11, then an assignment for v0 and ut that maximizes the sum of the weights associated with the satisfied equations will also maximize dissipation.

2) Unit-Delay Model: The unit-delay model results in exactly the same analysis as the zero-delay model. Each AND gate can make a single transition at most. Since we assume unit delays for all the AND gates, this means that the inputs to the OR gates change simultaneously. This implies that the OR gate makes a single transition, i.e., does not glitch. Equations (2)-(5) can be used directly to estimate maximum dissipation in a two-level circuit under the unit-delay model.

3) A General Delay Model: In the general case, there will be differences in the switching times of the different gates in a network. Large fan-in gates will tend to switch more slowly. Switching times are also affected by fan-out loads.

Under a general delay model, delay attributes of different gates can be arbitrary positive integer values. This

Fig. 2. Glitch at OR gate output.

complicates dissipation estimation since glitches can occur. Assuming, as before, that the inputs change simultaneously, the AND gates can make at most a single transition. However, an OR gate may see two transitions occurring at different inputs at different times. If the transitions happen to be of the form of Fig. 2, the OR gate output will glitch.

Lemma 1 : An OR gate in a two-level logic circuit can make no transition, a 0 -+ 1 transition, a 1 -+ 0 transition, or glitch 1 -+ 0 -+ 1. No other output changes can occur under an arbitrary delay model from the application of a two-vector input sequence.

Pro08 Any AND gate that feeds the OR gate in ques- tion can make at most a single transition, 0 -+ 1 or 1 -+

0. If the former occurs, we have a controlling value at the input to the OR gate, and whatever other input transitions follow, the output of the OR gate will not change from logic 1. Thus, if the OR gate makes a 0 -+ 1 transition, its output stays constant afterward. This implies that a 0 -+

1 -+ 0 glitch cannot occur, and the only transitions possible are those enumerated above.

If we wish to find the correct input pattern that maximizes the dissipation under a general delay model, we need to augment the set of equations derived earlier to include OR gate glitching. The AND gate equations, namely, (2) and (3), remain the same. For each OR gate, the strategy we will use is to construct a two-output function that corresponds to the number of transitions the gate will make. A value of 00 for the two-output function corresponds to no transitions, 01 corresponds to a 0 -+ 1 transition, 10 corresponds to 1 --* 0 transition, and finally 11 corresponds to a 1 -+ 0 -+ 1 glitch (two transitions).

The conditions necessary for a 0 -+ 1 transition are simply those of the form of (4). The conditions for a 1 -+ 0 transition and no change thereafter are given by (5 ) . To produce a 1 + 0 -+ 1 glitch on an OR gate, the AND gates feeding the OR gate must satisfy the following conditions:

No AND gate stays at 1. At least one AND gate feeding the OR gate should make a 1 -+ 0 transition. At least one AND gate should make a 0 --+ 1 transition. All the AND gates making 1 -+ 0 transitions should do so before the AND gates making 0 -+ 1 transitions.

We can write Boolean equations for each of the conditions above:

DEVADAS ef al. : ESTIMATION OF POWER DISSIPATION 311

((cI n vo = 4) v (c, n ut = 4)) A . A ((CM n vo = 4) v (cM n ut = 4))

f ; v f ; v * * . v f h f { v f J v * . - V f L

(fLfJ(4 5 d2) A * * A f ; f h 4 5 dd)

A (fL-IfhdM-I 5 d.44)) (6)

A (f;f{(dl 5 d2) A * - A f;fhdl I dM)) A * -

where f [ and f{are as defined in (2) and (3), and dj is the delay of AND gate gj .

One can now use the max-satisfiability algorithms of Section V on the constructed function to find the two-vector input sequence that maximizes dissipation under an arbitrary delay model. It should be noted that, given a circuit, the dj values are constants and the inequalities in (6) evaluate to 1 or 0.

B. Multilevel Static CMOS Circuits In this subsection, we will generalize the transforma-

tions presented in the previous subsection to apply to arbitrary multilevel implementations under the zero and unit delay assumptions. As before, the multilevel circuit is assumed to have NI inputs in,, in2, , inN,, and we will assume that the first vector, u0, is applied to the inputs and that the circuit is allowed to stabilize. Then at time t , we apply the second vector, ut. Again, we will assume that the changes in the inj from u0 to ut happen simultaneously.

I ) Zero-Delay Model: Under the zero-delay model, each gate in a multilevel circuit can make at most a single transition (since all transitions happen instantaneously). Therefore, we can use the logic function of each gate to determine whether or not the gate will switch on the application of ut. This is simply represented by the condition

.6 ((hi n L a = 4)

A (hi n ut z 4)) v ((hi n vo z 4)

A (hi n ut = 4))

3

(7)

where hi is the logic function corresponding to gate gi in the multilevel network. It remains to find an input sequence u0, ut that maximizes Cjlh = , Cj, where Cj is the output capacitance of ga te j (and the sum is taken over the indicesj for which& is satisfied).

2) Unit-Delay Model: Even under the idealization of a unit-delay model, the gate output nodes of a multilevel network can have multiple transitions in response to a two- vector input sequence. In fact, it is possible for a gate output to have as many transitions as levels in the network. For problems of this kind, it is easiest to construct the Boolean equations that describe the dependence of gate output transitions on the input sequence u0, ut by a post- processing of symbolic simulation.

Qk Fig. 3 . Two-level INVERTER-AND network.

Specifically, we construct the Boolean functions describing the gate outputs at the discrete points in time im- plied by the unit-delay model. That is, we consider only discrete times t , t + 1, . - , t + 1 , where t is the time when the input changes from u0 to ut, and I is the number of levels in the network. For each gate output i , we use symbolic simulation to construct the I Boolean functions J ( t + j ) , j E 0, - , I, which evaluate to 1 if the gate’s output is 1 at time t + j . Note that as we assume no gate has zero delay and that the network has settled before the inputs are changed from u0 to ut, J ( t ) is the logic function performed on u0 at the ith gate output. Finally, we can determine whether a transition occurs at a boundary between discrete time intervals t + i and t + i + l by xoR’ingA(t + i) withA(t + i + 1).

For example, consider the network in Fig. 3. For this network,

f i ( t> = vol h ( t ) = & A ~ 0 2 .

Assuming both gates have unit delay,

fi(t + 1) = inl(t> = vt, h ( t + 1) = f i ( t ) A in2(t) = ZiT, A ut2.

Finally,

h(t + 2) = fi( t + I ) A in2(t + 1) = A ut2.

For this example there are three possible transitions: that the INVERTER changes state from t to t + 1, that the AND gate changes state from t to t + 1, and that the AND gate changes state from t + 1 to t + 2. The Boolean equations for these transitions are, respectively,

el = f i ( t ) @ h ( t + 1)

e2 = h(0 8 f 2 @ + 1)

e3 = f2(t + 1) 8 h(t + 2 ) .

Note that for this example, the two-vector sequence La, = 0, u02 = 0 , ut, = 1, ut2 = 1 simultaneously satisfies e l , e2, and e3 and will therefore maximize dissipation, re- gardless of the weighting produced by load capacitance.

It is not usually necessary to generate a n i @ + j ) for all values o f j between 0 and I; many of these terms can be discarded by considering two easily computed quantities.

Dejinition I : The minrank of a logic gate output is one plus the minimum of the minranks of the logic gate’s inputs. Primary inputs have a minrank of 0.

~

378

Definition 2: The maxrank of a logic gate output is one plus the maximum of the maxranks of the logic gate’s inputs. Primary inputs have a maxrank of 0.

The minrank and maxrank quantities are useful because of the following lemma.

Lemma 2: The Boolean equations for all possible transitions of the ith logic gate can be determined from xoR’ing neighbors in the ordered set of Boolean functions

{ J ( t ) , J ( t + minrank;),J(t + minrank; + l) , ,

J ( t + maxrank; - l), J ( t + maxrank;)}. (8) The above lemma follows directly from the unit delay

model. A gate output cannot change until the change in the nearest input propagates through, and will stop changing when the input furthest away finally arrives.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. I I . NO. 3. MARCH 1992

V. SOLVING MAX-SATISFIABILITY A. Introduction

The classifical definition of max-satisfiability involves finding a satisfying assignment for clauses in a Boolean function, where the clauses represent disjunctions of lit- erals [5]. Algorithms have been proposed (e.g., [SI) for exact and approximate solutions to this problem. Here, we are concerned with finding satisfying assignments for arbitrary Boolean functions that can themselves involve conjunctive and disjunctive terms. In the sequel, we present two strategies we have developed for solving general max-satisfiability .

B. Max-Satisfiability via Disjoint Cover Enumeration In this subsection, we will show how a disjoint cover

(or truth table) for a multiple-output Boolean function can be searched to find a maximum satisfying assignment over a set of weighted outputs. This is a useful observation since efficient disjoint cover enumeration algorithms (e.g., [9]) have been developed in the past.

Two cubes are said to be disjoint if they have no input combinations in common, i.e., do not intersect. The cubes i, - i2, and < are disjoint. However, the cubes i , * i2 and i2 i3. A disjoint cover for a multiple-output function is simply a truth table for the function, where each cube in the cover is disjoint from every other cube. An example of a disjoint cover for a two-output function is shown in Fig. 4. For instance, the first cube, loo-, corresponds to i l . i2 a i3 .

Theorem I : Given a multiple-output Boolean function with associated positive weights for individual outputs, a maximum satisfying assignment can be found by inspecting all the cubes in a disjoint cover of the multiple-output function.

Proof: A disjoint cover has the property that finding the outputs asserted by any cube c in the cover is merely a process of inspecting c’s output part. No other cubes need be considered, in contrast to a sum-of-products

- -

- - i3 share the cube il . iz

- -

il i 2 B i 4

\\ // 100- 11

00-1 01

-111 10

11G 10

MM 01

11 12

(b)

Fig. 4. Disjoint cover for a multiple-output network

cover, where the outputs of all cubes that contain c must be found. A disjoint cover thus explicitly identifies all the maximal output combinations that the multiple-output function can assert. Since the weights are all positive, walking through the maximal output combinations suf- fices to find the weighted max-satisfying assignment.

To find a maximum satisfying assignment under arbitrary weights for outputs, we simply walk down the list of cubes, computing the value of the weight function for each cube, and pick the cube with the highest value.

C. A Branch-and-Bound Algorithm One approach to solving max-satisfiability is to use a

branch-and-bound strategy. To be efficient, a branch-and- bound algorithm requires an easily computed but reasonably accurate bound for pruning the search, and we use a method based on computing disjoint sets. That is, given a partitioning of a set of outputs into a collection of M disjoint groups,2 M is clearly a bound on the maximum number of outputs that can be satisfied. Partitioning a set of outputs into the minimum number of disjoint groups would solve the max-satisfiability problem and is itself NP-complete. We compute a reasonably tight bound by partitioning the outputs into a few large disjoint groups via a fast greedy algorithm.

Implicitly assumed in our branch-and-bound algorithm below is that ON sets of the Boolean functions are represented either in sum-of-products form or as reduced, ordered binary decision diagrams [2]. It should be noted that the conversion of an arbitrary, multilevel network into either of these forms is an NP-complete operation. How- ever, it is possible to convert many real circuits to these forms. In order to compactly describe the branch-and- bound algorithms, we define U as the set of all N outputs, L as a current set of selected outputs, and R = U - L as the set of currently remaining outputs. Initially, R = U and L = 4.

’A disjoint group of Boolean function outputs satisfies the property that any pairwise intersection of output ON sets is null, where an ON set is the subset of all possible input vectors to the logic network that produces a logic 1 at the output of the Boolean function of interest.

DEVADAS er al. : ESTIMATION OF POWER DISSIPATION 319

1) Remove all outputs from R whose ON sets have a null intersection with the intersected ON sets of the current select set L.

2) Find a partitioning of R into a small number, de- noted M , of disjoint groups of outputs. If 11 L 11 + M is less than or equal to the best solution found thus far, return from this level of recursion. Otherwise, if R = 4, declare this solution as the best recorded thus far.

3) Heuristically select an output f from R. (Its ON set will have a nonnull intersection with the intersected ON sets of the selected output set, L). Recur with L = L U f and R = R - f. Recur also for L un- changed, R = R - f.

D. An Illustrative Example In this subsection, we give an example that illustrates

the various steps in determining the input vector that causes maximum activity in a CMOS circuit. We will consider a function of the form

(XI + x2 + Y l ) . (F + x2 + Y l ) . (x3 + x4 + Y2)

* (G + x4 + Y2) * * ( X Z K - I + XZK + Y K )

* ( X G Y + X 2 K + Y d . (9) An example function with K = 3 is shown in Fig. 5. We will consider a dynamic CMOS implementation of the circuit and assume that the inputs are available in true and complemented form. The circuit is also assumed to be im- plemented with N-type gates.

As described in Section 111, the construction of the multiple-output Boolean function representing gate activity simply corresponds to complementing each of the logic functions corresponding to the gate outputs. Since we have seven gates in the circuit of Fig. 5 , we have a seven-output Boolean function, as shown in Fig. 6.

We now need to find a max-satisfying assignment for the function of Fig. 6. We have two alternatives: we can use the disjoint cover enumeration approach or the more sophisticated branch-and-bound algorithm. Using the latter we find that the input vector xI = 0, x2 = 1, yl = 0, x3 = 0, x4 = 1, y 2 = 0, x5 = 0, x6 = 1, and y 3 = 0 causes four gates to make 1 + 0 transitions (the gates are all assumed to be precharged high), as shown in Fig. 7. This vector is found quite simply by intersecting the covers corresponding to outputs fl, f3, f5, and f7 of Fig. 6. (The cover corresponding to output fl is simply the cube corresponding to * x2 * E). The reader can verify that no vector can cause more than four gates to make transitions in the network of Fig. 5 .

In Fig. 8(a), a function belonging to the same family as the function of Fig. 5 , but with K = 2, is shown. The multiple-output Boolean function corresponding to the given circuit is shown in Fig. 8(b). A disjoint cover for the Boolean function is shown in Fig. 8(c). It can be seen that vectors 010010, 100100,010100, and 100010 are the only vectors that produce maximum activity corresponding to three gates switching in the circuit of Fig. 8(a).

r5' 23 r6 Y3

Fig. 5 . An example circuit for power estimation.

rl A?' Y l '

r3t x4

Y2'

r3 r 4 I

Y2'

f2

f3

n

f4

x5' - x6 - f5

Y3' -

y3' U Fig. 6. Boolean function for which max-satisfying assignment is to be

found.

f l 010------

The above examples illustrate several points. In many common networks, maximum activity may be caused only by a small number of vectors. For the circuit family of (9), for any K , maximum activity will be caused by exactly 2K vectors, which results in the switching of the K AND gates and the single OR gate. Given that the input

380 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. I I . NO. 3. MARCH 1992

rl f x2 Yl

r3 r4' Y2

r3' r4 Y2

X Z - 11

f2

15

x4 - f3

A?' -

:: =$-) f4 Y2'

0 1 0 0 1 0 1 0 1 0 1 1 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 - 1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 - 1 0 0 0 1 1 0 0 1 1 - 0 1 0 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 - 0 1 0 0 1 1 1 - 0 1 0 0 0 1 0 1 i o i o i o 0 1 1 0 1 0 0 0 - 0 1 0 1 1 - 1 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 0 - 1 0 0

o o i o i 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1

f1 f2 f3 f4 f5 (C)

Fig. 8 . Using disjoint cover enumeration for max-satisfiability.

space is 23K, the likelihood of random simulation of input vectors finding the worst-case activity becomes smaller and smaller as K increases.

As can be seen in Fig. 8(c), we have a large number of cubes in a disjoint cover of the function, even for K = 2. As K increases, the number of cubes increases exponentially. On the other hand, the branch-and-bound algorithm above can find the worst-case vector in a number of intersections linear in K because the disjoint group bound terminates the search on the first step. This implies that for the circuit family of (9), the branch-and-bound strategy will become significantly more efficient than the disjoint cover enumeration as K increases. Note, however, the circuit family of (9) represents a bad case for the disjoint cover method. Frequently, disjoint covers of multiple-output Boolean functions are of reasonable size, even for large functions.

E. Approximation Strategies Needless to say, finding a disjoint cover for an arbi-

trary, multiple-output function is NP-complete. For very large circuits, the algorithms described in the two previous subsections may require too much memory or CPU time to complete. In this case, it is of interest to find a tight upper bound on the maximum power dissipation. Since we are dealing with worst-case power, the approx-

imation or bounding algorithm should not underestimate the power dissipation.

The number of cubes in a disjoint cover of a multiple- output Boolean function strongly depends on the number of outputs in the function. One way of reducing the number of cubes, thereby easing the problem of max-satisfiability, is to merge two (or more) outputs into a single output that is the Boolean OR of the two outputs, with an associated weight that is the sum of the individual weights. This implies that we have expanded the ON sets of the two outputs. Expansion of the ON sets provides a correct bound, since the weight function corresponding to any assignment only increases (or stays constant). Arbitrarily large expansions will result in undesirably loose bounds; one wishes, therefore, to select outputs with large owset intersections and merge these outputs together, rather than outputs with small or null owset intersections.

We can compute the worst-case error arising from an expansion of outputs, assuming constant weights for sim- plicity, as follows: Let the original ON sets be ON, , 1 5 i 5 N . Let the expanded ON sets be ON,'. Compute E, = ON,' - ON,. Find a set of maximal disjoint groups of outputs using E,; namely, find D l , D2 , * - - , Dw such that D1 U D , U * * * D ,+, - - Q, where Q is the set of outputs, such that E, # 4 and no output belongs to more than one Dk. In the worst case we may erroneously assume satis-

DEVADAS er al . : ESTIMATION OF POWER DISSIPATION 38 I

fiability for an output from each of the D,. Thus, the worst- case error is given by M .

Merging outputs can also reduce the complexity of the branch-and-bound algorithm. In the worst case, one searches for all possible 2* combinations of intersecting outputs, given N outputs. Reducing N can speed up the algorithm considerably. This approximation allows power estimation for arbitrarily large circuits, but an intelligent merging of outputs has to be performed in order not to significantly overestimate worst-case power dissipation.

Another means of approximation is simply to assume that certain gates make the maximum number of possible transitions (bounded by the number of levels of logic that precede the gate), and not generate the transition equations for these gates. This is especially useful when a large number of gates occur in the first two or three levels of the logic circuit and many fewer gates occur in later levels. We ignore the transition equations corresponding to the “deeper” gates and add their maximum number of possible transitions to the worst-case estimate obtained on the circuit after deleting these gates.

VI. EXPERIMENTAL RESULTS In this section, we present preliminary experimental re-

sults using the power estimation techniques described earlier. We chose several PLA and multilevel circuit examples, whose statistics are summarized in Table I. The PLA examples are tcktl through tckt4 and the multilevel circuits are mcktl through mckt6. The numbers of inputs, outputs, gates, and logic levels in each of the circuits are given in the table. For these preliminary experiments, we assumed that all the gates had the same capacitive load and that the weighting was therefore a constant.

We used a general delay model for the two-level circuits, since the difference in the sizes of the gates was large. A unit-delay model was used for the multilevel circuits.

In Table 11, we present the results obtained using the transformation and max-satisfiability algorithms. The numbers of inputs and outputs in the transformed logic function are indicated. The total CPU time required on a pVAX-I11 for transformation and for max-satisfiability checking by disjoint cover enumeration using the program PLOVER [9] and the branch-and-bound algorithm are given in this table. The maximum number of outputs that could be simultaneously satisfied is also indicated (#transitions). The examples tckt* are benchmark PLA’s. Ex- amples mcktl and mckt2 are multilevel circuits obtained by factoring PLA’s using the program MIS [ 11. Example mckt3 is an 8 X 8 multiplier. Examples mckt4, mckt5, and mckt6 are 4-bit, 8-bit, and 32-bit adders. The adders all used a 4-bit carry lookahead scheme. While for these examples the branch-and-bound strategy was not significantly faster than disjoint cover enumeration, as shown in subsection V-D there do exist functions for which the more sophisticated branch-and-bound strategy works much better. For the largest example, mckt6, exact max-

TABLE I STATISTICS OF EXAMPLES

Example No. Inputs No. Outputs No. Gates No. Levels

tckt 1 tckt2 tckt3 tckt4 mcktl mckt2 mckt3 mckt4 mckt5 mckt6

5 7 9

16 5 7

16 8

32 64

3 3 1

65 3 6

16 5

17 33

31 147 140 23 1 45 53

504 89

369 133

2 2 2 2 3 4

32 6

24 48

satisfiability checking via either disjoint cover enumeration or branch-and-bound was not possible.

For all the examples (even those for which exact max- satisfiability was possible), we used the approximation strategy (which reduces the complexity of the satisfiability check) described in Section V. The difference between approximation and exact strategies is small; however, the approximation algorithm takes significantly less CPU time and is viable for larger functions. The worst-case error (overestimation error) using the approximation strategy is also given for each of the examples-for examples tcktl through tckt4 and examples mcktl through mckt5, the error in approximation is significantly smaller than the worst-case error.

Worst-case power in examples such as mckt6 can also be approximated (other than using the techniques of Sec- tion V) simply by breaking up the circuit into smaller fragments and estimating worst-case power for each of the fragments. This technique would be especially useful in the case of bit-sliced circuits, for example, an ALU. As can be seen, the number of transitions in mckt5, an 8-bit adder, is very close to twice the number of transitions in mckt4, a 4-bit adder.

In Table 111, we compare our estimation technique with exhaustive simulation using SPICE, a trivial prediction method, and random simulation of vector pairs using SPICE. Exhaustive simulation using SPICE is assumed to produce the most accurate result. For static CMOS circuits, this will entail 2*” SPICE runs, which is infeasible for all but the smallest circuits. However, this gives us an idea as to the accuracy of our estimation method. Note that the power calculations were performed assuming a 25 MHz clock.

The various methods in Table I11 are detailed below. The individual transistors in examples tckt* and mckt* were modeled with the LEVEL 2 MOSFET model [7] in SPICE. For the SPICE simulation, we have the following:

The column MAXSAT corresponds to using the input vector pair obtained by exact (when feasible) or approximate max-satisfiability checking, assuming a single load capacitance for each gate, and then simulating the given network with this input vector under the LEVEL 2 model in SPICE. The time taken

382 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. I I . NO. 3, MARCH 1992

TABLE I1 WORST-CASE POWER DISSIPATION RESULTS

Example

Exact

CPU Time No.

B&B Cover Transitions

Approximate ~

CPU Time

B&B Cover No.

Transitions Worst Case

Error

tckt 1 tckt2 tckt3 tckt4 mckt 1 mckt2 mckt3 mckt4 mckt5 mckt6

6 min 36 min 45 min

1.3 h 21 min 30 min

2.6 h 14 min 2.1 h

> 10 h

5 min 0.9 h 1.1 h 2 h

23 min 37min

3.5 h 11 min 2.7 h

> 10 h

1 1 53 59 69 22 22

389 106 44 1

4 min 2 min 19min 21 min 20 min 19 min 30 min 32 min

6 min 5 min 16min 16min

1.3 h 1.6 h 7 min 6 min 0.9 h 1.1 h 1.7 h 2.1 h

11 57 64 72 23 24

406 117 479 89 1

5 15 12 15 5

10 41 18 56

102

TABLE 111 COMPARING MAX-SATISFIABILITY TO OTHER APPROACHES

Exhaustive Trivial Random Maxsat

Power CPU Power Power CPU Power CPU Example (mW) Time (mW) (mW) Time (mW) Time

tcktl 1.1 tckt2 tckt3 - tckt4 - mcktl 2.2 mckt2 - mckt3 - mckt4 - mckt5 - mckt6 -

- 8.5 h -

2.9 1.1 1.3 h 1.1 13.2 4.4 7.1 h 5.7 12.6 4.9 7.1 h 6.2 20.8 5.1 10 h 7 .0 4.0 1.9 2.6 h 2.2 4.7 2.2 2 .5 h 2.3

45.3 22.4 31 h 42.5 8.0 8.5 5.1 h 10.9

33.2 22.6 11 h 45.2 66.0 44.3 42 h 91.1

8 min 42 min 43 min

1.0 h 16 min 15 min 3.1 h

31 min 1.1 h 4.2 h

for max-satisfiability checking was given in Table 11, and the CPU time taken for a single SPICE simulation is given in Table 111. The column TRIVIAL corresponds to a trivial prediction method, where each gate is assumed to switch once and the sum total of the load capacitances was used to estimate energy dissipation. The CPU time required by this method is not given, since it is neg- ligible. The column RANDOM corresponds to simulating ten randomly generated input-vector pairs on the network using SPICE under the LEVEL 2 model, and picking the vector pair that causes worst-case dissipation. The CPU time required in this case is given, corresponding to ten runs of SPICE. Finally, the column EXHAUSTIVE corresponds to an exhaustive SPICE simulation over all possible input vector pairs. This is possibly only for the two smallest circuits.

As can be seen, for the two circuits that can be exhaus- tively simulated, the numbers obtained under the MAXSAT column are very close to the numbers under the EXHAUSTIVE column. However, using max-satis-

fiability reduces the CPU time by approximately a factor of 50. Random simulation (RANDOM) underestimates the worst-case dissipation, sometimes significantly, and is ten times slower than using max-satisfiability (MAXSAT). The trivial prediction method (TRIVIAL) grossly overestimates the power dissipation for the two-level circuits, where only a subset of the gates switch, and underestimates the power dissipation for the multilevel circuits, where glitching becomes significant.

It is not necessary to simulate the network under the worst-case vector pair using SPICE. Faster simulators that use simplified switch-level models of transistors, which correlate well in terms of power dissipation with SPICE models, can be used instead. SPICE represents one extreme, and estimating energy using a simple ;CV*NG for- mula (where N G is found using max-satisfiability) the other extreme. The value in using the techniques presented in this paper is that the search for the worst-case input vector pair(s) is done efficiently under a simplified model of power dissipation, and then a small set of vectors can be accurately simulated.

Our method deals with logic gates rather than individual transistors, and we have successfully applied max- satisfiability checking to static CMOS networks with over 500 gates and 2000 transistors (e.g. mckt3).

VII. CONCLUSIONS, LIMITATIONS, AND FUTURE WORK The difficulty in estimating worst-case power dissipa-

tion in combinational circuits stems mainly from the fact that power dissipation is input pattern dependent and the number of possible input patterns grows exponentially with the number of circuit inputs. By focusing on a simplified model of power dissipation, namely weighted gate output activity, we were able to transform the problem to one of weighted max-satisfiability . We developed exact and approximate algorithms to solve weighted max-satisfiability. The approximate algorithms are robust in the same sense that they do not underestimate the worst-case power dissipation in the circuit.

DEVADAS er al. : ESTIMATION OF POWER DISSIPATION 383

Future work will address relaxing the assumptions made in developing a simplified model of power dissipation. In particular, we believe that techniques used in switch-level simulation can be used to derive a delay model that can more accurately predict the occurrence and magnitude of glitches. The techniques described here can also be ex- tended to finding a three-vector repeutuble sequence that causes maximum power dissipation in static CMOS circuits. We are exploring similar techniques to finding the peak supply current densities. This is also an important problem because peak current densities are linked to signal integrity.

ACKNOWLEDGMENT The authors would like to thank Dr. Y. T. Yen of the

Digital Equipment Corporation for bringing this problem to our attention and M. Coiley for many valuable discus- sions on the subject.

REFERENCES

[ l ] R. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, “MIS: A multiple-level logic optimization system,’’ IEEE Trans. Computer-Aided Design, vol. CAD-6, pp. 1062-1081, Nov. 1987.

[2] R. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE Trans. Comput., vol. C-35, pp. 677-691, Aug. 1986.

[3] R. Burch, F. Najm, P. Yang, and D. Hocevar, “Pattern-independent current estimation for reliability analysis of CMOS circuits,” in Proc. 25th Design Automat. Conf., June 1988, pp. 294-299.

[4] S. Chowdhury and J. S . Barkatullah, “Estimation of maximum cur-

rents in MOS IC logic circuits,” IEEE Trans. Computer-Aided De- sign, vol. 9 , pp. 642-654, June 1990.

[5] M. R. Garey and D. S . Johnson, Computers and Intractabilify: A Guide to the TheoryofNP-Completeness. New York: W. H. Freeman, 1979.

[6] L. Glasser and D. Dobberpuhl, The Design and Analysis of VLSI Cir- cuirs. Reading, MA: Addison-Wesley, 1985.

[7] D. A. Hodges and H. G. Jackson, Analysis and Design of Digital In- tegrated Circuits, 2nd ed.

[8] D. S . Johnson, “The NP-completeness column: An ongoing guide,” J. Algorithms, vol. 6 , pp. 291-305, 1985.

[9] H-K. T . Ma, S. Devadas, R-S. Wei, and A. Sangiovanni-Vincentelli, “Logic verification algorithms and their parallel implementation,” IEEE Trans. Computer-AidedDesign, vol. 8 , pp. 181-189, Feb. 1989.

New York: McGraw-Hill, 1988.

Srinivas Devadas (S’87-M’88), for a photograph and a biography, please see page 300 of this issue.

Kurt Keutzer (S’83-M’84), for a photograph and a biography, please see page 300 of this issue.

Jacob White (A’88) received the B.S. degree in electrical engineering and computer science from the Massachusetts Institute of Technology, Cam- bridge, and the S.M. and Ph.D. degrees in electrical engineering and computer science from the University of California, Berkeley.

He was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, from 1985 to 1987 and was the Analog Devices Career Devel- opment Assistant Professor at the Massachusetts Institute of Technoloev from 1987 to 1989. He --

was a Presidential Young Investigator for the year 1988. His current inter- ests are in theoretical and practical aspects of serial and parallel numerical methods for problems in circuit design and fabrication.

estimation of power dissipation in cmos combinational circuits using boolean function manipulation

Documents