
902 IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 12, NO. 6, NOVEMBER 2013

Toward Nanoprocessor Thermodynamics

Neal G. Anderson, Senior Member, IEEE, İlke Ercan, Student Member, IEEE, and Natesh Ganesh

Abstract—A hierarchical methodology for the determination of fundamental lower bounds on energy dissipation in nanoprocessors is described. The methodology aims to bridge computational description of nanoprocessors at the instruction-set-architecture level to their physical description at the level of dynamical laws and entropic inequalities. The ultimate objective is hierarchical sets of energy dissipation bounds for nanoprocessors that have the character and predictive force of thermodynamic laws and can be used to understand and evaluate the ultimate performance limits and resource requirements of future nanocomputing systems. The methodology is applied to a simple processor to demonstrate instruction- and architecture-level dissipation analyses.

Index Terms—Information entropy, microprocessors, nanoelectronics, power dissipation.

I. INTRODUCTION

HEAT dissipation threatens to limit performance gains achievable from post-CMOS nanocomputing technologies, regardless of future success in nanofabrication. Simple analyses suggest that the component of dissipation resulting solely from logical irreversibility, inherent in most computing paradigms, may be sufficient to challenge heat removal capabilities at the circuit densities and computational throughputs that will be required to supersede ultimate CMOS.¹ Comprehensive lower bounds on this fundamental component of heat dissipation, if obtainable for specified nanocomputer implementations in concrete nanocomputing paradigms, will thus be useful for determination of the ultimate performance capabilities of nanocomputing systems under various assumptions regarding circuit density and heat removal capabilities.

Most studies of the physical cost of logical irreversibility, including Landauer’s inaugural work [1], have focused on determination of dissipation bounds for idealized 1-bit memories and/or other simple physical systems implementing elementary logical operations. We recently showed how fundamental dissipation bounds can be obtained for more complex logic circuits, designed within concrete nanocomputing paradigms,

Manuscript received December 17, 2012; accepted April 17, 2013. Date of publication April 26, 2013; date of current version November 6, 2013. This work was supported by the National Science Foundation under Grant CCF-0916156. A preliminary version of this paper was presented in the Proceedings of the 12th IEEE Conference on Nanotechnology, Birmingham, U.K., August 2012. The review of this paper was arranged by Associate Editor P. Famouri.

The authors are with the Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA 01003-9292 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNANO.2013.2260352

¹For 10¹⁰ devices/cm², each switching at 10¹³ s⁻¹ and dissipating at the Landauer limit Emin ≈ kB T, we have Pdiss = 414 W/cm² at T = 300 K.
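The power-density figure in footnote 1 follows from simple arithmetic. As an illustrative check (ours, not part of the paper), the values below are taken directly from the footnote:

```python
# Back-of-envelope check of footnote 1: dissipation at the Landauer
# limit E_min ≈ k_B*T for 10^10 devices/cm^2 switching at 10^13 s^-1.
k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # bath temperature, K
density = 1e10       # devices per cm^2
rate = 1e13          # switching events per device per second

E_min = k_B * T                   # ~4.14e-21 J per switching event
P_diss = density * rate * E_min   # W/cm^2

print(round(P_diss))  # → 414
```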

taking full account of circuit structure and clocking. We specifically obtained bounds for dynamic logic circuits in two emerging nanocomputing paradigms [2]–[4], using minimal device abstractions that capture computationally essential features of device operation but are free of material and device parameter estimates. The use of such abstractions ensures that the resulting bounds reflect only the essential physical costs required for a circuit to achieve its desired computational ends through its paradigm-specific strategic means in a manner consistent with physical law.

In this study, we describe extension of this approach to treatment of full processors executing sequences of operations. The objective is a methodology for obtaining a hierarchy of dissipation bounds associated with multiple levels of granularity in computational description of a given processor—from instruction-driven transformations of the processor state to the associated bit transitions in individual devices—revealing contributions to dissipation from each level. This methodology, its underlying theoretical foundation, and interpretation of the resulting bounds are sketched in Section II. Illustrative applications, involving a simple 4-bit demonstration processor programmed to operate on random streams of input data, are presented in Section III. Bounds on energy dissipation per program loop associated with the higher levels of the analysis hierarchy are obtained for these example applications, and their dependence on input statistics considered. Scalability of our methodology to more complex processors and programs is then discussed in Section IV, as is full integration of bounds obtained from all levels of the hierarchy. Section V concludes this paper with a brief discussion of insights that could result from full development and application of the proposed methodology.

II. METHODOLOGY

This study, like our previous studies of dissipation in nanocomputing circuits, is based on a “referential” approach to the fundamental physical description of classical information processing [5]–[7]. This approach is rooted in an overtly relational conception of information, and it provides a natural framework for isolating and quantifying irreversible information loss in computational settings and the resulting dissipative costs. Specifically, an “information processing artifact” A (here a relevant part of a nanoprocessor) is situated in a globally closed universe decomposed into relevant domains and subsystems, including a heat bath B in direct contact with A (see Fig. 1). The information that A holds about the state of some external referent system R, such as an external input file, is associated with pairwise correlations between the statistical physical states of A and R and quantified by a mutual information measure. State transformations of A that irreversibly reduce these correlations irreversibly reduce the amount of information about

1536-125X © 2013 IEEE



Fig. 1. Physical decomposition of an information-processing artifact A (e.g., a nanoprocessor) and its surroundings into relevant subsystems that comprise a globally closed composite.

Fig. 2. Granularity hierarchy for nanoprocessor dissipation analysis.

R in A and are necessarily dissipative processes involving B. Fundamental lower bounds on the resulting entropic and energetic costs can be obtained from quantum dynamics and entropic inequalities alone. Proper description of informationally lossy computational operations within the referential approach thus enables determination of fundamental lower bounds on the dissipative costs of these operations in very general contexts, with clear isolation of irreversible information loss enabled by the referent.

In a hierarchical application of the referential approach to nanoprocessors, a nanoprocessor executing a sequence of instructions would be described and analyzed at three levels of granularity—instruction level (coarse grained), architecture level (medium grained), and circuit level (fine grained)—with dissipation bounds obtained from the referential approach at each level. The three levels of this hierarchy are depicted in Fig. 2. We focus primarily on the highest two levels of the hierarchy in this study, as the lowest (circuit) level is most closely related to our previous work (see Section IV), and we assume conventional processor architectures for the sake of concreteness.²

In an instruction-level analysis (ILA), the artifact A consists of only those registers and other internal state elements (e.g., internal data memory) that are required by a processor’s instruction set architecture. Dissipation bounds obtained at this level reveal costs associated with irreversible information loss inherent in sequences of processor state transitions driven by instruction sequences and (random) input data. Bounds obtained from an ILA thus reflect the minimum costs of implementing the required register state transitions by any means consistent with physical law.

²While new and unconventional processor architectures will certainly emerge as nanocomputing research evolves, the first full post-CMOS nanoprocessor designs have utilized somewhat conventional architectures. See, e.g., [8] and [9].

In an architecture-level analysis (ALA), A is refined, so state transitions reflect the specific structure of the nanoprocessor datapath configuration realized during execution of each instruction.³ Dissipation bounds obtained at the ALA level thus capture physical costs of implementing the required processor state transitions in the specific manner dictated by the datapath configuration, including particulars of how register state transitions are conditioned upon the contents of other state elements. This leaves only the fine-grained description of computational processes within the functional blocks to be “filled in” (as in [2]–[4]) and integrated at the circuit-level tier for completion of the hierarchy.

Lower bounds on the energy dissipated to the processor’s environment B in a given step can be determined from appropriate specialization of a very general dissipation bound, obtained within the referential approach, applicable to conditional physical state transformations of A that are “selected” by the state of a component R1 of R (cf., Fig. 1). Specifically, if 1) transformation of the state of A is conditioned on the state π_i^R1 of R1 (where π_i^R1 belongs to a mutually orthogonal set {π_i^R1}), 2) R1 is in state π_i^R1 with probability (frequency) p_i^(1) and is preserved during state transformation of A, and 3) B is initially in a thermal state characterized by temperature T, then the expected energy dissipated to B in the state transformation of A is lower bounded as

Δ〈EB〉 ≥ kB T ln(2) ∑_i p_i^(1) [S(ρ_i^A) − S(ρ_i^A′)]. (1)

Here, S(ρ_i^A) and S(ρ_i^A′) are the self-entropies of the statistical artifact states before and after the state transition, respectively, and kB is Boltzmann’s constant. Proof of this bound, given in the Appendix, is based on unitary (Schrödinger) evolution of RAB, entropic identities and inequalities, and a key inequality from quantum thermodynamics. Treatment of sequences of such processes further assumes that each of the conditional state transformations described previously is followed by unconditional restoration of B to a thermal state through interaction with the surrounding environment.⁴

A few notes are in order about the inequality (1): First, this bound generally applies to cases where there are preexisting correlations between the states of R1 and A. (Hence the averaging over states of R1 of the initial entropy of A.) Second, the change in the artifact entropy can be expressed in terms of loss of information about R from A and other changes in the entropy of A that are unrelated to information loss, although we will work exclusively with the simple form (1) here. Finally, for the sake of clarity, we will take all processor state elements representing specified binary strings to be in pure, distinguishable physical states, implying that all entropy changes can be taken to result from information loss and allowing all physical entropies of A to be expressed as classical Shannon entropies [10].

³Costs of irreversible information loss from additional storage elements (e.g., pipeline registers) that are utilized in the processor, but not explicitly required by the instruction set, would also be accounted for in an ALA analysis.

⁴For further discussion of such rethermalization processes, see [7].

Fig. 3. Instruction set for the demonstration processor of this work.

Fig. 4. Datapath organization for the 4-bit single-cycle demonstration processor used in this study. Not shown are the program counter, instruction memory, and control logic.

III. ILLUSTRATIVE EXAMPLES

We now illustrate instruction- and architecture-level processor dissipation analyses by applying our methodology to a simple 4-bit single-cycle demonstration processor programmed to operate on an input string. We first describe the instruction set architecture and datapath for the demonstration processor (see Section III-A). We then obtain, evaluate, and discuss ILA and ALA dissipation bounds for this processor running simple programs that implement two functions: a transition detector for binary input sequences (see Section III-B) and a running-sum adder (see Section III-C).

A. Demonstration Processor

The instruction set and datapath for the demonstration processor are depicted in Figs. 3 and 4, respectively. The processor utilizes two general purpose registers A and B, a four-function⁵ arithmetic logic unit (ALU), a two-word internal “scratchpad” memory M = M0M1 that also functions as an output buffer, four multiplexors for signal routing, and input and output buffers. Not shown are a 16-word instruction memory, a 4-bit program counter (PC)—with a hardware provision for halting the counter if and when the instruction at address PC(1111) is reached—and control logic that decodes each instruction and generates all control signals required to appropriately configure the datapath, perform register operations, adjust the program counter on jump instructions, and enable memory access and I/O. Single-pass or infinitely looping programs of up to 16 instructions in length can be accommodated. Despite the simplicity of this demonstration processor, it provides a concrete example of a programmable, general purpose processor appropriate for the illustrative studies of this work.

⁵The four supported functions are arithmetic add, bitwise add (i.e., bitwise XOR), pass A, and pass B.

Fig. 5. State diagram for a transition detector FSA (top) and program for implementing this FSA on the demonstration processor of this work (bottom). The least and most significant bits in the state encoding correspond to the “current input” and “transition flag” referenced in the code, respectively.

B. Illustrative Example: Binary Sequence Transition Detector

We now consider implementation of a simple transition detection FSA, which detects bit transitions in a binary sequence, on the demonstration processor. The state diagram for the transition detector FSA is shown in Fig. 5 (top). On each step, the FSA reads a new input bit, compares this “current input” bit to the previous input bit, and outputs a transient “1” if the two bits differ. Note that the states have been encoded so the least significant state bit is the “current input” bit and the most significant state bit is a “transition flag” that is set when the current input represents a transition. The transition flag is the output of the FSA.

The transition detection FSA can be implemented on the demonstration processor via infinite looping of the six-instruction sequence shown in Fig. 5 (bottom), with values of the 4-bit input IN restricted to 0000 and 0001 (representing FSA input values 0 and 1, respectively). On the nth loop, the processor loads the nth input xn into B (in b) and generates the transition flag fn = xn ⊕ xn−1 by bitwise addition (XOR) of xn with the previous input xn−1 already residing in register A (add 1), writing fn to M0 (swa 0), and routing fn to the output OUT (out 0). The current input xn is then saved to A (cpy 1) for use as the previous input for the (n + 1)th cycle, which is initiated in the final instruction of the sequence (jmp 0). The register and memory contents for this sequence of instructions are summarized for the nth loop in Fig. 6.
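The six-instruction loop just described can be sketched at the register level. This dict-based processor model is our own illustration of the dataflow in the text, not an artifact of the paper:

```python
# Hypothetical register-level simulation of one loop of the transition
# detector program (in b, add 1, swa 0, out 0, cpy 1, jmp 0); the
# dict-based processor model is ours, with inputs restricted to 0 and 1.
def run_loop(state, x_n):
    """Advance the processor model by one six-instruction loop on input x_n."""
    state['B'] = x_n              # in b : load current input into B
    state['A'] ^= state['B']      # add 1: A <- A XOR B = f_n (CNOT, B is control)
    state['M0'] = state['A']      # swa 0: write transition flag to M0
    out = state['M0']             # out 0: route flag to OUT
    state['A'] = state['B']       # cpy 1: save current input as "previous"
    return out                    # jmp 0: begin next loop

state = {'A': 0, 'B': 0, 'M0': 0}
flags = [run_loop(state, x) for x in [0, 1, 1, 0, 0, 1]]
print(flags)  # → [0, 1, 0, 1, 0, 1]
```

The flag fires exactly where the input sequence changes value, matching the FSA of Fig. 5.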

We now obtain lower bounds on the energy dissipation per loop for the demonstration processor implementing the sequence detection FSA on a Bernoulli input sequence with characteristic probability p (where p = Prob{xn = 1} ∀n). These bounds are obtained exclusively from irreversible loss of input information from registers A and B and internal memory location M0, since M1 is unused, the sequence of program counter states is deterministic and logically reversible, and the mapping from instructions to control signals is logically reversible.⁶ The contents of A, B, and M0 are all constrained to values 0000 and 0001 by the inputs and the program.

Fig. 6. Register and memory contents for the nth loop of the programmed demonstration processor implementing the transition detection FSA.

Fig. 7. Energy dissipation bounds obtained from instruction- and architecture-level analyses (ILA and ALA) for the demonstration processor programmed to detect transitions in a Bernoulli sequence.

Instruction-Level Analysis: In the ILA, the processor state elements comprising A are regarded as a single system A = ABM0 and the referent R is taken to hold a physical instantiation of the input string with xn held in R1. In the nth loop, A gains information about R1 (i.e., xn) during execution of the in b instruction, but without any loss of information about R from A. (xn−1 remains in A even though it is overwritten by xn in B.) In fact, the only loss of input information in the entire program loop occurs during execution of the swa instruction, where

H2(p) = −p log2 p − (1 − p) log2(1 − p) (2)

bits of information about the input string (reflected in fn−1) are unconditionally erased from A. The resulting erasure cost, plotted as a function of the characteristic probability p in Fig. 7 (ILA), is simply

Δ〈EB〉 ≥ kB T ln(2) H2(p). (3)

⁶The latter two simplifications would not be possible for programs involving immediate and conditional branch instructions and processors that support such instructions, as discussed in Section IV.

Interestingly, the ILA dissipation bound for the programmed processor is identical to the dissipation bound we obtain for the transition detector FSA from an extension of our referential approach to FSAs [11]. This bound reflects only the essential physical cost of irreversible information loss inherent in the transition detection task.
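The binary entropy (2) and the ILA bound (3) are straightforward to evaluate numerically. The sketch below is ours, with room-temperature values chosen for illustration:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # bath temperature, K (our illustrative choice)

def H2(p):
    """Binary entropy function of (2), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ila_bound(p):
    """ILA bound (3): k_B*T*ln(2)*H2(p) joules dissipated per program loop."""
    return k_B * T * math.log(2) * H2(p)

# The bound peaks at p = 0.5, where one full bit is erased per loop,
# and vanishes for deterministic inputs (p = 0 or p = 1):
assert H2(0.5) == 1.0
assert ila_bound(1.0) == 0.0
```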

Architecture-Level Analysis: In the ALA, the state elements A, B, and M0 are considered pairwise, since no more than one of these elements changes state upon execution of any instruction and no such state change is conditioned on more than one other processor state element or external referent. The registers and/or memory elements assigned the roles of referent R1 and artifact A in the dissipation analysis of a given program step thus depend on the instruction being executed. These assignments and the associated conditioning of state transitions are determined by configuration of the datapath on each step, placing realistic, architecture-dependent restrictions on the conditioning of processor state transitions in evaluation of ALA dissipation bounds. It is in this sense that bounds obtained from an ALA analysis explicitly capture irreversibility incurred in state transitions as they are implemented by the processor.

Step-by-step ALA analysis of the programmed demonstration processor proceeds as follows:

1) in b: The state of B (representing the previous input xn−1) is overwritten by unconditionally bringing B into correlation with the (statistically independent) state of an external referent R1 (representing the current input xn). The amount of energy dissipated into the bath by this unconditional overwrite, resulting from local loss of information about the input string R from B, is⁷

Δ〈EB〉in b ≥ kB T ln(2) H2(p). (4)

2) add 1: The state of A (representing xn−1) is conditionally transformed in a manner that depends on the (statistically independent) state of B (representing xn), with A left unchanged if B = 0000 and inverted if B = 0001. While the XOR operation is logically irreversible, it is implemented here as a bitwise reversible CNOT operation with B = R1 serving as the control bit that selects between one of two reversible operations to be applied to A (identity or NOT). As a result, there is no fundamental lower bound on the dissipative cost of implementing this instruction.

3) swa 0: The state of M0 (representing fn−1) is overwritten by the (statistically correlated) state of A (representing fn). By inequality (1), with A = M0 and R1 = A, the amount of energy dissipated into the bath is lower bounded as

Δ〈EB〉swa 0 ≥ kB T ln(2) [2p(1 − p) + (1 − 2p(1 − p)) H2( p(1 − p) / (1 − 2p(1 − p)) )] (5)

where H2(·) is the binary entropy function defined in (2).

4) out 0: A, B, and M0 are unchanged in this step.

5) cpy 1: The state of A (representing fn) is overwritten by the (statistically correlated) state of B (representing xn). By inequality (1), with A = A and R1 = B, the amount of energy dissipated into the bath is

Δ〈EB〉cpy 1 ≥ kB T ln(2) H2(p). (6)

⁷This overwriting scenario corresponds to Case 4 of Table I in [7].

Fig. 8. Cumulative energy dissipation bounds for one loop of the demonstration processor programmed to implement the transition detection FSA on a Bernoulli input sequence with characteristic probability p = 0.5.

6) jmp 0: A, B, and M0 are unchanged in this step.

Three of the program steps—in b, swa 0, and cpy 1—are found to be necessarily dissipative in the ALA analysis, where the actual datapath configurations established to execute instructions and drive processor state transformations are explicitly reflected.

The ALA bound on the cumulative dissipation within one program loop, obtained from the single-step bounds (4)–(6) above, is shown for characteristic input probability p = 0.5 in Fig. 8 (ALA), along with the analogous result from the ILA analysis for comparison (ILA). The ALA bound on the total energy Δ〈EB〉 dissipated to the bath per program loop, obtained by summing the single-step bounds on Δ〈EB〉in b, Δ〈EB〉swa 0, and Δ〈EB〉cpy 1 obtained above, is

Δ〈EB〉 ≥ kB T ln(2) [2H2(p) + 2p(1 − p) + (1 − 2p(1 − p)) H2( p(1 − p) / (1 − 2p(1 − p)) )]. (7)

This bound is plotted as a function of the characteristic probability p in Fig. 7 (ALA). The ALA bound clearly exceeds the ILA bound for all nontrivial cases in Figs. 7 and 8. The differences reflect excess dissipation—dissipation above and beyond that required by physical law for realization of the desired sequence of processor states—that is necessarily incurred by the processor as it generates the required state transitions via the idiosyncratic (and generally nonoptimal) computational strategy “native” to its underlying architecture.
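A quick numerical comparison of (3) and (7), in units of kB T ln 2 (i.e., bits per loop), confirms that the ALA bound dominates the ILA bound for all nontrivial p. This sketch is ours:

```python
import math

def H2(p):
    """Binary entropy function of (2), in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def ila_bits(p):
    """ILA bound (3) in units of k_B*T*ln 2: bits erased per loop."""
    return H2(p)

def ala_bits(p):
    """ALA bound (7) in the same units: in b + swa 0 + cpy 1 contributions."""
    if p in (0.0, 1.0):
        return 0.0
    q = 2 * p * (1 - p)  # Prob{f_n = 1}: flag is the XOR of successive inputs
    return 2 * H2(p) + q + (1 - q) * H2(p * (1 - p) / (1 - q))

# Excess (architecture-induced) dissipation: ALA strictly exceeds ILA.
for p in (0.1, 0.25, 0.5, 0.9):
    assert ala_bits(p) > ila_bits(p)

print(ala_bits(0.5))  # → 3.0 bits per loop, versus 1.0 for the ILA
```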

C. Illustrative Example: Running-Sum Adder

Next, we consider implementation of a simple adder that tallies a running sum of a sequence of 4-bit inputs. The adder is implemented on the demonstration processor via infinite looping of the five-instruction sequence shown in Fig. 9 (top). On the nth loop, the processor loads the nth input xn into B (in b), and generates the running sum sn = xn + sn−1 by arithmetic addition of xn and the stored running sum sn−1 already residing in register A (add 0), writing the updated running sum sn to M0 (swa 0), routing sn to the output OUT (out 0), and initiating the (n + 1)th loop (jmp 0). The register and memory contents for this sequence of instructions are summarized for the nth loop in Fig. 9 (bottom). The ILA and ALA dissipation analyses, carried out along the lines of the transition detector analyses of Section III-B, are outlined as follows.

Fig. 9. Program for implementing the running sum adder on the demonstration processor (top), and the processor register and memory contents for the nth loop of this program (bottom).

Instruction-Level Analysis: In the ILA of the running-sum adder, the only dissipative step is the in b instruction. In this step, all information about R (i.e., about xn−1) in A is lost as B is unconditionally overwritten by R1 (i.e., xn). The resulting amount of irreversible information loss is

H({pi}) = −∑_i pi log2 pi (8)

and the corresponding dissipative cost is

Δ〈EB〉 ≥ kB T ln(2) H({pi}) (9)

where pi is the probability of the ith 4-bit input. The ILA dissipation bound is greater than the dissipation bound we obtain for a 16-input, 16-state FSA implementation of a running-sum adder. This is expected since the FSA states represent only the values of the running sum, with no physical encoding—and no irreversible overwriting—of the inputs that induce transitions between states (as is the case in the processor).
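For concreteness, the five-instruction program loop can be simulated at the register level. The dict-based model and the modulo-16 wraparound of the 4-bit arithmetic add are our assumptions, not details specified in the paper:

```python
# Hypothetical simulation of the running-sum program loop (in b, add 0,
# swa 0, out 0, jmp 0); the modulo-16 wraparound of the 4-bit registers
# is our assumption about the ALU's arithmetic add.
def adder_loop(state, x_n):
    """Advance the processor model by one loop on 4-bit input x_n."""
    state['B'] = x_n                              # in b : load input
    state['A'] = (state['A'] + state['B']) % 16   # add 0: s_n = x_n + s_{n-1}
    state['M0'] = state['A']                      # swa 0: store running sum
    return state['M0']                            # out 0, then jmp 0

state = {'A': 0, 'B': 0, 'M0': 0}
sums = [adder_loop(state, x) for x in [3, 5, 9, 2]]
print(sums)  # → [3, 8, 1, 3] (17 wraps to 1 in 4-bit arithmetic)
```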

Architecture-Level Analysis: For the running-sum adder, the step-by-step ALA analysis proceeds as follows:

1) in b: The state of B (xn−1) is unconditionally overwritten by the (statistically independent) state of the external referent R1 (xn). The amount of energy dissipated into the bath by this unconditional overwrite, resulting from local loss of information about the input string R from B, is

Δ〈EB〉in b ≥ kB T ln(2) H({pi}). (10)

2) add 0: The state of A (sn−1) is conditionally transformed to sn in a manner that depends on the state of B (xn), which is initially not correlated with A. The arithmetic add operation is implemented as a transformation of A conditioned entirely on the state of B = R1. This transformation is reversible, so there is no fundamental lower bound on the dissipative cost of implementing this step.

Fig. 10. Cumulative energy dissipation bounds for one loop of the demonstration processor programmed to implement the running-sum adder. A uniformly distributed input was assumed.

3) swa 0: The state of M0 (sn−1) is conditionally transformed in such a manner that it is overwritten by the state of A (sn). By inequality (1), with A = M0 and R1 = A, the amount of energy dissipated into the bath is lower bounded as

Δ〈EB〉swa 0 ≥ kB T ln(2)H({pi}). (11)

4) out 0: A, B, and M0 are unchanged in this step.

5) jmp 0: A, B, and M0 are unchanged in this step.

Two of the program steps—in b and swa 0—are found to be necessarily dissipative in the ALA analysis of the running-sum adder. The ALA bound on the cumulative dissipation Δ〈EB〉 within one program loop, obtained by summing the bounds on Δ〈EB〉in b and Δ〈EB〉swa 0 evaluated above, is thus

Δ〈EB〉 ≥ 2kB T ln(2)H({pi}). (12)

The ALA bound on the cumulative dissipation within one program loop, obtained for a uniformly distributed input, is shown in Fig. 10 along with the corresponding ILA bound.
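The per-instruction ALA bounds of Eqs. (10)–(12) can be tallied in the same spirit. This sketch (illustrative names, uniform input assumed) assigns the nonzero bound to the two dissipative steps and sums over one loop:

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def shannon_entropy(p):
    """H({p_i}) in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0.0]
    return float(-(p * np.log2(p)).sum())

def ala_bounds(p, T=300.0):
    """Per-instruction lower bounds (J) for one program loop.

    Only 'in b' (Eq. (10)) and 'swa 0' (Eq. (11)) carry nonzero
    fundamental bounds; 'add 0' is reversible, and 'out 0' and
    'jmp 0' leave A, B, and M0 unchanged."""
    e = K_B * T * np.log(2) * shannon_entropy(p)
    per_step = {"in b": e, "add 0": 0.0, "swa 0": e,
                "out 0": 0.0, "jmp 0": 0.0}
    return per_step, sum(per_step.values())

per_step, total = ala_bounds(np.full(16, 1.0 / 16.0))
# The cumulative bound is 2 kB T ln(2) H({p_i}), per Eq. (12)
```

For the uniform 4-bit input, the cumulative ALA bound is twice the single-overwrite bound, matching Eq. (12).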

We finally note that for an input probability p1 = Prob{xn = 0001} = 1 ∀n, i.e., for an input string that is all 0001’s, the running-sum adder functions as a simple up counter. The ILA and ALA dissipation bounds vanish in this case, since no information is lost in the in b step (where xn−1 = 0001 is always “overwritten” by the same value with certainty) or in the swa 0 step (where sn−1 can be reversibly overwritten by sn, since successive values of the running sum completely determine one another). These results are consistent with the vanishing bound we obtain for the up-counter FSA, which is also reversible.
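The vanishing of both bounds for the all-0001 input follows directly from Eq. (8): a deterministic input distribution has zero entropy. A short check (illustrative code, not from the paper):

```python
import numpy as np

def shannon_entropy(p):
    """H({p_i}) in bits, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0.0]
    return float(-(p * np.log2(p)).sum())

# Input string is all 0001's: the 4-bit value 0001 has probability 1
p_det = np.zeros(16)
p_det[0b0001] = 1.0

H = shannon_entropy(p_det)  # 0.0, so the ILA and ALA bounds both vanish
```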

IV. DISCUSSION

While the rudimentary processor and programs of Section III enabled concrete demonstration of the instruction- and architecture-level dissipation analyses, full development and application of our methodology to nanoprocessors of useful complexity will present several challenges. We identify and briefly discuss three of these challenges in the following.

First, the physical-information-theoretic formalism that underlies our approach will have to be extended to enable description of processors that support more complex instruction and program structures. Treatment of the demonstration processor of Section III was simplified considerably by the lack of immediate and conditional branch instructions. With embedding of input data in immediate instructions, and conditioning of PC state transitions by conditional branches, statistical descriptions of the contents of the instruction memory and state of the PC will be required. Dependence of the processor state transitions on the instruction memory contents and program counter state will also require a more complex and explicit treatment than was required for our illustrative example. The required extensions, currently under investigation, will enable exploration of additional sources of dissipation that arise from execution of these instruction types and the program structures that they enable.

Second, to complete the analysis hierarchy, the circuit-level tier will have to be fully integrated with the upper two tiers. In previous work [2]–[4], we have constructed information-theoretic descriptions of nanocomputing circuits with resolution down to the binary switch/cell level and have used the referential approach to obtain bounds on heat dissipation in these circuits that arises from local logical irreversibility, input/output operations, and other sources.8 If such descriptions are to properly “fill in the blanks” of the ALA, which describes circuit blocks only implicitly in terms of their bare functionality, strategies must be developed for combining information-theoretic descriptions of multiple interacting circuit blocks to obtain bounds with circuit-level resolution that capture the system-level functionality of architecture-level analyses. This will allow sources of dissipation arising at the circuit level to be identified and properly distinguished from those arising at the architecture level.

Third, scaling strategies must be developed if our methodology is to provide dissipation bounds for nanoprocessors of meaningful complexity. This will require the modular, multilevel physical-information-theoretic description of operating processors required for full integration of our methodology discussed above, as well as scalability at all three levels. Scalability at the instruction and architecture levels will require scalable statistical description of program flow within our approach, the possibilities for which will be suggested by progress on the first challenge discussed above. Scalability at the circuit level will be most tractable for regular circuit topologies, where analytical dissipation bounds may be obtainable even for large circuits, but decomposition strategies and/or automation may be required for large circuits with less regular structures [13]. For the purposes of some exploratory analyses, abstract circuit blocks characterized only by their functionality, memory capacity, and latency may be adequate surrogates for more detailed descriptions that take full account of circuit structure and operation.

V. CONCLUSION

In this paper, we have outlined a methodology for obtaining hierarchies of dissipation bounds for nanoprocessors described physically at the instruction, architecture, and circuit levels. We also illustrated application of this methodology at the instruction and architecture levels for a simple 4-bit demonstration processor executing sequences of operations on input strings and discussed challenges associated with full development and application of the methodology to more complex processor architectures and program structures.

8These studies suggest that contributions to dissipation that belong wholly to the circuit level—dissipation that results solely from logical irreversibility in excess of what is physically required to implement the desired operation—can be significant even in perfect circuits with no parasitic energy losses.

Studies based on this physical-information-theoretic methodology could yield fundamental and practical insights into sources and consequences of irreversibility in various nanoprocessor architectures. Comparison of bounds obtained at the instruction, architecture, and circuit levels for a given processor could help to identify architectural inefficiencies and suggest improvements, and circuit-level bounds could be used to obtain fundamental limits on processor performance and density for specified assumptions regarding heat removal. Comparison of bounds obtained at all levels for different processors could provide fundamental insight into their respective energy budgets, revealing inherent differences in tradeoffs between computing, communication, and control costs for various nanocomputing paradigms. Such insights will be particularly valuable for evaluating proposals for nanoprocessors designed to provide high performance at extremely low power, where fundamental limits on dissipation from irreversible information loss may impose practical limits on circuit density and processor performance.

APPENDIX

Proof of (1): Consider unitary evolution of a tripartite system R1AB that transforms an initial state of the form

ρ^{R1AB} = ∑_i p_i^{(1)} (π_i^{R1} ⊗ ρ_i^{A}) ⊗ ρ_th^{B}

into some final state

ρ^{R1AB′} = ∑_i p_i^{(1)} (π_i^{R1} ⊗ ρ_i^{AB′})

where

ρ_th^{B} = Z^{−1} exp(−H^{B}/kB T)

is a thermal state of B characterized by temperature T, Z = Tr[exp(−H^{B}/kB T)], and H^{B} is the bath Hamiltonian. The von Neumann entropy of the (separable) initial state is

S(ρ^{R1AB}) = S(ρ^{R1A}) + S(ρ_th^{B})

and, by the joint entropy theorem, the von Neumann entropies of ρ^{R1A} and of the global final state ρ^{R1AB′} can be written as

S(ρ^{R1A}) = S(ρ^{R1}) + ∑_i p_i^{(1)} S(ρ_i^{A})

S(ρ^{R1AB′}) = S(ρ^{R1}) + ∑_i p_i^{(1)} S(ρ_i^{AB′}).

Since unitary evolution preserves global von Neumann entropy, S(ρ^{R1AB}) = S(ρ^{R1AB′}) and we have

∑_i p_i^{(1)} S(ρ_i^{A}) + S(ρ_th^{B}) = ∑_i p_i^{(1)} S(ρ_i^{AB′}).

By the subadditivity and concavity of the von Neumann entropy, we also have

∑_i p_i^{(1)} S(ρ_i^{AB′}) ≤ ∑_i p_i^{(1)} S(ρ_i^{A′}) + ∑_i p_i^{(1)} S(ρ_i^{B′}) ≤ ∑_i p_i^{(1)} S(ρ_i^{A′}) + S(ρ^{B′})

which, with the above, yields the inequality

S(ρ^{B′}) − S(ρ_th^{B}) ≥ ∑_i p_i^{(1)} [S(ρ_i^{A}) − S(ρ_i^{A′})]. (13)

A corresponding bound on the change in the bath energy can be obtained from this entropy bound and a fundamental result from quantum thermodynamics due to Partovi [12]: for joint unitary evolution of a (generally quantum mechanical) environment B—initially in a thermal state ρ_th^{B} at temperature T—and another system with which B interacts (here R1A), the expected energy of B increases by an amount no less than

Δ〈EB〉 ≥ kB T ln(2){S(ρ^{B′}) − S(ρ_th^{B})}. (14)

Combining (13) and (14), we obtain inequality (1):

Δ〈EB〉 ≥ kB T ln(2) ∑_i p_i^{(1)} [S(ρ_i^{A}) − S(ρ_i^{A′})].
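The two entropic facts invoked in the proof, subadditivity and concavity of the von Neumann entropy, can be spot-checked numerically on random states. A minimal sketch (helper names are illustrative; a two-qubit system stands in for AB):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(dim):
    """Random density matrix: Hermitian, positive semidefinite, unit trace."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho log2 rho], in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

def partial_trace(rho_ab, keep):
    """Partial trace of a two-qubit state; keep = 'A' or 'B'."""
    r = rho_ab.reshape(2, 2, 2, 2)  # indices: a, b, a', b'
    return np.einsum('abcb->ac', r) if keep == 'A' else np.einsum('abac->bc', r)

# Subadditivity: S(rho_AB) <= S(rho_A) + S(rho_B)
rho_ab = random_density_matrix(4)
s_ab = von_neumann_entropy(rho_ab)
s_a = von_neumann_entropy(partial_trace(rho_ab, 'A'))
s_b = von_neumann_entropy(partial_trace(rho_ab, 'B'))
subadditive_ok = s_ab <= s_a + s_b + 1e-9

# Concavity: sum_i p_i S(rho_i) <= S(sum_i p_i rho_i)
p = np.array([0.3, 0.7])
rhos = [random_density_matrix(2) for _ in p]
mix = sum(pi * r for pi, r in zip(p, rhos))
concave_ok = (sum(pi * von_neumann_entropy(r) for pi, r in zip(p, rhos))
              <= von_neumann_entropy(mix) + 1e-9)
```

Both inequalities are theorems, so any random draw should satisfy them up to numerical tolerance; the check is a sanity test of the proof's machinery, not a substitute for it.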

REFERENCES

[1] R. Landauer, “Irreversibility and heat generation in the computing process,” IBM J. Res. Dev., vol. 5, pp. 183–191, 1961.

[2] I. Ercan and N. G. Anderson, “Heat dissipation bounds for nanocomputing: Theory and application to QCA,” in Proc. 11th IEEE Conf. Nanotechnol., Aug. 2011, pp. 1289–1294.

[3] I. Ercan, M. Rahman, and N. G. Anderson, “Determining fundamental lower bounds on heat dissipation for transistor-based nanocomputing paradigms,” in Proc. 2011 IEEE/ACM Int. Symp. Nanoscale Archit., Jun. 2011, pp. 169–174.

[4] I. Ercan and N. G. Anderson, “Heat dissipation in nanocomputing: Lower bounds from physical information theory,” to be published.

[5] N. G. Anderson, “Information erasure in quantum systems,” Phys. Lett. A, vol. 372, pp. 5552–5555, 2008.

[6] N. G. Anderson, “On the physical implementation of logical transformations: Generalized L-machines,” Theor. Comput. Sci., vol. 411, pp. 4179–4199, 2010.

[7] N. G. Anderson, “Overwriting information: Correlations, physical costs, and environment models,” Phys. Lett. A, vol. 376, pp. 1426–1433, 2012.

[8] K. Walus, M. Mazur, G. Schulhof, and G. A. Jullien, “Simple 4-bit processor based on quantum-dot cellular automata,” in Proc. 16th IEEE Int. Conf. Appl.-Spec. Syst. Archit. Process., Jul. 2005, pp. 288–293.

[9] T. Wang, M. Ben-Naser, Y. Guo, and C. A. Moritz, “Wire-streaming processors on 2-D nanowire fabrics,” presented at NSTI 2005. [Online]. Available: http://www.ecs.umass.edu/ece/ssa/papers/extNSTI.pdf

[10] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.

[11] N. Ganesh and N. G. Anderson, “Irreversibility and dissipation in finite state automata,” in preparation.

[12] M. H. Partovi, “Quantum thermodynamics,” Phys. Lett. A, vol. 137, pp. 440–444, 1989.

[13] I. Ercan, N. Ganesh, and N. G. Anderson, “Modular dissipation analysis for QCA,” presented at FCN’13: The 2013 Workshop on Field-Coupled Nanocomputing, Tampa, FL, USA, Feb. 2013, in preparation.


Neal G. Anderson (M’80–SM’04) received the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, USA, in 1988.

He is currently with the Faculty of the Electrical and Computer Engineering Department, University of Massachusetts Amherst, Amherst, USA. His research and teaching activities at the University of Massachusetts Amherst have emphasized various aspects of physical electronics. His current research is focused on physical information theory and its application to nanocomputing, including quantum physical models for classical information processing in nanosystems, computational efficacy measures, heat dissipation bounds, and fundamental physical limits.

Dr. Anderson is a member of the American Association for the Advancement of Science, the American Physical Society, the Optical Society of America, the Philosophy of Science Association, and Tau Beta Pi. He and his students received Best Paper Awards for work presented at the IEEE International Conference on Nanotechnology in 2011 and 2012. He received the Distinguished Teaching Award at the University of Massachusetts Amherst in 2006.

Ilke Ercan (S’06) received the B.S. degree in physics with a minor in philosophy and history of science from Middle East Technical University, Ankara, Turkey, in 2006, and the M.S.E.C.E. degree in electrical and computer engineering from the University of Massachusetts Amherst, Amherst, USA, in 2008. She is currently working toward the Ph.D. degree in the Electrical and Computer Engineering Department, University of Massachusetts Amherst, where she is also a Research Assistant in the Nanoelectronics Theory and Simulation Laboratory.

Natesh Ganesh received the B.Tech. degree in electronics and communication engineering from the National Institute of Technology, Trichy, India, in 2009, and the M.S.E.C.E. degree with concentration in solid-state electronics from the University of Massachusetts Amherst, Amherst, USA, in 2011. He is currently working toward the Ph.D. degree in the Electrical and Computer Engineering Department, University of Massachusetts Amherst, where he is also a Research Assistant in the Nanoelectronics Theory and Simulation Laboratory.