Feasible demonstration of ultra-low-power adiabatic CMOS for cubesat applications using LC ladder resonators

Michael Frank

Sandia National Laboratories

Tenth Workshop on Fault-Tolerant Spaceborne Computing Employing New Technologies

Albuquerque, NM

June 1, 2017

Approved for Unclassified Unlimited Release

SAND2017-5650 C

Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Abstract

Small space platforms such as cubesats are typically highly constrained in the power available for on-board computation, limiting the scope of achievable missions. Unfortunately, conventional approaches to low-power computing in CMOS are limited in their energy efficiency, because they still follow the conventional irreversible computing paradigm, in which digital signals are destructively overwritten on every clock cycle, dissipating the associated $CV^2$ signal energy to heat. In an alternative approach called reversible computing, which can be implemented in rad-hard CMOS, we can adiabatically transform digital signals from old states to new ones with almost no dissipation of signal energy, instead recovering almost all of the signal energy and reusing it in subsequent operations. At relatively low (MHz scale) frequencies, this approach can yield orders-of-magnitude gains in power-limited parallel performance compared to more conventional approaches to low-power CMOS. In this paper, we propose a feasible near-term demonstration of reversible adiabatic CMOS at attojoule-per-operation energy scales, using custom LC ladder resonators integrated in-package with the logic IC to achieve high-quality energy recovery.

Talk Outline

Motivation: more power-efficient computing for small spacecraft (nanosats, etc.)

Background: practical limits of irreversible CMOS; thermodynamic limits of computing (a short tutorial)

Reversible computing: the only long-term sustainable path forward!

Reversible computing in adiabatic CMOS: basic principles; early proof-of-concept chips; 2LAL (two-level adiabatic logic); LC ladder resonators

Conclusion: towards a demonstration of ultra-low-power reversible CMOS

Motivation: Energy Efficiency for Onboard Computing in Spacecraft

Power efficiency of high-bandwidth downlinks is limited by fundamental communication theory considerations. The majority of downlink power misses the receiver and is wasted.

Thus, it would be desirable to do more processing onboard, if we can find ways to do this within a given power budget. This allows us to save available downlink bandwidth to convey numerous compactly encoded, higher-level, mission-relevant results extracted from raw sensor data by onboard processing. This would then allow us to expand the scope of achievable missions.

Also, even for a fixed mission, if the power requirements for the desired computation can be reduced, this could potentially allow the size of the entire spacecraft to be scaled down:

Power supplies, solar panels/radiators, and chassis geometry can be scaled.

Mass of the entire spacecraft and fuel requirements can be scaled.

Total construction and launch costs can be substantially reduced.

Energy limits for conventional technology are not far away!

[Figure: roadmap of minimum gate energy and node energy (vertical axis: Energy, eV), with reference levels marked at 1 fJ, 1 aJ, ~1 keV = 40,000 kT, 1-2 eV = 40-80 kT, and ~40 kT.]

Thermal noise in minimum-width FET gates leads to channel fluctuations below ~1-2 eV. This increases leakage and impairs device performance.

Note: Real transistors are often sized much wider than minimum width, for speed (e.g., ~20× minimum width). There is also fanout, wire capacitance, etc.

Note: ITRS is aware of the thermal noise issue, and so has minimum gate energy asymptoting to ~2 eV. Node energy follows, asymptoting to ~1 keV.

Practical conventional circuit architectures can’t just magically cross this gap! Fundamental thermal limits translate to much larger practical limits!

Only reversible computing can take us from the end of the CMOS roadmap all the way down to kT and below!
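The energy scales quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes room temperature, T = 300 K (an assumption; the slide does not state a temperature), and uses the exact SI values of the Boltzmann constant and the electron-volt. The helper name `in_kT` is illustrative only.

```python
# Cross-check of the slide's energy scales, assuming T = 300 K
# (an assumption; the slide does not state a temperature).
K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)
EV = 1.602176634e-19     # one electron-volt in joules (exact SI value)
T = 300.0                # assumed temperature, K

kT_eV = K_B * T / EV     # thermal energy kT expressed in eV


def in_kT(energy_eV):
    """Express an energy given in eV as a multiple of kT."""
    return energy_eV / kT_eV


for label, e in [("1 eV", 1.0), ("2 eV", 2.0), ("1 keV", 1e3),
                 ("1 aJ", 1e-18 / EV), ("1 fJ", 1e-15 / EV)]:
    print(f"{label:>6} = {e:10.3g} eV = {in_kT(e):10.3g} kT")
```

At 300 K, kT is about 0.026 eV, so 1-2 eV is roughly 39-77 kT and 1 keV is roughly 39,000 kT, in line with the rounded figures on the slide.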

Entropy in a Nutshell

Basic review + coining some useful terminology.

Define the “surprisingness” or surprise $s(e)$ of any event $e$ that has a 1 in $m$ chance of occurring as $s(e) = \log m$. Call $m \ge 1$ the “improbability”; it can be a non-integer.

$s$ is $\log m$ because the improbabilities of independent surprises multiply.

Indefinite logarithm; dimensioned in arbitrary logarithmic units. Some example units: $\log 2 = 1$ bit; $\log e = 1$ nat; $\log 10 = 1$ bel.

In terms of the event’s probability $p = 1/m$,

$$s(p) = \log\frac{1}{p} = -\log p.$$

Define the event’s “heaviness” (Hopefulness? Horribleness?) as its surprise, weighted by its probability:

$$h(p) = s/m = p \cdot \log\frac{1}{p} = -p \log p.$$

Then for any probability distribution $p$ over any mutually exclusive and exhaustive set of events $\mathcal{E} = \{e_1, \dots, e_n\}$, the expected surprise $\mathrm{E}[s]$ and the total heaviness $\sum_{e \in \mathcal{E}} h(e)$ associated with that particular set of possible events are the same, and are given by:

$$H = \sum_{e \in \mathcal{E}} p(e) \cdot s(e) = -\sum_{e \in \mathcal{E}} p(e) \log p(e).$$

We call this quantity the entropy of the given epistemological situation. By convention, we’ll prefer $H$ for “computational” entropy and $S$ for “physical” entropy.

Example: improbability $m = 6^2 = 36$; surprise $s = 2 \log 6$; heaviness $h = s/m = \frac{1}{18} \log 6$.
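The definitions on this slide translate directly into code. The following is a minimal sketch, working in nats (natural logarithm); the function names `surprise`, `heaviness` and `entropy` are chosen here for illustration and do not come from the talk.

```python
import math


def surprise(p):
    """Surprise s(p) = log(1/p), in nats, of an event with probability p."""
    return math.log(1.0 / p)


def heaviness(p):
    """Heaviness h(p) = p * s(p); taken as 0 at p == 0 (its limiting value)."""
    return 0.0 if p == 0 else p * surprise(p)


def entropy(dist):
    """Entropy = total heaviness = expected surprise, in nats."""
    return sum(heaviness(p) for p in dist)


NAT_PER_BIT = math.log(2)        # 1 bit = ln 2 nats

# The slide's example: improbability m = 36.
m = 36
s = surprise(1 / m)              # equals 2 * log(6)
h = heaviness(1 / m)             # equals log(6) / 18

# A fair coin carries exactly one bit of entropy.
coin_bits = entropy([0.5, 0.5]) / NAT_PER_BIT

# Heaviness is largest at p = 1/e, where it equals 1/e nats.
h_max = heaviness(1 / math.e)
```

Dividing by `NAT_PER_BIT` converts any of these quantities from nats to bits, which is the sense in which the logarithm is "indefinite" on the slide.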

Surprise and Heaviness Functions

For an individual state’s contribution to entropy.

[Figure: two plots against the probability $p$ of an event: the surprise $s(p) = \log(1/p)$ and the heaviness $h(p) = p \log(1/p)$. The heaviness is largest when $p = 1/e$.]

Thermodynamics and Information

Physical entropy quantifies uncertainty about the detailed microstate of a physical system.

First postulated by Boltzmann (in his H-theorem).

Integral to modern physics (von Neumann entropy).

Depends on modeler’s state of knowledge (Jaynes).

The reversibility (injectivity) of microphysics underlies the Second Law of Thermodynamics. States cannot merge as they evolve. Thus, the entropy of a closed system cannot decrease! It is conserved by unitary quantum time-evolution.

Entropy can increase if we have any uncertainty about the dynamics, or do not track it in detail.

At the most fundamental level, physical information cannot be destroyed. It can only be reversibly transformed, and/or transferred between different subsystems.

[Figure: three panels showing state probabilities before and after an evolution step, with entropy $S = \mathrm{E}[\log(1/p)]$ in nats. Bijective microphysics: probabilities (.2, .3, .5) are carried over unchanged, $S = 1.03 \to 1.03$; no “true” entropy change. True dynamics uncertain (or not tracked in detail): probabilities (.2, .3, .5) spread to (.1, .25, .4, .25), $S = 1.03 \to 1.29$; entropy increases. Irreversible microphysics: two states of probability .5 merge into one state of probability 1, $S = 0.69 \to 0$; entropy would decrease (the Second Law of Thermodynamics would be violated).]
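The entropy values attached to the three panels of the figure can be reproduced from the probabilities that appear with them. This is a small sketch in nats; the pairing of each probability list with its panel is inferred from the numbers, and the variable names are illustrative.

```python
import math


def entropy_nats(dist):
    """Entropy sum_i p_i * log(1/p_i), in nats; zero-probability states contribute 0."""
    return sum(p * math.log(1.0 / p) for p in dist if p > 0)


# Bijective microphysics: states are relabeled, probabilities are unchanged.
before = [0.2, 0.3, 0.5]
after_bijective = [0.5, 0.2, 0.3]

# Dynamics uncertain or not tracked: probability spreads over more states.
after_uncertain = [0.1, 0.25, 0.4, 0.25]

# Hypothetical irreversible microphysics: two equally likely states merge.
merge_before = [0.5, 0.5]
merge_after = [1.0]

for name, dist in [("before", before), ("bijective", after_bijective),
                   ("uncertain", after_uncertain),
                   ("merge before", merge_before), ("merge after", merge_after)]:
    print(f"{name:>13}: S = {entropy_nats(dist):.2f} nat")
```

Only the third case lowers the entropy, which is why a state-merging microphysics would contradict the Second Law.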

From Physics to Computation

Thermodynamics and quantum mechanics show that any bounded physical system admits only a finite set $\Phi = \{\phi_1, \dots, \phi_n\}$ of measurably distinguishable detailed physical states (microstates). E.g., $\Phi$ could be any orthogonal basis of the system’s Hilbert space.

We can group or partition these microstates into subsets $c_j \subseteq \Phi$ of microstates that we consider equivalent to each other for some designated purpose, e.g., representing some specific computational information.

Any probability distribution $p$ over the physical state space $\Phi$ induces a probability distribution $P$ over the computational state space (subsystem) $C = \{c_j\}$ as well:

$$P(c_j) = \sum_{\phi_i \in c_j} p(\phi_i).$$

This implies a computational entropy $H(C) = -\sum_j P(c_j) \log P(c_j)$.

[Figure: example of a computational state space consisting of 3 distinct computational states, $c_1, c_2, c_3$, each defined as a set of equivalent physical states.]

Visualizing Entropy of Grouped States

Can represent a hierarchy of events in a tree structure:

Branch thickness = event probability $p$.

Branch length = incremental surprise $\Delta s$ associated with the event, relative to whatever base event it’s branching off from.

Branch area = event’s incremental heaviness $\Delta h = p \, \Delta s$: its contribution to total entropy, in addition to the base event.

Grouping events into larger events has these effects:

Thicknesses (probabilities) of branches combine in the parent branch.

A corresponding part of the total length (surprise) of each branch is reassociated to the parent (stem) branch.

Note: The total heaviness of all branches and stems (total entropy $S$) is not affected at all by any grouping/ungrouping!

[Figure: a tree of branches shown before and after grouping and ungrouping. Total system entropy = computational entropy + non-computational entropy.]

Grouping of States (slides 1-3 of 3)

[Figure, built up over three slides. Slide 1: five physical states branch directly off the null event (probability 1). Their probabilities are 0.111, 0.333, 0.222, 0.083 and 0.25, with surprises 2.197, 1.099, 1.504, 2.484 and 1.386 respectively, and the total entropy is 1.498.

Slides 2 and 3: the states are grouped into two computational states. The first has probability 1/3 = 0.333 (stem surprise 1.099) and holds two states with conditional probabilities 0.75 and 0.25 (incremental surprises $\Delta s$ = 0.288 and 1.386). The second has probability 2/3 = 0.667 (stem surprise 0.405) and holds three states with conditional probabilities 0.167, 0.333 and 0.5 ($\Delta s$ = 1.792, 1.099 and 0.693).

For each physical state $\phi$ in computational state $c$: $p \cdot \log\frac{1}{p(\phi)} = p \cdot \left[\log\frac{1}{P(c)} + \log\frac{1}{p(\phi \mid c)}\right]$.

Total system entropy = computational entropy + non-computational entropy: 1.498 = 0.637 + 0.862.]
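The numbers in the grouping figure can be checked directly. The sketch below assumes the rounded probabilities in the figure stand for the exact fractions 1/4, 1/12, 1/9, 2/9 and 1/3 (an inference from the decimals, not something the slides state), and works in nats. Variable names are illustrative.

```python
import math
from fractions import Fraction as F


def entropy_nats(dist):
    """Entropy sum_i p_i * log(1/p_i), in nats."""
    return sum(float(p) * math.log(1.0 / float(p)) for p in dist if p > 0)


# Physical-state probabilities, grouped into two computational states.
# Exact fractions are an assumption inferred from 0.25, 0.083, 0.111, 0.222, 0.333.
groups = [
    [F(1, 4), F(1, 12)],             # computational state with P = 1/3
    [F(1, 9), F(2, 9), F(1, 3)],     # computational state with P = 2/3
]

total = entropy_nats([p for g in groups for p in g])       # total entropy S
group_probs = [sum(g) for g in groups]                     # induced P(c)
computational = entropy_nats(group_probs)                  # computational entropy H

# Non-computational part: probability-weighted entropy of the conditional
# distribution p(phi | c) inside each computational state.
non_computational = sum(
    float(P) * entropy_nats([p / P for p in g])
    for g, P in zip(groups, group_probs)
)

print(f"S = {total:.3f}, H = {computational:.3f}, S - H = {non_computational:.3f}")
```

However the five states are partitioned, `total` stays the same; only its split between the computational and non-computational parts changes.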

Proof of Landauer’s Limit

We’ve seen that the total system entropy $S$ for a given closed system cannot decrease at all. So, what happens if we merge two computational states?

Underlying probability distributions remain the same! Only the identities of the physical states $\phi_i$, and their groupings into computational states, can be changing.

Merging two computational states implies removing a conceptual partition between groups of physical states. This is the same as the “ungrouping” operation we saw earlier.

The computational contribution $H$ to the total entropy $S$ cannot simply vanish from existence. Thus, it can only be ejected from the computational state into the non-computational state.

We define non-computational entropy as: $S_{nc} = S - H$.

So, the change in $S_{nc}$ from a merge operation is: $\Delta S_{nc} = \Delta S - \Delta H$. Since $\Delta S \ge 0$, this is at least $-\Delta H$.

To the extent that “non-computational” = “uncontrolled,” the extra non-computational entropy must end up in some thermal environment at some temperature $T$. We must thus emit at least heat $\Delta Q = T \, \Delta S_{nc}$ to that environment.

If $\Delta H = -1$ bit, then $\Delta Q = kT \ln 2$.

[Figure: the computational subsystem C before and after bit erasure. The physical-state probabilities (.1, .4, .3, .2) are the same before and after; an inset repeats the earlier example (probabilities .2, .3, .5, $S = 1.03 \to 1.03$) with the caption “Unitary evolution conserves total system entropy!” Before erasure the physical states are grouped into computational states 0 and 1, with $H = 0.69 = 1$ bit and $S_{nc} = 0.59$. After erasure there is a single computational state, with $H = 0 = 0$ bits and $S_{nc} = 1.28$. Hence $\Delta S_{nc} = -\Delta H = 1$ bit $= 0.69$.]

Landauer Limit: $kT \ln 2$ per bit lost. ■
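The bit-erasure example in the figure, and the resulting heat bound, can be worked through numerically. This sketch takes the four physical-state probabilities from the figure, groups them as {.1, .4} and {.3, .2} (the only pairing that gives two equally likely computational states), and assumes T = 300 K for the final conversion to joules; that temperature is an assumption, not a value from the slide.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
T = 300.0               # assumed environment temperature, K


def entropy_nats(dist):
    """Entropy sum_i p_i * log(1/p_i), in nats."""
    return sum(p * math.log(1.0 / p) for p in dist if p > 0)


physical = [0.1, 0.4, 0.3, 0.2]
S = entropy_nats(physical)                       # total entropy, unchanged by the merge

H_before = entropy_nats([0.1 + 0.4, 0.3 + 0.2])  # two computational states: 1 bit
H_after = entropy_nats([1.0])                    # one computational state: 0 bits

S_nc_before = S - H_before                       # non-computational entropy before
S_nc_after = S - H_after                         # non-computational entropy after
delta_S_nc = S_nc_after - S_nc_before            # entropy ejected by the merge, nats

# Entropy in nats times k gives J/K, so the minimum heat is k * T * delta_S_nc.
min_heat_J = K_B * T * delta_S_nc

print(f"S = {S:.2f}, S_nc: {S_nc_before:.2f} -> {S_nc_after:.2f} nat")
print(f"minimum heat at {T:.0f} K: {min_heat_J:.3e} J")
```

At the assumed 300 K the bound comes to about 2.9 zeptojoules per erased bit.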

Why Reversible Computing?

Landauer’s Limit is absolutely unavoidable in any computing scheme based on constantly losing computational information, e.g., by erasing it, or (equivalently) destructively overwriting it.

Note: Conventional computers lose information all the time! Every active logic gate in a conventional design destructively overwrites its previous output on every clock cycle (e.g., billions of times per second).

Even worse, in practice, erasing a bit dissipates not just $kT \ln 2$, but the entire logic signal energy associated with that bit! This is still > 10,000 $kT$ even at the very end of the CMOS roadmap, and is unlikely to decrease much, given thermal noise and architectural overheads.

The only sustainable path forward would be if we increasingly recover the signal energies used to encode old bits, and reuse almost all of that energy to register newly computed bits. But, due to Landauer’s principle, approaching complete energy recovery requires us to avoid merging of computational states (as on the previous slide), since that would lose information and its associated signal energy!

Reversible Computing in Adiabatic CMOS Circuits

An approach researched since the mid-1980s. MF invented a new scheme in the early 2000s (2LAL).

Here’s a simple example of a reversible copy operation using a CMOS transmission gate. Semantics: copy the input to the output, given that the output is 0 initially. The operation is reversible if that precondition is satisfied.

Boolean AND/OR simply use series/parallel T-gates.

The driving signal is ramped gradually from logic level 0 → 1 over some transition time $t$. The energy dissipated is $E_{diss} \approx CV^2 \cdot RC/t$ (to first order), where $C$ = output node capacitance, $V$ = logic swing voltage, and $R$ = resistance of the charging path.

Dissipation approaches 0 in the adiabatic limit: low speed and low leakage through transistors.

This approach could even get below the Landauer limit, given sufficiently low-leakage transistors. This has been empirically demonstrated with resistors.

[Figure: a CMOS transmission gate with its terminals labeled (driver), (input), (complemented input) and (output), shown (engaged) and (disengaged), with the output at 0 (no data) initially, and a diagram of the transitions among the (driver, input, output) states. Note: No merging of computational states!]
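The first-order dissipation formula for a ramped (adiabatic) transition can be compared with the energy lost when the same node is switched abruptly. The sketch below uses the relation E ≈ (RC/t)·CV² from the slide; the conventional figure of ½CV² per transition is the standard result for charging a capacitor from a fixed supply. All component values are hypothetical illustration values, not taken from the talk.

```python
def adiabatic_dissipation(C, V, R, t):
    """First-order energy (J) dissipated when a node of capacitance C is charged
    through resistance R by a linear voltage ramp of duration t (valid for t >> R*C)."""
    return (R * C / t) * C * V ** 2


def conventional_dissipation(C, V):
    """Energy (J) dissipated when the node is charged abruptly from a fixed supply."""
    return 0.5 * C * V ** 2


# Hypothetical illustration values (not from the talk).
C = 10e-15      # 10 fF node capacitance
V = 1.8         # 1.8 V logic swing
R = 10e3        # 10 kOhm charging-path resistance
t = 1e-6        # 1 microsecond ramp time

e_adiabatic = adiabatic_dissipation(C, V, R, t)
e_conventional = conventional_dissipation(C, V)
ratio = e_conventional / e_adiabatic     # equals t / (2 * R * C)

print(f"adiabatic: {e_adiabatic:.3e} J, conventional: {e_conventional:.3e} J, "
      f"ratio: {ratio:.0f}x")
```

Because the adiabatic loss scales as 1/t, doubling the ramp time halves the energy dissipated per transition; this is the trade of speed for efficiency that the slide describes.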

Reversible and/or Adiabatic Full-Custom VLSI Chips Designed @ MIT, 1996-1999

By Josie Ammer, Mike Frank, Nicole Love, Scott Rixner, and Carlin Vieri, under CS/AI lab members Tom Knight and Norm Margolus.

Circuit Rules for Truly/Fully Adiabatic FET-based Switching

Avoid passing current through diodes! Crossing the “diode drop” leads to an irreducible dissipation.

Follow a “dry switching” discipline (in the relay lingo):

Never turn on a transistor when $V_{DS} \ne 0$. “No sparks!”

Never turn off a transistor when $I_{DS} \ne 0$. “No squelches!” The only exception: if an alternate path for current is available.

Together these rules imply:

The computational function of the circuit must be logically reversible. There is no way to erase digital information under these rules!

Transitions must be driven by a quasi-trapezoidal waveform. It must be generated resonantly, with high Q.

Of course, leakage power must also be kept manageable. Because of this, the optimal design point will not necessarily use the smallest devices that can ever be manufactured! With adiabatics, we can actually achieve lower total dissipation per op (including leakage) and higher aggregate performance (at fixed power) if we back off to using somewhat larger, slower, older-generation devices!

Callout: An important rule that is neglected in almost all of the “adiabatic” circuit literature!

2LAL: 2-level Adiabatic Logic

A pipelined fully-adiabatic logic family invented by MF at UF in 2000, implementable using ordinary CMOS transistors.

Uses transmission gates (implicit dual-rail encoding everywhere).

Basic buffer element: cross-coupled T-gates. It needs 8 transistors to buffer 1 dual-rail signal by 1 transition time (tick).

Only 4 timing signals $\phi_0$ to $\phi_3$ are needed. Only 4 ticks per cycle: $\phi_i$ rises during ticks $t \equiv i \pmod 4$ and falls during ticks $t \equiv i + 2 \pmod 4$.

[Figure: the transmission-gate symbol T with its control terminals TN and TP; the buffer element with its in and out terminals; and a timing diagram of the four timing signals swinging between levels 0 and 1 over ticks 0, 1, 2, 3, ...]
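The four-phase timing rule (signal i rises during ticks t ≡ i mod 4 and falls during ticks t ≡ i + 2 mod 4) can be tabulated with a short sketch. It assumes that each signal stays flat at its high level in the tick after it rises and flat at its low level in the tick after it falls, which is what the trapezoidal waveform implies but is not spelled out in the rule itself. The function names are illustrative.

```python
STATES = ("rising", "high", "falling", "low")


def clock_state(i, tick):
    """State of timing signal phi_i during the given tick.

    phi_i rises when tick = i (mod 4) and falls when tick = i + 2 (mod 4);
    the flat 'high' and 'low' ticks in between are an assumption.
    """
    return STATES[(tick - i) % 4]


def level_at_end_of_tick(i, tick):
    """Logic level (0 or 1) that phi_i has reached by the end of the tick."""
    return 1 if clock_state(i, tick) in ("rising", "high") else 0


# Tabulate two full cycles (8 ticks) for all four timing signals.
schedule = {i: [clock_state(i, t) for t in range(8)] for i in range(4)}

for i, row in schedule.items():
    print(f"phi_{i}: " + "  ".join(f"{s:>7}" for s in row))
```

In every tick exactly one of the four signals is rising and exactly one is falling, which is what lets a data pulse advance by one logic stage per tick.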

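2LAL Shift Register Structure

1-tick delay per logic stage.

Logic pulse timing and signal propagation.

[Figure: a chain of buffer stages carrying a signal from in@0 to out@4, each stage driven by timing signals 0 to 3, together with timing diagrams of the input rails inN and inP over ticks 0, 1, 2, 3, ...]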

More Complex Logic Functions

Non-inverting multi-input Boolean functions: AND gate (plus delayed A); OR gate.

Can also complement inputs/outputs and use DeMorgan substitution.

One way to do inverting functions in pipelined logic is to use a quad-rail logic encoding. To invert, just swap the rails! Zero-transistor “inverters.”

[Figure: schematics of the AND gate and the OR gate, each taking inputs A and B at tick 0 and producing its output at tick 1, and waveforms of the rails AN and AP for A = 0 and A = 1.]

2LAL 8-stage circular shift register


Pulse propagation in 8‐stage circuit


Simulation Results (Cadence/Spectre)

Graph shows per-FET power dissipation vs. frequency in an 8-stage shift register.

At moderate frequencies (1 MHz), reversible uses < 1/100th the power of irreversible!

At ultra-low power levels (1 pW/transistor), reversible is 100× faster than irreversible!

Minimum energy dissipation per nFET is < 1 electron volt! That is 500× lower dissipation than the best irreversible CMOS, or 500× higher computational energy efficiency!

Energy transferred per nFET per cycle is still on the order of 10 fJ (100 keV). So, energy recovery efficiency is on the order of 99.999%! Quality factor Q ≈ 100,000!

Note: this does not yet include any of the parasitic losses associated with power supply and clock distribution.

[Figure: “Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL.” Log-log plot of average power dissipation per nFET (W, from 1e-14 to 1e-5) against frequency (Hz, from 1e3 to 1e9) for standard CMOS and 2LAL, together with the energy dissipated per nFET per cycle. 2LAL = two-level adiabatic logic (invented at UF, ’00).]
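The recovery efficiency and quality factor quoted on this slide follow from the two order-of-magnitude energies it gives. In this sketch Q is taken to mean energy transferred per cycle divided by energy dissipated per cycle (an energy reuse factor); that reading is an assumption about how the slide uses the term.

```python
EV = 1.602176634e-19          # one electron-volt in joules (exact SI value)

# Order-of-magnitude figures from the slide, per nFET per cycle.
e_transferred_eV = 100e3      # ~100 keV transferred
e_dissipated_eV = 1.0         # ~1 eV dissipated (the slide gives this as an upper bound)

recovery = 1.0 - e_dissipated_eV / e_transferred_eV    # fraction of energy recovered
q_factor = e_transferred_eV / e_dissipated_eV          # energy reuse factor (assumed meaning of Q)

# For reference: 10 fJ expressed in keV.
ten_fJ_in_keV = 10e-15 / EV / 1e3

print(f"recovery = {recovery:.5%}, Q = {q_factor:.0f}, 10 fJ = {ten_fJ_in_keV:.1f} keV")
```

Note that 10 fJ is about 62 keV, so the slide's "(100 keV)" is an order-of-magnitude equivalence.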

How to generate clock signals?

To achieve a large energy savings, they must be generated resonantly, with a high Q factor. Parasitic losses in the clock distribution network must be minimal.

The waveforms need to have a very nonstandard shape: not sinusoidal or square-wave, but trapezoidal, with gradual rise/fall ramps and flat horizontal wave tops/bottoms. Ramps do not have to be perfectly linear, but the slope should be limited.

A few of the supply techniques that have been considered:

Clipped sinusoidal (crystal or LC) oscillators

Transmission-line resonators

Custom MEMS resonators (various geometries)

Each of these has issues, and none is close to practical yet. Here, we propose an easier approach: LC ladder networks.

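Spectrum of Trapezoidal Wave

Relative to the mid-level crossing, the waveform is an odd function.

The spectrum includes only odd harmonics $f, 3f, 5f, \dots$

The six-component Fourier series expansion is shown below. The maximum offset with an $11f$ frequency cutoff is about 1.7% of the logic swing $V$.

$$\frac{v(\theta)}{V} \approx \frac{1}{2} + \frac{4\sqrt{2}}{\pi^2}\left[\sin\theta + \frac{\sin 3\theta}{3^2} - \frac{\sin 5\theta}{5^2} - \frac{\sin 7\theta}{7^2} + \frac{\sin 9\theta}{9^2} + \frac{\sin 11\theta}{11^2}\right]$$

[Figure: the trapezoidal waveform and its six-component approximation, with the 180° phase point marked.]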
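The six-term expansion can be compared numerically with an ideal trapezoid. This sketch assumes a unit logic swing and a trapezoid whose rise, flat top, fall and flat bottom each last a quarter period, with the rising mid-level crossing at phase 0; for that shape the odd-harmonic amplitudes are ±4√2/(π²n²), with the sign pattern +, +, −, −, +, + that also appears in the amplitude column of the ladder resonator table. The function names are illustrative.

```python
import math

COEFF = 4 * math.sqrt(2) / math.pi ** 2     # amplitude of the fundamental, for unit swing


def trapezoid(theta):
    """Ideal unit-swing trapezoid: quarter-period ramps, mid-level crossing at theta = 0."""
    quarter = math.pi / 2
    x = (theta + math.pi / 4) % (2 * math.pi)    # shift so the rising ramp starts at x = 0
    if x < quarter:
        return x / quarter                        # rising ramp
    if x < 2 * quarter:
        return 1.0                                # flat top
    if x < 3 * quarter:
        return 1.0 - (x - 2 * quarter) / quarter  # falling ramp
    return 0.0                                    # flat bottom


def partial_sum(theta, n_max=11):
    """Fourier partial sum over odd harmonics n = 1, 3, ..., n_max."""
    total = 0.0
    for n in range(1, n_max + 1, 2):
        sign = 1.0 if n % 8 in (1, 3) else -1.0   # +, +, -, -, +, +, ...
        total += sign * math.sin(n * theta) / (n * n)
    return 0.5 + COEFF * total


thetas = [2 * math.pi * k / 2000 for k in range(2000)]
max_offset = max(abs(partial_sum(th) - trapezoid(th)) for th in thetas)
print(f"maximum offset with 11f cutoff: {max_offset:.4f} of the swing")
```

The fundamental amplitude `COEFF` is about 0.573 of the swing, so a fundamental of 1 V corresponds to a swing of about 1.75 V.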

Ladder Resonator Structure

[Figure: a small-signal trapezoidal driver (external) feeding a ladder resonator for odd harmonics, which drives the load.]

Example values (fundamental amplitude 1 V, for a 1.75 V swing):

Harmonic mode (n)   Frequency f   Amplitude Va   Inductance L   Capacitance C
 1                   230 kHz      1000.00 mV     691.98 nH      691.98 nF
 3                   690 kHz       111.11 mV     230.66 nH      230.66 nF
 5                  1150 kHz       -40.00 mV     138.40 nH      138.40 nF
 7                  1610 kHz       -20.41 mV      98.85 nH       98.85 nF
 9                  2070 kHz        12.35 mV      76.89 nH       76.89 nF
11                  2530 kHz         8.26 mV      62.91 nH       62.91 nF

Can build a trapezoidal resonator with a ladder circuit made of parallel passive bandpass filters, each a sinusoidal LC resonator.

Each “rung” of the ladder passes a different odd multiple of the fundamental clock frequency $f$.

Adjust the $L/C$ ratio to obtain a target Q value on each path, given the parasitic $R$ and $C$ values.

Excite the circuit with a driving signal containing the right distribution of frequency component amplitudes.

Each frequency component gets amplified by the Q value of its corresponding rung.

If all rungs are designed to the same target Q, we can just use a trapezoidal driver.

For high Q, the clock period must be long compared to the total parasitic $RC$.

Max. possible  ⋅ ,
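The inductance and capacitance columns of the example table can be regenerated from the fundamental frequency. The sketch below assumes each rung is an LC resonator tuned to its harmonic, f = 1/(2π√(LC)), with a characteristic impedance √(L/C) of 1 Ω; that impedance is inferred from the table, where L in nH and C in nF are numerically equal, and is not stated as such. Names are illustrative.

```python
import math

F1 = 230e3      # fundamental clock frequency from the table, Hz
Z0 = 1.0        # assumed characteristic impedance sqrt(L/C), ohms (inferred from the table)


def rung(n, f1=F1, z0=Z0):
    """Return (f, L, C) for the rung that resonates at the n-th harmonic."""
    f = n * f1
    w = 2 * math.pi * f          # angular resonant frequency, rad/s
    L = z0 / w                   # from w = 1/sqrt(L*C) and z0 = sqrt(L/C)
    C = 1 / (z0 * w)
    return f, L, C


table = {n: rung(n) for n in (1, 3, 5, 7, 9, 11)}

for n, (f, L, C) in table.items():
    print(f"n={n:2d}  f={f / 1e3:7.0f} kHz  L={L * 1e9:7.2f} nH  C={C * 1e9:7.2f} nF")
```

Choosing a different `Z0` rescales L and C in opposite directions without moving the resonant frequencies, which is the freedom the L/C ratio provides for hitting a target Q on each rung.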

Design Plan for Demonstration Part

Select a CMOS fabrication process. Older-generation processes are good, because they are low-leakage and rad-hard.

Design a pipelined 2LAL circuit to implement the desired function, to the level of layout and parasitic extraction in the selected process. Minimize the parasitic resistance and capacitance of the clock distribution network.

Identify a target clock frequency that is low enough to obtain the desired energy reuse factor (Q value). This determines the maximum power-limited performance boost that can be achieved compared to conventional irreversible CMOS.

Select a packaging methodology that allows discrete components to be placed as close to the die as possible. Ideal: direct bonding of component leads to pads on the chip surface. Again, minimize the parasitic resistance/capacitance of joins.

Identify specific COTS inductor and capacitor components for the ladder network that maximize the overall Q obtained. Goal: demonstrate Q values of 10-100×.

Iteratively refine the design as needed.

Conclusion

There is a need for greater energy efficiency in spacecraft. It could allow the entire vehicle to be scaled down considerably, or afford greater mission scope within a given-size platform.

We can actually prove from fundamental physics that the only long-term sustainable path to attain ever-better energy efficiency in computing is to use reversible computing principles.

The CMOS roadmap will soon run out of steam, and beginning to apply reversible computing principles now can offer near-term benefits that can be further extended in the future.

Reversible computing in truly/fully adiabatic CMOS is an approach that could be demonstrated in a short time-frame. LC ladder resonators with die-bonded inductors may be adequate to allow demonstrating 1-2 orders of magnitude energy efficiency gains.

Next step: A detailed design and feasibility study showing the viability of such a demonstration would be highly desirable.