

Looking Beyond CMOS:

Integrated Circuit Design with

Nano Electro-Mechanical Switches

Vladimir Stojanović (MIT)

in collaboration with

Tsu-Jae King Liu, Elad Alon (UC Berkeley)

Dejan Marković (UCLA)

MIEL 2010

Page 2

Acknowledgements

Circuit design

Fred Chen, Hossein Fariborzi

Matthew Spencer, Abhinav Gupta

Cheng Wang, Kevin Dwan

Device design

Hei Kam, Rhesa Nathanael, Vincent Pott, Jaeseok Jeon

Sponsors

DARPA NEMS program

FCRP (C2S2, MSD)

MIT CICS

Berkeley Wireless Research Center

NSF

Page 3

CMOS is Scaling, Power Density is Not

Since ~2000 supply voltage (Vdd) stuck at ~1V

Leakage stops you from lowering threshold (Vth)

Vdd and Vth not scaling well → power/area not scaling

[Figure: Power density (W/cm2, log scale from 1 to 10000) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6, and Core 2, with reference levels for Hot Plate, Nuclear Reactor, Rocket Nozzle, and Sun’s Surface. Power density prediction circa 2000. Source: S. Borkar, Intel]

Performance limited by power, not number of transistors

Page 4

Parallelism to the Rescue

Parallelism allows slower, more efficient units

Helps scale performance for fixed power

Will this last forever?

[Figure annotations: Lower supply, balanced Vth; Parallelize to recover performance. Example: IBM CELL processor]

Page 5

Subthreshold leakage: Game Over for CMOS

Leakage and sub-threshold slope define minimum energy/op for CMOS

Parallelism cannot reduce power if already operating at minimum energy
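The two bullets above can be illustrated with a small numerical sketch. It is not taken from the slides: it assumes a textbook-style subthreshold model in which dynamic energy is C·Vdd² and leakage energy per operation grows as the gate delay stretches exponentially at low Vdd. The parameter names and values (`c_sw`, `n`, `leak_factor`) are hypothetical placeholders.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near 300 K, in volts


def energy_per_op(vdd, c_sw=1e-15, n=1.5, leak_factor=1e4):
    """Energy per operation (J) in a simplified subthreshold CMOS model.

    Assumed model (not from the slides):
      E_dyn  = c_sw * vdd^2
      E_leak = leak_factor * c_sw * vdd^2 * exp(-vdd / (n * kT/q))
    leak_factor lumps logic depth and the ratio of idle to switching gates.
    """
    e_dyn = c_sw * vdd ** 2
    e_leak = leak_factor * e_dyn * math.exp(-vdd / (n * THERMAL_VOLTAGE))
    return e_dyn + e_leak


def min_energy_point(v_lo=0.1, v_hi=1.0, steps=900):
    """Grid-search the supply voltage that minimizes energy per operation.

    v_lo is kept at a few thermal voltages, below which the simple model
    (and a real circuit) stops being meaningful.
    """
    candidates = [v_lo + i * (v_hi - v_lo) / steps for i in range(steps + 1)]
    v_opt = min(candidates, key=energy_per_op)
    return v_opt, energy_per_op(v_opt)


if __name__ == "__main__":
    v_opt, e_min = min_energy_point()
    print(f"minimum-energy Vdd ~ {v_opt:.3f} V, E/op ~ {e_min:.3e} J")
```

Under these assumptions, lowering Vdd below the minimum-energy point raises E/op again, which is why adding parallel units (and slowing each one down further) stops paying off once a design already sits at that point.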

Page 6

NEM Switches to the Rescue

NEM switches show zero leakage & sharp sub-threshold slope

Could potentially enable reduced E/op with scaling

[Figures: Measured MEM Switch I-V Curve; MEM Switch Energy vs. VDD, annotated Etotal = Edynamic]

R. Nathanael et al., “4-Terminal Relay Technology for Complementary Logic,” IEDM 2009

Page 7

NEM Switch Structure and Operation

ON Switch: |Vgb| > Vpi (pull-in voltage)

OFF Switch: |Vgb| < Vpo (pull-out voltage)
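The ON/OFF conditions above describe a hysteretic switch: it closes above Vpi, opens below Vpo, and holds its state in between. A minimal behavioral sketch follows; the class name `NemSwitch` and the voltage values used in the usage lines are hypothetical and are not device data from the talk.

```python
from dataclasses import dataclass


@dataclass
class NemSwitch:
    """Behavioral model of a relay with pull-in/pull-out hysteresis."""

    v_pi: float           # pull-in voltage (V)
    v_po: float           # pull-out voltage (V), expected to be below v_pi
    closed: bool = False  # current contact state

    def apply(self, v_gate: float, v_body: float) -> bool:
        """Update and return the contact state for the given gate/body bias."""
        v_gb = abs(v_gate - v_body)  # actuation depends on |Vgb| only
        if v_gb > self.v_pi:
            self.closed = True
        elif v_gb < self.v_po:
            self.closed = False
        # For v_po <= |Vgb| <= v_pi the switch keeps its previous state.
        return self.closed


if __name__ == "__main__":
    sw = NemSwitch(v_pi=6.0, v_po=3.0)  # illustrative values only
    for vg in (7.0, 4.0, 2.0, 4.0):
        print(vg, sw.apply(vg, 0.0))
```

Because only the magnitude of Vgb matters, the same model turns on for either polarity of gate-to-body bias.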

[Figure: switch structure, labeled Poly-SiGe Anchor, Poly-SiGe Beam/Flexure, Poly-SiGe Gate, Tungsten Body, Tungsten Source/Drain, Tungsten Channel]

Page 8

NEM Switch as a Logic Element

[Figure: 4-terminal switch symbol with Source, Drain, Body, and Gate terminals]

4-terminal design mimics MOSFET operation

Electrostatic actuation is ambipolar

Non-inverting logic is possible

Actuation independent of source/drain voltages

Page 9

NEM Switch Model

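[Figure: lumped switch model. Mechanical model: spring k, damper b, mass m, anchor. Electrical model: terminals G, S, D, B; capacitances Cgb, Cgc, Csb, Cdb; resistances Rs, Rd, Rcs, Rcd]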

Lumped Verilog-A model for circuit design and simulation:

Mechanical dynamics: spring (k), damper (b), mass (m)

Electrical parasitics: non-linear gate-body (Cgb), gate-channel (Cgc), and source/drain-body (Csb, Cdb) capacitances; contact resistances (Rcs, Rcd)
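The mechanical part of such a lumped model can be sketched in a few lines. The version below is not the authors' Verilog-A model: it assumes a parallel-plate electrostatic actuator driving a spring-mass-damper, and every parameter value (`k`, `m`, `b`, `g0`, `area`, `contact_frac`) is a hypothetical placeholder chosen only to give a plausible order of magnitude.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def pull_in_voltage(k, g0, area):
    """Static pull-in voltage of an ideal parallel-plate actuator."""
    return math.sqrt(8.0 * k * g0 ** 3 / (27.0 * EPS0 * area))


def pull_in_time(v_gb, k=100.0, m=1e-11, b=3e-5, g0=200e-9, area=1e-9,
                 contact_frac=0.5, dt=1e-10, t_max=10e-6):
    """Time (s) for the plate to travel to the contact, or None if it never does.

    Integrates m*x'' + b*x' + k*x = eps0*A*V^2 / (2*(g0 - x)^2) with a
    semi-implicit Euler step. Contact is assumed at contact_frac * g0.
    """
    x, vel = 0.0, 0.0
    x_contact = contact_frac * g0
    for step in range(int(t_max / dt)):
        f_elec = EPS0 * area * v_gb ** 2 / (2.0 * (g0 - x) ** 2)
        acc = (f_elec - k * x - b * vel) / m
        vel += acc * dt
        x += vel * dt
        if x >= x_contact:
            return (step + 1) * dt
    return None


if __name__ == "__main__":
    v_pi = pull_in_voltage(100.0, 200e-9, 1e-9)
    for overdrive in (0.5, 1.5, 2.0):
        print(overdrive, pull_in_time(overdrive * v_pi))
```

With these placeholder values the switch does not close at half the pull-in voltage, and the closing time shortens as the overdrive grows, which is the qualitative behavior a model of this kind is meant to capture.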

Page 10

NEM Switch Characteristics

[Figure: pull-in time tPI [s] (log scale, 1e-7 to 1e-6) vs. VDD [V] (0 to 15), model vs. experiment, for L = 50 µm, 40 µm, 14 µm]

Simple electro-mechanical model matches experiment well

Mechanical “pull-in” delay scales with geometry and overdrive voltage

“Pull-in” voltage (threshold) scales with switch geometry

Constant E-field scaling → mechanical delay and VPI scale linearly
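The linear-scaling claim can be checked with a short calculation. The sketch below assumes a simple cantilever over a parallel-plate gap rather than the folded-flexure relay from the talk, and the material constants are placeholders that cancel out of the ratios; only the trend is meaningful.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def relay_metrics(length, width, thickness, gap,
                  youngs=140e9, density=4000.0):
    """Return (pull-in voltage, mechanical time constant) for a cantilever.

    Assumed formulas (idealized, not from the slides):
      k      = E * w * t^3 / (4 * L^3)      cantilever stiffness
      m      = rho * L * w * t              beam mass
      V_pi   = sqrt(8 * k * g^3 / (27 * eps0 * L * w))
      t_mech = sqrt(m / k)                  inverse natural frequency
    youngs and density are placeholder material constants.
    """
    k = youngs * width * thickness ** 3 / (4.0 * length ** 3)
    m = density * length * width * thickness
    v_pi = math.sqrt(8.0 * k * gap ** 3 / (27.0 * EPS0 * length * width))
    t_mech = math.sqrt(m / k)
    return v_pi, t_mech


def scaled(dims, s):
    """Scale every dimension (L, w, t, gap) by the same factor s."""
    return tuple(d * s for d in dims)


if __name__ == "__main__":
    base = (50e-6, 10e-6, 1e-6, 200e-9)  # hypothetical L, w, t, gap in metres
    v0, t0 = relay_metrics(*base)
    v1, t1 = relay_metrics(*scaled(base, 0.5))
    print(f"V_pi ratio {v1 / v0:.3f}, t_mech ratio {t1 / t0:.3f}")
```

Shrinking every dimension by a factor s scales stiffness by s, mass by s³, and plate area by s², so both the pull-in voltage and the mechanical time constant scale by s under these assumptions.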

*H. Kam et al., “Design and Reliability of a Micro-Relay Technology…,” IEDM 2009

Page 11

Digital Circuit Design with NEMS

CMOS: delay set by electrical time constant

Quadratic delay penalty for stacking devices

Buffer & distribute logical/electrical effort over many stages

NEMS: delay dominated by mechanical movement

Can stack ~100-200 devices before td,elec ≈ td,mech

So, want all to switch simultaneously

Implement logic as a single complex gate
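To see why deep stacks are tolerable, compare the quadratic electrical delay of a series chain with a fixed mechanical delay. The sketch below uses an Elmore-delay estimate for N identical series switches; the on-resistance, node capacitance, and mechanical delay are illustrative values chosen for this example, not measurements from the talk.

```python
def stack_delay(n, r_on=1e3, c_node=1e-15):
    """Approximate 50% electrical delay (s) through n series switches.

    Elmore delay of a uniform RC ladder: R*C*n*(n+1)/2, scaled by 0.69.
    r_on (ohms) and c_node (farads) are assumed per-switch values.
    """
    return 0.69 * r_on * c_node * n * (n + 1) / 2


def max_stack(t_mech=10e-9, r_on=1e3, c_node=1e-15):
    """Largest stack depth whose electrical delay stays within t_mech."""
    n = 1
    while stack_delay(n + 1, r_on, c_node) <= t_mech:
        n += 1
    return n


if __name__ == "__main__":
    print("max stack depth:", max_stack())
    print("delay ratio 200 vs 100 switches:",
          stack_delay(200) / stack_delay(100))
```

Doubling the stack depth roughly quadruples the electrical delay, yet with parameters of this order the electrical term only catches up with the mechanical delay after more than a hundred series switches.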

[Figure label: NEMS: 12 switches]

Page 12

Need to Compare at Block Level

Delay Comparison vs. CMOS

Single mechanical delay vs. several electrical gate delays

For reasonable load, NEMS delay unaffected by fan-out/fan-in

Area Comparison vs. CMOS

Larger individual devices

But often need fewer devices to implement same function

[Figure: CMOS implementation with 4 gate delays vs. NEMS implementation (12 switches) with 1 mechanical delay]

F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008

Page 13

Example: NEMS Adder

Full adder cell: 12 NEMS vs. 24 transistors

XOR “free”

Complementary signals avoid extra mechanical delay (to invert)

NEMS all sized minimally

Page 14

N-bit NEMS Adder

Ripple carry configuration

Cascade full adder cells to create larger complex gate

Stack N NEMS, but still single mechanical delay

Performance gap to CMOS shrinks to ~10x for 32-bit add (10ns delay at 90nm)

Page 15

Scaled NEMS vs. CMOS Adders

For similar area: >9x lower E/op, >10x greater delay

Compare vs. Sklansky CMOS adder*

30x less capacitance

Lower device Cg, Cd

Fewer devices

2.4x lower Vdd

No leakage energy

[Figure: Energy/op vs. Delay/op across Vdd, annotated 9x (energy) and 10x (delay)]

*D. Patil et al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).

F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008

Page 16

Parallelism helps

[Figure: Energy/op vs. Delay/op across Vdd & CL]

Can extend energy benefit up to GOP/s throughput

As long as parallelism is available

Area overhead bounded

CMOS needs to be parallelized at some point too
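The throughput argument reduces to simple arithmetic: a slower unit needs proportionally more copies to hit the same operation rate, while energy per operation is unchanged if parallelization overhead is ignored. The sketch below assumes ideal parallelism with no overhead; the function name `units_for_throughput` is hypothetical, and the 10 ns figure is the 32-bit add delay quoted earlier in the talk.

```python
import math


def units_for_throughput(target_ops_per_s, delay_per_op_s):
    """Number of parallel units needed to sustain the target throughput.

    Each unit is assumed to complete one operation every delay_per_op_s
    seconds, with no overhead for distributing work. The small epsilon
    guards against floating-point round-up at exact multiples.
    """
    return max(1, math.ceil(target_ops_per_s * delay_per_op_s - 1e-9))


if __name__ == "__main__":
    # 1 GOP/s with a 10 ns NEMS adder vs. a 1 ns adder.
    print(units_for_throughput(1e9, 10e-9))
    print(units_for_throughput(1e9, 1e-9))
```

Under this idealization a 10x slower adder needs 10x as many units for the same throughput, so the area overhead is bounded by the delay ratio.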

Page 17

Contact Resistance

Low contact R not critical

Good news for reliability…

[Figure: Energy/op vs. Delay/op across Vdd & CL]

Page 18

NEMS Circuit Demonstration

Test Devices

Logic: Adders, Compressors

Timing Elements: Latches, FFs

Memory: SRAM, DRAM

I/O: ADC, DAC

F. Chen et al., “Demonstration of Integrated Micro-Electro-Mechanical Switch Circuits for VLSI Applications,” ISSCC 2010.

Page 19

Things We Learned…

Layout Matters

Unbalanced current flow: flexures burn up!

Parasitic capacitors: can affect Vpi (DIBL-like effect)

Page 20

Measured Inverter VTC

VTC looks digital, suggests composability

Page 21

NEMS Latch Shows Composability

Designed as if we used MOSFETs, but that is not always optimal…

Page 22

Switch-based Carry Generation

Demonstrates propagate-generate-kill logic as a single complex gate
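Propagate-generate-kill carry logic maps naturally onto switches: at each bit position exactly one of three paths is closed, tying the carry node high, tying it low, or passing the incoming carry through. The sketch below is a logical model of that idea, not the measured circuit; the function names are hypothetical.

```python
def carry_chain(a_bits, b_bits, carry_in=0):
    """Carry-out of every bit position (LSB first) using P/G/K selection."""
    carries = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        generate = a & b          # both inputs 1: carry node tied high
        kill = (a | b) ^ 1        # both inputs 0: carry node tied low
        if generate:
            carry = 1
        elif kill:
            carry = 0
        # Otherwise (a != b) the propagate switch passes the carry through.
        carries.append(carry)
    return carries


def add(a, b, width):
    """Add two unsigned integers of the given bit width via the carry chain."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    carries = carry_chain(a_bits, b_bits)
    carry_ins = [0] + carries[:-1]
    sum_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, carry_ins)]
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total + (carries[-1] << width)


if __name__ == "__main__":
    print(add(11, 6, 4))
```

In a relay implementation all of the propagate, generate, and kill switches can actuate at the same time, so the whole chain settles after a single mechanical delay plus the electrical delay through the closed path.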

Page 23

Measured DRAM Results

Simultaneous read and write

Read latency = tmech,on (decoder) + tmech,off (WLRD switch)

Page 24

Measured DAC Results

Resistive divider based DAC

2-bit thermometer coded output

Code = 0 0 “Vin”

Code = “Vin” 1 1

Page 25

NEM switch scaling – next design

NEM switch size: 120um x 150um (1um litho)

Scaled NEMS size: 20um x 20um (0.25um litho)

Page 26

Conclusions

NEMS unique features enable scaling post-CMOS

Nearly ideal Ion/Ioff

Switching delay largely independent of electrical time constant

Need to adapt circuit design style

Reliability improving

Circuit level insights critical (contact R)

Demonstrated simple circuits

Can start thinking about building more complex systems

Potentially order of magnitude lower E/op than CMOS

Next steps: scaling and improved device design, testing larger digital blocks