looking beyond cmos: integrated circuit design with nano ...scaled nems vs. cmos adders for similar...
TRANSCRIPT
Looking Beyond CMOS:
Integrated Circuit Design with
Nano Electro-Mechanical Switches
Vladimir Stojanović (MIT)
in collaboration with
Tsu-Jae King Liu, Elad Alon (UC Berkeley)
Dejan Marković (UCLA)
MIEL 2010
Acknowledgements
Circuit design
Fred Chen, Hossein Fariborzi
Matthew Spencer, Abhinav Gupta
Cheng Wang, Kevin Dwan
Device design
Hei Kam, Rhesa Nathanael, Vincent Pott, Jaeseok Jeon
Sponsors
DARPA NEMS program
FCRP (C2S2, MSD)
MIT CICS
Berkeley Wireless Research Center
NSF
2
CMOS is Scaling, Power Density is Not
Since ~2000 supply voltage (Vdd) stuck at ~1V
Leakage stops you from lowering threshold (Vth)
Vdd and Vt not scaling well power/area not scaling
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar, Intel
Sun’s Surface
Power Density Prediction circa 2000
Core 2
3
Performance limited by power, not number of transistors
Parallelism to the Rescue
Parallelism allows slower, more efficient units
Helps scale performance for fixed power
Will this last forever?
4
Lower supply,
balanced Vth
Parallelize to recover performance
IBM CELL processor
4
Subthreshold leakage: Game Over for CMOS
Leakage and sub-threshold slope define minimum
energy/op for CMOS
Parallelism cannot reduce power if already operating at
minimum energy
5
NEM Switches to the Rescue
NEM switches show zero leakage & sharp sub-threshold slope
Could potentially enable reduced E/op with scaling
Measured MEM Switch I-V Curve MEM Switch Energy vs. VDD
6
R. Nathanael et al., “4-Terminal Relay Technology for Complementary Logic,” IEDM 2009
Etotal=Edynamic
NEM Switch Structure and Operation
ON Switch:
|Vgb| > Vpi (pull-in voltage)
OFF Switch:
|Vgb| < Vpo (pull-out voltage)
7
Poly-SiGe Anchor
Poly-SiGe Beam
/Flexure
Tungsten Body
Tungsten Source/Drain
Tungsten Channel Poly-SiGe Gate
NEM Switch as a Logic Element
8
DrainSource Body
Gate
4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar
Non-inverting logic is possible Actuation independent of source/drain voltages
8
NEM Switch Model
9
Spring k Damper bMass m
Cgb
G
S D
Rcs Rcd
B
Cgc
CdbCsb
Anchor
Mechanical
Model
Rs Rd
Lumped Verilog-A model for circuit design and simulation:
Mechanical dynamics: spring (k), damper (b), mass (m)
Electrical parasitics: non-linear gate-body (Cgb), gate-channel
(Cgc), and source/drain-body cap (Cs,db), contact (Rcs,d)
9
NEM Switch Characteristics
1.E-07
1.E-06
0 5 10 15
tPI[s]
VDD [V]
Model
Experiment
L=50mm 40mm 14mm
Simple electro-mechanical model matches experiment well Mechanical “pull-in” delay scales with geometry and overdrive voltage
“Pull-in” voltage (threshold) scales with switch geometry
Constant E-field scaling Mechanical delay and VPI scale linearly
10
Model
*H. Kam et al., “Design and Reliability of a Micro-Relay Technology…,” IEDM 200910
Digital Circuit Design with NEMS
CMOS: delay set by electrical time constant
Quadratic delay penalty for stacking devices
Buffer & distribute logical/electrical effort over many stages
NEMS: delay dominated by mechanical movement
Can stack ~100-200 devices before td,elec ≈ td,mech
So, want all to switch simultaneously
Implement logic as a single complex gate
NEMS: 12 switches
11
Need to Compare at Block Level
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable load, NEMS delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
Larger individual devices
But often need fewer devices to implement same function
4 gate delays 1 mechanical delay
F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008
NEMS: 12 switches
12
Example: NEMS Adder
Full adder cell:
12 NEMS vs. 24
transistors
XOR “free”
Complementary signals
avoid extra mechanical
delay (to invert)
NEMS all sized minimally
13
N-bit NEMS Adder
Ripple carry configuration
Cascade full adder cells to
create larger complex gate
Stack N NEMS, but still
single mechanical delay
Performance gap to
CMOS shrinks to ~10x for
32-bit add
(10ns delay at 90nm)
14
Scaled NEMS vs. CMOS Adders
For similar area: >9x lower E/op, >10x greater delay
*D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp.
on Computer Arithmetic (ARITH'07).
9x
10x
Energy/op vs. Delay/op across Vdd
30x less capacitance
Lower device Cg, Cd
Fewer devices
2.4x lower Vdd
No leakage energy
Compare vs.
Sklansky CMOS
adder*
F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008
15
Parallelism helps
16
Energy/op vs. Delay/op across Vdd & CL Can extend energy
benefit up to
GOP/s throughput
As long as
parallelism is
available
Area overhead
bounded
CMOS needs to be
parallelized at
some point too
16
Contact Resistance
Low contact R
not critical
Good news for
reliability…
17
Energy/op vs. Delay/op across Vdd & CL
NEMS Circuit Demonstration
Test Devices
Logic
Adders,
Compressors
Timing
Elements
Latches, FFs
Memory
SRAM, DRAM
I/O
ADC, DAC
18
F. Chen et al., “Demonstration of Integrated Micro-Electro-Mechanical Switch Circuits for
VLSI Applications,” ISSCC 2010.
Things We Learned…
Layout Matters
Unbalanced current flow:
flexures burn up!
Parasitic capacitors: can
affect Vpi (DIBL-like effect)
19
Measured Inverter VTC
VTC looks digital, suggests composability
20
NEMS Latch Shows Composability
Designed as if we used MOSFETS, but that is not always optimal…
21
Switch-based Carry Generation
Demonstrates propagate-generate-kill logic as a
single complex gate
22
Measured DRAM Results
Simultaneous read and write
Read latency = tmech,on (decoder) + tmech,off (WLRD switch)
23
Measured DAC Results
Resistive divider based DAC
2-bit thermometer coded output
Code = 0 0 “Vin”
Code = “Vin” 1 1
2424
NEM switch scaling – next design
1um litho
Scaled NEMS size
20um x 20umNEM switch size
120um x 150um
0.25um litho
25
Conclusions
NEMS unique features enable scaling post-CMOS
Nearly ideal Ion/Ioff
Switching delay largely independent of electrical
Need to adapt circuit design style
Reliability improving
Circuit level insights critical (contact R)
Demonstrated simple circuits
Can start thinking about building more complex systems
Potentially order of magnitude lower E/op than CMOS
Next steps: scaling and improved device design, testing larger
digital blocks
26
t