parallel processing and circuit design with nano-electro...

23
Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays Elad Alon 1 , Tsu-Jae King Liu 1 , Vladimir Stojanovic 2, Dejan Markovic 3 1 University of California, Berkeley 2 Massachusetts Institute of Technology 3 University of California, Los Angeles

Upload: others

Post on 25-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • Parallel Processing and Circuit Design

    with Nano-Electro-Mechanical Relays

    Elad Alon1, Tsu-Jae King Liu1, Vladimir Stojanovic2, Dejan Markovic3

    1University of California, Berkeley2Massachusetts Institute of Technology

    3University of California, Los Angeles

  • 100

    101

    102

    103

    104

    0

    20

    40

    60

    80

    100

    No

    rma

    lize

    d E

    ne

    rgy

    /op

    1/throughput (ps/op)

    CMOS is Scaling, Power Density is Not

    Vdd and Vt not scaling well power/area not scaling

    2

    400480088080

    8085

    8086

    286386

    486Pentium® proc

    P6

    1

    10

    100

    1000

    10000

    1970 1980 1990 2000 2010Year

    Po

    we

    r D

    en

    sit

    y (

    W/cm

    2)

    Hot Plate

    Nuclear Reactor

    Rocket Nozzle

    S. Borkar

    Sun’s Surface

    Power Density Prediction circa 2000

  • CMOS is Scaling, Power Density is Not

    Vdd and Vt not scaling well power/area not scaling

    Parallelism to improve throughput within power budget

    3

    100

    101

    102

    103

    104

    0

    20

    40

    60

    80

    100

    No

    rmalized

    En

    erg

    y/o

    p

    1/throughput (ps/op)

    Operate at a

    lower energy

    point

    Run in parallel to

    recoup performance

    400480088080

    8085

    8086

    286386

    486Pentium® proc

    P6

    1

    10

    100

    1000

    10000

    1970 1980 1990 2000 2010Year

    Po

    we

    r D

    en

    sit

    y (

    W/cm

    2)

    Hot Plate

    Nuclear Reactor

    Rocket Nozzle

    S. Borkar

    Sun’s Surface

    Power Density Prediction circa 2000

    Core 2

  • 101

    102

    103

    104

    105

    0

    20

    40

    60

    80

    100

    No

    rma

    lize

    d E

    ne

    rgy

    /op

    1/throughput (ps/op)

    Where Parallelism Doesn’t Help

    4

    CMOS circuits have well-defined minimum energy

    Caused by leakage and finite sub-threshold swing

    Need to balance leakage and active energy

    Limits energy-efficiency, no matter how slow the

    circuit runs

    Energy/op vs. Vdd

    0.1 0.2 0.3 0.4 0.5

    5

    10

    15

    20

    25

    No

    rmalized

    En

    erg

    y/c

    ycle

    Vdd (V)

    Etotal

    Edynamic

    Eleak

    Energy/op vs. 1/throughput

    Scale Vdd & VT:

    Etot

    Edyn

    Eleak

  • What if There Was No Leakage?

    5

    No longer need to balance increasing component

    of energy as Vdd decreases

    Relay (mechanical switch) offers near infinite sub-

    threshold slope and no leakage current

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    1.2

    34 35 36 37

    Measured relay I-V

    VGB (V)

    I DS

    (mA

    ) compliance limit

    VDS = 1V

    W x L = 2mm x 20mm

    H = g = 0.6mm

    Energy/op vs. Vdd

    0.1 0.2 0.3 0.4 0.5

    5

    10

    15

    20

    25

    No

    rmalized

    En

    erg

    y/c

    ycle

    Vdd (V)

    Etotal

    Edynamic

    Etot = Edyn

  • Outline

    If we could reliably build relays, would

    relay circuits offer superior energy-

    efficiency?

    NEM Relay Structure and Model

    Digital Circuit Design with NEM Relays

    Comparisons to CMOS

    6

  • 7

    Top View

    Cross-Section

    L

    W

    Body Electrode(under body

    dielectric)

    DrainSource

    Channel

    Contact Dimples

    Electrical

    Gate

    Anchor

    NEM Relay Structure & Operation

    Metal Channel Gate DielectricElectrical Gate

    Source Drain

    Bodyody

    Body Dielectric

    H

    HSource Drain

    Body

    Body Dielectric

    t gap

    Htdimple

    W

    Air gap

    Infinite off-resistance zero leakage

    Open Relay (“OFF”):

    |Vgb| < Vpo (pull-out voltage)

    Closed Relay (“ON”):

    |Vgb| > Vpi (pull-in voltage)

    On-resistance depends on materials, pressure (|Vgb|)

  • 8

    Preliminary Fabricated Relay

    SOURCE

    DRAIN

    GATE

    BODY

    Channel

    1mm

    Measured I-V

    Cantilever W,L,H = 2μm,20μm,200nm

    Actuation gap thickness = 400nm

  • NEM Relay Model

    Compact Verilog-A model for circuit simulation:

    Mechanical dynamics: spring (k), damper (b), mass (m)

    Electrical parasitics: non-linear gate-body (Cgb), gate-channel

    (Cgc), and source/drain-body cap (Csd_b), contact (Rcs,d)

    9

    0.5Rc

    RsdRsdCgb

    0.5Rc

    RcdRcs

    DS

    Cgc

    So DoCsd_bCsd_b

    B

    Go

    C

    Cs Cd

    Anchor

    Spring k Damper b

    Mechanical Model

    Mass m

  • NEM Relay Scaling

    10

    Scaling (e.g., constant-E) benefits similar to CMOS

    Pull-in Voltage with Beam Scaling

    Measured pull-in voltages scale linearly

    If overcome reliability issues & surface forces scale:

    {W,L,H,tgap} = {90nm,2.3um,90nm,10nm} Vpi = 200mV

    Mechanical delay also scales linearly (~10ns @90nm)

  • NEM Relay as a Logic Element

    11

    4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar

    Non-inverting logic possible: actuation independent of drain/source voltages

    Mechanical delay (~10ns) >> electrical t (~1ps) Can stack 200 devices (with 1kW on-resistance) before

    electrical delay is comparable to mechanical delay

  • Digital Circuit Design with Relays

    CMOS: delay set by electrical time constant

    Quadratic delay penalty for stacking devices

    Buffer & distribute logical/electrical effort over many stages

    Relays: delay dominated by mechanical movement

    Want all relays to switch simultaneously

    Implement logic as a single complex gate

    12

  • Digital Circuit Design with Relays

    Delay Comparison vs. CMOS

    Single mechanical delay vs. several electrical gate delays

    For reasonable load, relay delay unaffected by fan-out/fan-in

    Area Comparison vs. CMOS

    Larger individual devices

    Fewer devices needed to implement the same logic function

    13

    4 gate delays 1 mechanical delay

  • 101

    102

    103

    104

    105

    0

    20

    40

    60

    80

    100

    No

    rma

    lize

    d E

    ne

    rgy

    /op

    1/throughput (ps/op)

    CMOS vs. Relays: Digital Logic

    14

    Most processor components exhibit the energy vs.

    performance tradeoff of static CMOS

    Control, Datapath, Clock

    Adder energy performance tradeoff is representative

  • Relay-based Adder

    Full adder cell:

    12 relays vs. 24

    transistors

    XOR “free”

    Complementary

    signals avoid extra

    mechanical delay (to

    invert)

    Relays all sized

    minimally

    15

  • N-bit Relay-Based Adder

    Ripple carry

    configuration

    Cascade full adder

    cells to create

    larger complex

    gate

    Stack of N relays,

    but still single

    mechanical delay

    16

  • Snapshot: NEM Relays vs. CMOS

    32-bit adder CMOS Relay

    Supply

    Voltage

    0.5 V 0.32 - 0.9 V

    Load Cap

    per Output

    25 fF 25 fF

    Total Gate

    Cap

    4.0 pF 125 fF

    Area 600 mm2 480 mm2

    17

    Compare vs. Sklansky CMOS adder1

    For similar area: >9x lower E/op, >10x greater delay

    1D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp.

    on Computer Arithmetic (ARITH'07).

    9x

    10x

    Energy/op vs. Delay/op across Vdd

    Energy is for adder only

  • Energy-Delay Results: Contact R

    18

    Low contact R not

    critical

    Enables low force,

    hard contact

    Good news for

    reliability…

    More later

  • Energy-Delay Comparison

    19

    Energy/op vs. Delay/op across Vdd & CL

    30x cap. gain

    Lower device Cg, Cd

    Fewer devices

    2.4x Vdd gain

    No leakage energy

    Limited by Esurf

    Relays always more energy efficient

    >10x better, even in the GOPS range

  • Area vs. Throughput

    For high throughputs, relays require area overhead

    to achieve similar throughputs as CMOS

    Area overhead is bounded (max. 6x-25x) :

    CMOS needs parallelism to maintain Emin for throughputs

    over 1GOPS

    20

    0.5 1 1.5 2 2.5 3

    5

    10

    15

    20

    25

    30

    Throughput (GOPS)

    AR

    ela

    y/A

    CM

    OS

    W Relay (buffered, 25fF)

    W Relay (buffered, 100fF)

    Au Relay (25fF)

    Au Relay (100fF)

    Erelay = 20fJ/op

  • Revised Power Breakdown

    21

    I/O energy dominant if core is ~30x more efficient

    Need to explore processing of analog signals too…

    No or limited “linear gain” in relays – rely on switching1

    *F. Chen et. al., “Integrated Circuit Design with NEM Relays,” IEEE ICCAD, Nov. 2008.

  • Conclusions

    NEM relays offer unique characteristics

    Nearly ideal Ion/Ioff

    Significantly lower Cg than CMOS

    Switching delay largely independent of electrical t

    Relay circuits show potential for order of

    magnitude better energy efficiency

    Examined 32-bit adders

    Exploring memories, multipliers, complete

    microcontroller

    Key challenges are reliability and scaling

    Circuit level insights critical: engineer contact R

    22

  • Acknowledgements

    Hei Kam, Fred Chen, Hossein Fariborzi,

    Abhinav Gupta, Jaeseok Jeon, Rhesa

    Nathanael, Vincent Pott, Matthew Spencer,

    Cheng Wang

    Berkeley Wireless Research Center

    NSF

    DARPA

    FCRP (C2S2, MSD)

    Intel Foundation Graduate Fellowship

    23