hasler iit lecture2 2009a

Upload: anjireddy-thatiparthy

Post on 02-Jun-2018


  • 8/10/2019 Hasler IIT Lecture2 2009a

    1/30


    Power Efficient Computing

    Cortical neurons: 1000s of inputs, 1000s of channel populations, one output

    Equivalent computation ~ 400 MMAC / neuron (no learning / growth)

    ~ 20 pW / neuron

    ~ 500TMAC

    < 10000 neurons

    ~100kW (comp) with 4000 DSPs

    400 MMAC / neuron at 20 pW:

    digital is quite far away (100 mW); analog VMM is closer (100 µW)

    analog HMM / dendrites get close

    Useful analog must be programmable / configurable

    Custom analog ~ x1000 to x10000 more efficient than custom digital (Mead 1990)

    Portable devices: battery powered (or less)

    larger systems

    minimize battery size / weight

    Get as much computation as possible

    Analog (VMM): 10 MMAC/µW

    Digital: 4 MMAC/mW (DSP)
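As a rough check on these figures, the power needed to sustain one neuron's 400 MMAC at each quoted efficiency can be worked out directly. This is only a sketch: it assumes the analog figure means 10 MMAC per microwatt (which agrees with the 10 TMAC/W stated later in the lecture), and the function and constant names are illustrative, not from the slides.

```python
def power_watts(mmac_required, mmac_per_watt):
    """Power (W) needed to sustain a given MMAC rate at a given efficiency."""
    return mmac_required / mmac_per_watt

NEURON_MMAC = 400.0        # equivalent computation per neuron (slide figure)
DSP_MMAC_PER_W = 4.0e3     # 4 MMAC/mW expressed per watt
VMM_MMAC_PER_W = 10.0e6    # 10 MMAC/uW expressed per watt (assumed unit)

dsp_power = power_watts(NEURON_MMAC, DSP_MMAC_PER_W)
vmm_power = power_watts(NEURON_MMAC, VMM_MMAC_PER_W)

print(f"DSP: {dsp_power * 1e3:.0f} mW, analog VMM: {vmm_power * 1e6:.0f} uW")
```

Under these assumptions the DSP needs about 100 mW and the analog VMM about 40 uW, both many orders of magnitude above the 20 pW quoted for the neuron.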


    History of Digital System Design

    [Timeline figure, 1960 to 2000. Labels: First IC; First CAD (Fairchild, 1967); Intel 4004; Mead & Conway; first VLSI courses; MOSIS; Speak and Spell (first DSP?); TMS 32010 (NMOS); Magic (CAD ventures); XC2064; MIPS; synthesis tools; first synthesis classes; Pentium (Intel, 0.8um); TI C54 (fixed point); VLSI taught in CMOS; FPGAs in classes]

    Handcrafted design: every gate optimized; cost only feasible for government contracts

    A separation of design from technology (build framework for abstraction):
    - technologists know how to fabricate smaller and faster transistors
    - designers know how to coordinate millions of transistors


    Reconfigurable Signal Processing

    Innovation and Process Scaling moves

    solutions towards programmability

    and reconfigurability

    [Figure: cost curves spanning 100% S/W (Programmable) to 100% H/W (Fixed Function), with the technology trend marked]

    Obtaining data for 4 MMAC computation ~ 4 mW

    DSPs: low-power processing
    - cell phones (processing < 30 mW average)
    - hearing aids (1 mW levels) (AMI / DSP factory)
    Power: 54C series, 4 MMAC/mW

    FPGAs: large configurability
    Power: just the MAC engine, around 2-10 MMAC/mW
    Baseline static power ~ 0.5 W to 1 W
    Signal routing power / memory: ?

    Power does not include communication off chip (i.e. accessing memory)

    Power = $\frac{1}{2} C V_{dd}^2 f$ for CMOS

    Chip to chip (10 pF load min, 2.5 V): 32 uW/Mbit (dynamic)
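The chip-to-chip figure can be checked against the usual CMOS dynamic power expression. This is a minimal sketch that assumes a prefactor of 1/2 and a 1 Mbit/s signalling rate; the function name and the `activity` parameter are illustrative, not from the slides.

```python
def dynamic_power(c_load, vdd, freq, activity=0.5):
    """CMOS dynamic power in watts: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq

# 10 pF load, 2.5 V swing, 1 Mbit/s (assumed to map to f = 1 MHz)
p_link = dynamic_power(c_load=10e-12, vdd=2.5, freq=1e6)
print(f"{p_link * 1e6:.2f} uW per Mbit/s")
```

With these inputs the result is 31.25 uW per Mbit/s, which rounds to the roughly 32 uW/Mbit on the slide.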


    Modern System Design

    Design at gate level

    Design at multipliers and adders

    Design at basic algorithms:
    - Vector-matrix multiplication
    - Frequency decomposition
    - Adaptive filters
    - Classifiers (NN, GMM, HMM)

    When building analog systems, we expect to build primitives at the basic algorithm level.

    Analog = programmable and configurable.

    How to get enough analog engineers?

    Hierarchy is a key ingredient to the success of the digital circuit and, until recently, one reason why large analog designs have been difficult.

    (1837)

    Fixed function Digital

    Fixed function Analog

    Programmable

    Digital (Mixed mode)


    Levels of Energy Efficiency

    Levels: Subthreshold Transistor Operation; Programmable Circuits (FG transistors); Analog Signal Processing; Configurable Signal Processing

    - Highest throughput / amount of power
    - Eliminate mismatch
    - Programmability
    - Wide accessibility
    - ~ x1000 improvement in power efficiency

    Moving analog approaches / conceptual framework to a system design approach, similar to digital's system transformation in the 1970s / 80s.


    Measured Channel Current


    MOSFET Current-Voltage Curves

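    EKV Model

    $I = I_f - I_r$

    $I_{f,r} = 2 I_{th} \ln^2\left(1 + e^{(\kappa(V_g - V_{T0}) - V_{s,d} + \sigma V_{d,s})/(2U_T)}\right)$

    $I_{th} = \mu C_{ox} U_T^2 (W/L) / \kappa$

    DIBL / $V_A$

    Subthreshold

    $I = 2 I_{th}\left(e^{(\kappa(V_g - V_{T0}) - V_s + \sigma V_d)/U_T} - e^{(\kappa(V_g - V_{T0}) - V_d + \sigma V_s)/U_T}\right)$

    $I_f = 2 I_{th}\, e^{(\kappa(V_g - V_{T0}) - V_s + \sigma V_d)/U_T}$ (Saturation, $V_{ds} > 4U_T$)

    Above-threshold

    $I_f = (\mu C_{ox}/2\kappa)(W/L)\left(\kappa(V_g - V_{T0}) - V_s + \sigma V_d\right)^2$ (Saturation, $I_r \approx 0$, $V_{ds} > V_{on}$)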
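The way the EKV interpolation spans subthreshold and above-threshold operation can be seen numerically. The sketch below assumes the common EKV form, with forward and reverse branches of 2·Ith·ln²(1 + e^x), and ignores DIBL. The parameter values (`i_th`, `kappa`, `v_t0`) are hypothetical and do not come from the lecture.

```python
import math

U_T = 0.0258  # thermal voltage near room temperature (V)

def ekv_branch(v_gate, v_term, i_th, kappa, v_t0):
    """One branch (forward or reverse) of the EKV interpolation."""
    x = (kappa * (v_gate - v_t0) - v_term) / (2.0 * U_T)
    # Numerically stable evaluation of ln(1 + e^x)
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return 2.0 * i_th * softplus ** 2

def ekv_current(v_gate, v_source, v_drain, i_th=1e-7, kappa=0.7, v_t0=0.5):
    """Channel current I = I_f - I_r (DIBL ignored in this sketch)."""
    forward = ekv_branch(v_gate, v_source, i_th, kappa, v_t0)
    reverse = ekv_branch(v_gate, v_drain, i_th, kappa, v_t0)
    return forward - reverse

# Deep subthreshold: current follows the exponential limit
i_sub = ekv_current(0.0, 0.0, 1.0)
# Well above threshold, in saturation: current follows the square law
i_above = ekv_current(3.0, 0.0, 3.0)
print(i_sub, i_above)
```

For a gate well below threshold the forward branch reduces to 2·Ith·e^(κ(Vg − VT0) − Vs)/UT, and well above threshold it reduces to the square-law expression, so a single function covers both regimes.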


    Classic Multilevel EEPROMs

    Vtun

    Tunneling

    Junction

    First reported EEPROM element in standard CMOS

    (Thomson and Brooke, 1989)

    ETANN: floating-gate element used for biasing (Holler et al., 1989)

    EEPROM process, bidirectional tunneling

    [Schematic labels: V1, V2, GND]

    ISD voice recorder ICs

    (answering machine messages, greeting cards, etc.)

    Many standard IC processes allow for

    EEPROM devices (standard cells, standard process)

    Most commercial EEPROMs are multibit


    Programmable Analog Transistors

    Otherwise, need a DAC at every parameter and/or memory, etc.

    - Standard CMOS

    - Data retention: [values garbled in extraction]


    Electron Transport in a subthreshold nFET


    Measurements and Modeling of

    Hot-Electron Injection


    Impact Ionization

    The mean rate of an impact-ionization collision is highly energy dependent

    Impact current is proportional to source current


    pFET Hot-Electron Injection

    The injected electrons are generated by hole impact ionizations.

    Injection current is proportional to source current, and is an exponential function of $\Phi_{dc}$.

    [Expression for Iinj not recoverable from the extraction]


    Injection Above and Below VT

    [Figure: source, channel, drain, with points labeled 1 to 4]

    pFET injection, above VT, ohmic

    pFET injection, sub VT, saturation


    Floating-Gate Devices as Circuit elements

    Analog Signal processing at EEPROM densities

    NIPS 1994

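    [Schematic: floating-gate transistor with terminals Vdd, Vtun, Vd, Vg]

    Neuron MOS (νMOS) (Shibata and Ohmi, 1992)

    4-bit DAC (no sampling)

    [Schematic: inputs a3, a2, a1, a0 through capacitors 8C, 4C, 2C, C to Vout, between Vdd and GND]

    [Schematic: two-input floating-gate transistor with Gate1, Gate2, Iout, GND]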
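The 4-bit DAC built from capacitors 8C, 4C, 2C and C can be illustrated with an ideal capacitive-divider calculation. The sketch assumes all four capacitors share one floating node, each input swings between GND and Vdd, and parasitic capacitance and stored charge are zero. The 5 V supply and the function name are hypothetical.

```python
def cap_dac_vout(bits, vdd=5.0):
    """Ideal output of a binary-weighted capacitor DAC.

    bits: (a3, a2, a1, a0), each 0 or 1. The output is the
    capacitance-weighted average of the input voltages.
    """
    weights = (8, 4, 2, 1)  # capacitor sizes in units of C
    total = sum(weights)
    driven = sum(w for w, b in zip(weights, bits) if b)
    return vdd * driven / total

for code in [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 1, 1)]:
    print(code, round(cap_dac_vout(code), 3))
```

In a real floating-gate circuit the programmed charge on the node and the parasitic capacitances shift and scale this result, which is part of why the stored charge is a useful programmable parameter.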


    [Schematic: amplifier with inputs In+ and In-, tail current Itail, bias circuitry, output Vout, transistors M1 to M10 including floating-gate transistors (terminals S1, S2, D1, D2, Vg, Vtun), nodes VA and VB, supply Vdd]

    Measured Offset Voltage Drift vs. Temperature

    Input offset voltage drifts by 130 µV over 170 °C

    Input offset voltage reduced to 25 µV

    Prog. Analog ICs Industrial Respect

    V. Srinivasan, G. Serrano,

    J. Gray, and P. Hasler,

    CICC 2005, pp. 739-742.

    (Best paper CICC 2005)

    Gm-C filters, C4 Filters, ADCs, DACs, V regulators


    Floating-Gate Voltage Output DAC

    Process / Vdd: 0.5 um CMOS / 5 V
    Linearity: 10 bit (INL/DNL)
    Epot accuracy: < 100 uV (measured), < 1 uV (theoretical)
    Sample rate: ~10 MSPS (instrumented), > 100 MSPS (on-chip)
    Input caps: 140 fF


    Analog-Digital Signal Processing

    CADSP = Cooperative Analog-Digital Signal Processing

    Digital and Analog SP Efficiency

    Custom analog ~ x1000 - x10000 more efficient than custom digital (Mead 1990)

    Analog (VMM): 10 MMAC/µW ( = 10 TMAC/W)

    Digital: 4 MMAC/mW (DSP)

    [Block diagrams. Conventional: real world (analog), A/D converter, DSP processor, computer (digital). CADSP: real world (analog), ASP IC, specialized A/D, DSP processor, computer (digital).]

    Computation           MMAC/µW          Ratio to digital
    Low-power DSPs        0.02 to 0.002    1
    Analog VMM            1 to 30          1000
    Analog filterbanks    30 to 1000       10000
    Analog VQ             1 to 10          300
    Analog HMM            > 1000           > 100000
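The efficiency ratios in this table are easier to compare after converting everything to a common unit. The sketch below assumes the table column is in MMAC per microwatt, which is what makes the 4 MMAC/mW DSP figure fall inside the low-power DSP row. The helper name is illustrative.

```python
def to_tmac_per_watt(mmac_per_uw):
    """Convert MMAC/uW to TMAC/W: (1e6 MAC / 1e-6 W) / 1e12."""
    return mmac_per_uw * 1e6 / 1e-6 / 1e12

analog_vmm = to_tmac_per_watt(10.0)          # 10 MMAC/uW
low_power_dsp = to_tmac_per_watt(4.0 / 1e3)  # 4 MMAC/mW = 0.004 MMAC/uW
ratio = analog_vmm / low_power_dsp

print(f"analog VMM: {analog_vmm:.3g} TMAC/W, DSP: {low_power_dsp:.3g} TMAC/W, ratio: {ratio:.0f}")
```

Under this assumption the analog VMM figure is 10 TMAC/W and its ratio to the DSP is about 2500, the same order of magnitude as the x1000 listed for that row.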

    [Block diagram labels: Microphone, Cepstrum, VQ, HMM, Digital Signal Processing]


    FPAAs are Gaining Momentum

    Concept Simulation VLSI Fabrication Testing

    (3 months)

    x 3

    Concept Simulation / Synthesis Testing VLSI Fabrication

    x 20

    Large-Scale Field

    Programmable Analog

    Arrays (FPAA)

    Approach built on floating-gate circuits

    RASP 1.x (2002) (T. Hall, P. Hasler, et al., FPL, Sept. 2002)

    RASP 2.x:

    RASP 2.5, 2.7: 2004-2007
    - > 50,000 prog. analog devices
    - Used by > 100 Eng

    RASP 2.8x: 2008-
    - Used by > 100 Eng

    RASP 2.9x: 2009-

    Jan 2008

    Can be a prototyping tool,

    early devices,

    or final application


    RASP Programming/Configuration

    [Schematic labels: Vg, Vd, Program, Run (Program), Vin, Vout, C, Vdd, GND]

    A B C D E F G H

    VMM VMM VMM VMM VMM VMM VMM VMM

    0 0 0 0 0 0 0 0

    252 216 180 144 108 72 36 0

    GP GP GP GP GP GP GP GP

    56 56 56 56 56 56 56 56

    252 216 180 144 108 72 36 0

    GP GP GP GP GP GP GP GP

    98 98 98 98 98 98 98 98

    252 216 180 144 108 72 36 0

    GP GP GP GP GP GP GP GP

    140 140 140 140 140 140 140 140

    252 216 180 144 108 72 36 0

    GP GP GP GP GP GP GP GP

    182 182 182 182 182 182 182 182

    252 216 180 144 108 72 36 0

    GP GP GP GP GP GP GP GP

    224 224 224 224 224 224 224 224

    252 216 180 144 108 72 36 0

    VMM VMM VMM VMM VMM VMM VMM VMM

    266 266 266 266 266 266 266 266

    252 216 180 144 108 72 36 0



    RASP 2.8 / 2.9 Series of FPAA devices

    0.35um CMOS

    Size ~ 3mm x 3mm

    I/O pins ~ 56 (100 pin package)

    2.8a: General FPAA

    2.8b: BioChannel FPAA

    2.8c: Sensor FPAA

    2.8d: MITE FPAA

    a low-power FPGA

    Used by >100 Eng.

    Switches are not dead weight

    On-chip programming, 120 dB DR TIA, 9 bit ramp ADC, 7 bit DAC

    RASP 2.9 IC family

    Family of nine FPAA ICs

    Generic FPAA Block

    FPAA with Channel CABs

    FPAA with Channel CABs+ Adaptive Synapses

    FPAAs with Adaptive blocks

    Larger devices: 5mm x 5mm (x3)

    100 CABs; potentially 1 TMAC from one chip

    Better reticle design: more # of devices

    Custom versus FPGAs: x2-3 speed, x10 area, x100 power

    Custom versus FPAAs: < x2 speed, < x2 area, < x2 power

    RASP 2.8 IC family


    Looking Closer at

    CAB Components

    nFET Transistors

    pFET Floating-Gate Transistors

    Transmission Gate

    Floating Capacitors (2 terminals)

    Basic 9-Transistor OTA

    FG input 9-Transistor OTA

    FG input 9-Transistor, Buffer Connected OTA


    Other RASP 2.8 Architectures

    RASP 2.8b: Bio enabled FPAA
    RASP 2.8 architecture with transistor channel / synapses as CAB elements
    Inspired from FPNA work (Farquhar et al., 2006)

    RASP 2.8c: Sensor enabled FPAA
    RASP 2.8 architecture with additional CABs for Universal Sensor Circuits

    RASP 2.8d: MITE enabled FPAA
    RASP 2.8 architecture with MITE CAB design and current mode support circuitry


    Building Bridges between Algorithms and Hardware

    Building infrastructure: testing / demonstration boards & teaching how to build
    - Wide use of FPAA test platform
    - Smaller board development / dedicated programming boards
    - FPAA chip specific adaptor boards (single and multiple chip platforms)

    Software infrastructure / tools
    - First automated Simulink to system measurement test, Dec 2008
    - Developed visual tool for routing (RAT)
    - Extensive library (working circuits)
    - Parameter translation
    - Starting design at high level

    Some next directions
    - Targeting to SPICE / Simulink design tools
    - More simulation models
    - Noise, SNR, distortion


    Rapid Prototyping using FPAAs

    RASP 2.7 PhotoReceptor Response

    [Figure: photoreceptor responses 1, 2, 3 to a paper strip]


    Levels of Energy Efficiency

    Levels: Subthreshold Transistor Operation; Programmable Circuits (FG transistors); Analog Signal Processing; Configurable Signal Processing

    - Highest throughput / amount of power
    - Eliminate mismatch
    - Programmability
    - Wide accessibility
    - ~ x1000 improvement in power efficiency

    These techniques open further opportunities to utilize / explore biologically inspired techniques.

    Large need for tools to compile / program these systems. Link most useful at system / signal processing level.

    Education / training / foundational theory is critical for designing.

    Moving analog approaches / conceptual framework to a system design approach, similar to digital's system transformation in the 1970s / 80s.