pipelining & retiming of digital circuits

Upload: gautham-popuri

Post on 03-Jun-2018


  • 8/13/2019 Pipelining & Retiming of Digital Circuits

    1/47

    Pipelining and Retiming 1

    Pipelining

    Adding registers along a path

    split combinational logic into multiple cycles

    increase clock rate, increase throughput

    increase latency


    Pipelining

    Delay, d, of slowest combinational stage determines performance

    clock period = d

    Throughput = 1/d : rate at which outputs are produced

    Latency = nd : number of stages * clock period

    Pipelining increases circuit utilization

    Registers slow down data, synchronize data paths

    Wave-pipelining

    no pipeline registers - waves of data flow through circuit

    relies on equal-delay circuit paths - no short paths
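The relations on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `pipeline_metrics` and the stage delays are illustrative assumptions, and register overhead is ignored.

```python
def pipeline_metrics(stage_delays):
    """Return clock period, throughput and latency for a list of stage delays."""
    d = max(stage_delays)   # the slowest combinational stage sets the clock period
    n = len(stage_delays)   # number of pipeline stages
    return {"clock_period": d, "throughput": 1 / d, "latency": n * d}


unpipelined = pipeline_metrics([24])      # one large combinational block
pipelined = pipeline_metrics([8, 9, 7])   # the same logic split into three stages
print(unpipelined)
print(pipelined)
```

With these made-up delays the clock period drops from 24 to 9, while the latency grows from 24 to 27 because every stage is padded to the slowest one.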


    When and How to Pipeline?

    Where is the best place to add registers? splitting combinational logic

    overhead of registers (propagation delay and setup time requirements)

    What about cycles in data path?

    Example: 16-bit adder, add 8-bits in each of two cycles
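One way to read the adder example is as a two-stage pipeline: the low 8 bits are added in the first cycle and the high 8 bits (plus the carry) in the second. The sketch below models that reading behaviourally; the class name `PipelinedAdder16` and its `clock` method are hypothetical, and the final carry-out is simply dropped.

```python
class PipelinedAdder16:
    """Behavioural model of a 16-bit adder split into two 8-bit stages."""

    def __init__(self):
        # Pipeline register between the stages:
        # (low byte of sum, carry, high byte of a, high byte of b)
        self.stage1 = None

    def clock(self, a, b):
        """Advance one clock cycle; return the sum of the previous inputs."""
        result = None
        if self.stage1 is not None:
            # Stage 2: add the high bytes plus the registered carry.
            low, carry, a_hi, b_hi = self.stage1
            high = (a_hi + b_hi + carry) & 0xFF
            result = (high << 8) | low
        # Stage 1: add the low bytes and register everything stage 2 needs.
        low_sum = (a & 0xFF) + (b & 0xFF)
        self.stage1 = (low_sum & 0xFF, low_sum >> 8, (a >> 8) & 0xFF, (b >> 8) & 0xFF)
        return result


adder = PipelinedAdder16()
print(adder.clock(0x12FF, 0x0001))       # pipeline still filling
print(hex(adder.clock(0xFFFF, 0x0001)))  # sum of the first pair of operands
```

Each stage only contains an 8-bit carry chain, so the clock can run faster, at the cost of one extra cycle of latency.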


    Retiming

    Process of optimally distributing registers throughout a circuit

    minimize the clock period

    minimize the number of registers


    Retiming (contd)

    Fast optimal algorithm (Leiserson & Saxe 1983)

    Retiming rules:

    remove one register from each input and add one to each output

    remove one register from each output and add one to each input
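The two retiming rules can be sketched as one operation on a graph whose edge weights are register counts. The representation (a dict mapping `(source, destination)` pairs to register counts) and the name `retime_node` are assumptions made for this illustration.

```python
def retime_node(edges, node, forward=True):
    """Move one register across `node`.

    forward=True : remove one register from each input, add one to each output.
    forward=False: remove one register from each output, add one to each input.
    """
    step = 1 if forward else -1
    new_edges = dict(edges)
    for (u, v) in edges:
        if v == node:
            new_edges[(u, v)] -= step   # edge entering the node
        if u == node:
            new_edges[(u, v)] += step   # edge leaving the node
    if min(new_edges.values()) < 0:
        raise ValueError("illegal move: an edge would have a negative register count")
    return new_edges


edges = {("a", "n"): 1, ("b", "n"): 1, ("n", "c"): 0}
moved = retime_node(edges, "n", forward=True)
print(moved)
print(retime_node(moved, "n", forward=False) == edges)
```

A move is only legal when every edge it takes a register from actually has one, which is what the `ValueError` check enforces.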


    Optimal Pipelining

    Add registers - use retiming to find optimal location

    [Figure: example circuit used to illustrate register placement]


    Example - Digital Correlator

    $y_t = d(x_t, a_0) + d(x_{t-1}, a_1) + d(x_{t-2}, a_2) + d(x_{t-3}, a_3)$

    $d(x, a) = 0$ if $x \neq a$, 1 otherwise (and passes x along to the right)

    [Figure: correlator circuit with three adders (+), four comparators (d) and a host; input xt, constants a0 a1 a2 a3, output yt]
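A functional model of the correlator may make the formula easier to follow. The sketch assumes that d returns 1 when its two arguments match and 0 otherwise, so that yt counts the matching positions; the function name `correlate` and the sample data are illustrative.

```python
def correlate(xs, a):
    """Return y_t for every t at which a full window of inputs is available."""

    def d(x, a_i):
        # Comparator: 1 on a match, 0 otherwise (assumed reading of the slide).
        return 1 if x == a_i else 0

    taps = len(a)
    return [
        sum(d(xs[t - i], a[i]) for i in range(taps))
        for t in range(taps - 1, len(xs))
    ]


xs = [1, 0, 1, 1, 0, 1]   # input stream x_0 .. x_5
a = [1, 1, 0, 1]          # pattern a_0 .. a_3
print(correlate(xs, a))   # [4, 1, 2]
```

The first output is 4 because x_3, x_2, x_1, x_0 match a_0, a_1, a_2, a_3 in every position.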


    Example - Digital Correlator (contd)

    Delays: adder, 7; comparator, 3; host, 0

    [Figure: correlator circuit as given]

    cycle time = 24

    [Figure: correlator circuit after retiming]

    cycle time = 13
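Both cycle times can be reproduced by computing the longest register-free path through a graph model of the correlator. Since the figure did not survive in this transcript, the edge list below is an assumption based on the classic Leiserson and Saxe correlator (comparators `c1`..`c4`, adders `a1`..`a3`), and the per-node values in `R` are one assignment consistent with the r values listed on the "Retimed Correlator" slide.

```python
DELAY = {"host": 0, "c1": 3, "c2": 3, "c3": 3, "c4": 3, "a1": 7, "a2": 7, "a3": 7}

# (source, destination) -> number of registers on that connection (assumed topology)
EDGES = {
    ("host", "c1"): 1, ("c1", "c2"): 1, ("c2", "c3"): 1, ("c3", "c4"): 1,
    ("c4", "a1"): 0, ("c3", "a1"): 0, ("a1", "a2"): 0, ("c2", "a2"): 0,
    ("a2", "a3"): 0, ("c1", "a3"): 0, ("a3", "host"): 0,
}


def clock_period(delay, edges):
    """Longest total node delay along any path that crosses no register."""
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)
    memo = {}

    def longest_from(u):
        # Assumes there is no register-free cycle (no asynchronous feedback).
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)


def retime(edges, r):
    """Apply w_new(u, v) = w_old(u, v) + r(u) - r(v) to every edge."""
    return {(u, v): w + r[u] - r[v] for (u, v), w in edges.items()}


R = {"host": 0, "c1": 1, "c2": 1, "c3": 2, "c4": 2, "a1": 2, "a2": 1, "a3": 0}

print(clock_period(DELAY, EDGES))             # 24
print(clock_period(DELAY, retime(EDGES, R)))  # 13
```

The original critical path is one comparator followed by all three adders (3 + 7 + 7 + 7 = 24); after retiming the longest register-free path is two comparators and one adder (3 + 3 + 7 = 13).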


    Retiming: One Step at a Time

    [Figure: three snapshots of the correlator graph, labelled with node delays (7, 3, 0) and edge register counts, as registers are moved across one node at a time]

    Retiming: One Step at a Time (contd)

    [Figure: three further snapshots of the correlator graph as the register moves continue]

    and after a few more . . .


    Retiming Algorithm

    Representation of circuit as directed graph

    nodes: combinational logic

    edges: connections between logic that may or may not include registers

    weights: propagation delay for nodes, number of registers for edges

    path delay (D): sum of propagation delays along path nodes

    path weight (W): sum of edge weights along path

    always > 0 around any cycle, no asynchronous feedback

    Problem statement

    given: cycle time, T, and a circuit graph

    adjust edge weights (number of registers) so that every path with delay greater than Tclk contains at least one register

    $w_{new}(u, v) = w_{old}(u, v) + r(u) - r(v) \geq 1$, i.e. $r(u) - r(v) \geq -w_{old}(u, v) + 1$ (every long path has at least 1 reg)

    Difference constraints like this can be solved by generating a graph that represents the constraints and using a shortest path algorithm like Bellman-Ford to find a set of r(v) values that meets all the constraints

    The value of r(v) returned by the algorithm can be used to generate the new positions of the registers in the retimed circuit
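The constraint-solving step can be sketched as follows. This is not the slides' own code: it is a compact rendering of the standard formulation, in which W(u, v) is the minimum register count over all paths from u to v and D(u, v) is the largest delay among those minimum-weight paths. The correlator topology in `EDGES` is assumed (the figure is missing from this transcript), and `find_retiming` is a hypothetical name.

```python
from itertools import product

DELAY = {"host": 0, "c1": 3, "c2": 3, "c3": 3, "c4": 3, "a1": 7, "a2": 7, "a3": 7}
EDGES = {  # (source, destination) -> registers; assumed correlator topology
    ("host", "c1"): 1, ("c1", "c2"): 1, ("c2", "c3"): 1, ("c3", "c4"): 1,
    ("c4", "a1"): 0, ("c3", "a1"): 0, ("a1", "a2"): 0, ("c2", "a2"): 0,
    ("a2", "a3"): 0, ("c1", "a3"): 0, ("a3", "host"): 0,
}


def clock_period(delay, edges):
    """Longest total node delay along any register-free path."""
    succ = {u: [v for (s, v), w in edges.items() if s == u and w == 0] for u in delay}
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)


def find_retiming(delay, edges, T):
    """Return lags r(v) that give a clock period <= T, or None if impossible."""
    nodes = list(delay)
    inf = float("inf")
    # All-pairs shortest paths on (registers, -delay), compared lexicographically:
    # fewest registers first, then the largest combinational delay.
    dist = {(u, v): (inf, 0) for u, v in product(nodes, nodes)}
    for u in nodes:
        dist[u, u] = (0, 0)
    for (u, v), w in edges.items():
        dist[u, v] = min(dist[u, v], (w, -delay[u]))
    for k, i, j in product(nodes, nodes, nodes):   # k is the outermost loop
        (w1, d1), (w2, d2) = dist[i, k], dist[k, j]
        if w1 < inf and w2 < inf and (w1 + w2, d1 + d2) < dist[i, j]:
            dist[i, j] = (w1 + w2, d1 + d2)
    # A constraint (u, v, c) means r(v) - r(u) <= c.
    constraints = [(u, v, w) for (u, v), w in edges.items()]  # w_new(u, v) >= 0
    for (u, v), (W, neg_d) in dist.items():
        if W < inf and delay[v] - neg_d > T:                  # D(u, v) > T
            constraints.append((u, v, W - 1))                 # needs >= 1 register
    # Bellman-Ford from an implicit source at distance 0 from every node.
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:
            if r[u] + c < r[v]:
                r[v] = r[u] + c
                changed = True
        if not changed:
            return r
    return None   # negative cycle: the constraints cannot all be met


r = find_retiming(DELAY, EDGES, 13)
retimed = {(u, v): w + r[u] - r[v] for (u, v), w in EDGES.items()}
print(r, clock_period(DELAY, retimed))
print(find_retiming(DELAY, EDGES, 12))
```

Adding the same constant to every r(v) leaves the retimed circuit unchanged, so the lags returned here may differ from those on the slides by a constant offset, and other valid solutions may exist.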


    Retimed Correlator

    [Figure: correlator graph before and after retiming; the eight nodes are annotated with r = 2, 2, 2, 1, 1, 1, 0, 0]


    Extensions to Retiming

    Host interface

    add latency

    multiple hosts

    Area considerations

    limit number of registers

    optimize logic across register boundaries

    peripheral retiming

    incremental retiming

    pre-computation

    Generality

    different propagation delays for different signals

    widths of interconnections

    Retiming examples

    Shortening critical paths

    Create simplification opportunities

    [Figure: small gate-level circuits with inputs a, b, c, d, output x and D flip-flops, shown in several register placements]


    Digital Correlator Revisited

    Optimally retimed circuit (clock cycle 13)

    How do we know this is optimal?

    Max-Ratio Theorem: $T_c \geq D_{cycle} / R_{cycle}$ for all cycles in circuit

    $D_{cycle}$ = total delay on cycle, including register tpd, tsu

    $R_{cycle}$ = number of registers on cycle

    We know we can never do better than this

    Can't always do this well
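The bound can be evaluated by enumerating the cycles of the circuit graph. The sketch below does this by brute force, which is fine for a graph this small. It assumes the classic correlator topology (the figure is missing from this transcript) and takes the register tpd and tsu as zero; `max_cycle_ratio` is a hypothetical helper name.

```python
DELAY = {"host": 0, "c1": 3, "c2": 3, "c3": 3, "c4": 3, "a1": 7, "a2": 7, "a3": 7}
EDGES = {  # (source, destination) -> registers; assumed correlator topology
    ("host", "c1"): 1, ("c1", "c2"): 1, ("c2", "c3"): 1, ("c3", "c4"): 1,
    ("c4", "a1"): 0, ("c3", "a1"): 0, ("a1", "a2"): 0, ("c2", "a2"): 0,
    ("a2", "a3"): 0, ("c1", "a3"): 0, ("a3", "host"): 0,
}


def max_cycle_ratio(delay, edges):
    """Largest D_cycle / R_cycle over all simple cycles (brute-force search)."""
    succ = {}
    for (u, v), w in edges.items():
        succ.setdefault(u, []).append((v, w))
    order = {v: i for i, v in enumerate(delay)}
    best = 0.0

    def dfs(start, node, d_sum, r_sum, visited):
        nonlocal best
        for nxt, w in succ.get(node, []):
            if nxt == start:
                # Cycle closed; every cycle is assumed to hold >= 1 register.
                best = max(best, d_sum / (r_sum + w))
            elif nxt not in visited and order[nxt] > order[start]:
                dfs(start, nxt, d_sum + delay[nxt], r_sum + w, visited | {nxt})

    for s in delay:
        # Each cycle is reported once, from its lowest-numbered node.
        dfs(s, s, delay[s], 0, {s})
    return best


print(max_cycle_ratio(DELAY, EDGES))   # 10.0
```

Under these assumptions the bound is 10, while the best retiming reaches 13, which illustrates the last bullet: the bound is not always achievable.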

    [Figure: optimally retimed correlator circuit]

    Going Faster: C-slowing a Circuit

    Replace every register with C registers

    Now retime: (clock cycle now 7)
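The effect of C-slowing followed by retiming can be checked on a graph model. This sketch assumes C = 2 and the classic correlator topology (the figure is missing from this transcript); the lag values in `R` were derived by hand for this illustration and are not taken from the slides.

```python
DELAY = {"host": 0, "c1": 3, "c2": 3, "c3": 3, "c4": 3, "a1": 7, "a2": 7, "a3": 7}
EDGES = {  # (source, destination) -> registers; assumed correlator topology
    ("host", "c1"): 1, ("c1", "c2"): 1, ("c2", "c3"): 1, ("c3", "c4"): 1,
    ("c4", "a1"): 0, ("c3", "a1"): 0, ("a1", "a2"): 0, ("c2", "a2"): 0,
    ("a2", "a3"): 0, ("c1", "a3"): 0, ("a3", "host"): 0,
}


def clock_period(delay, edges):
    """Longest total node delay along any register-free path."""
    succ = {u: [v for (s, v), w in edges.items() if s == u and w == 0] for u in delay}
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)


def c_slow(edges, c):
    """Replace every register with c registers."""
    return {e: w * c for e, w in edges.items()}


def retime(edges, r):
    """Apply w_new(u, v) = w_old(u, v) + r(u) - r(v) to every edge."""
    return {(u, v): w + r[u] - r[v] for (u, v), w in edges.items()}


slowed = c_slow(EDGES, 2)
# Hand-derived lags that isolate every adder between registers (an assumption).
R = {"host": 0, "c1": 1, "c2": 2, "c3": 3, "c4": 3, "a1": 2, "a2": 1, "a3": 0}
retimed = retime(slowed, R)

print(clock_period(DELAY, slowed))    # 24: C-slowing alone changes nothing
print(clock_period(DELAY, retimed))   # 7: a single adder between registers
```

C-slowing by itself only adds registers next to the existing ones; it is the retiming afterwards that spreads them out and shortens the critical path to a single adder.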

    [Figure: C-slowed correlator circuit, before and after retiming]

    C-slowing a Circuit

    Note that we get one value every C clock cycles

    But clock period decreases

    Throughput remains the same at best

    The trick: Interleave data sets

    Example: Stereo audio

    Interleave the data for the two channels

    Doubles the throughput!
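Interleaving is easiest to see on a tiny circuit. The sketch below models a running-sum circuit (one adder with a feedback register) that has been C-slowed, so the feedback loop holds C registers; the circuit and the name `c_slowed_accumulator` are illustrative choices, not taken from the slides.

```python
from collections import deque


def c_slowed_accumulator(samples, c):
    """Running sum whose single feedback register is replaced by c registers."""
    regs = deque([0] * c)        # the c registers in the feedback loop
    outputs = []
    for x in samples:
        y = regs.pop() + x       # the oldest register value feeds the adder
        regs.appendleft(y)       # the new sum enters the register chain
        outputs.append(y)
    return outputs


left = [1, 2, 3]
right = [10, 20, 30]
interleaved = [s for pair in zip(left, right) for s in pair]   # L0, R0, L1, R1, ...

out = c_slowed_accumulator(interleaved, c=2)
print(out[0::2])   # running sums of the left channel:  [1, 3, 6]
print(out[1::2])   # running sums of the right channel: [10, 30, 60]
```

With C = 2 each sample only meets values from its own channel, so one piece of hardware serves both streams, and every clock cycle does useful work.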


    Using C-Slowing For Time-Multiplexing

    Pipelined/Retimed Circuit

    Let's reschedule for 2 clock cycles/iteration

    [Figure: datapath with six multipliers (x) and four adders (+)]

    mult: 10, add: 5, Tpd: 2, Tsu: 3, Th: 1


    Start by C-slowing


    Now retime

    Note: 3 multipliers are red, 3 are white: share

    2 adders are red, 2 are white: share


    Result

    Cost: 1/2

    clock period: 25 -> 15

    Throughput: 1/25 -> 1/30
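The numbers in this result can be reproduced from the listed component delays. Because the figure is missing from this transcript, the critical paths used below are assumptions chosen to be consistent with the stated clock periods: one multiplier followed by two adders before C-slowing, and a single multiplier afterwards.

```python
T_PD, T_SU = 2, 3    # register propagation delay and setup time
MULT, ADD = 10, 5    # combinational delays from the slide


def clock_period(logic_delays):
    """Register-to-register period: tpd + combinational delay + tsu."""
    return T_PD + sum(logic_delays) + T_SU


before = clock_period([MULT, ADD, ADD])   # assumed original critical path
after = clock_period([MULT])              # assumed critical path after 2-slow + retime
cycles_per_iteration = 2

print(before)                             # 25
print(after)                              # 15
print(cycles_per_iteration * after)       # 30 time units per result
```

Sharing halves the hardware, but each result now takes two clock cycles of 15, so the throughput drops slightly from 1/25 to 1/30.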

    C-slowing/Retiming for Resource Sharing

    FIR Filter

    [Figure: 4-tap FIR filter drawn as four multipliers (*) and four adders (+), with a constant 0 fed into the adder chain; the same drawing repeats on the following animation frames]

    C-slowed by 4

    Insert Data every 4 cycles (one data set)

    Computation Active only every 4 Cycles

    Computation spread over time

    Only need one multiplier and one adder

    We can use this method to schedule for any number of resources
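The end result of this schedule can be modelled behaviourally: one multiplier and one adder are reused once per clock cycle, and a 4-tap output takes four cycles. This is a sketch of the resulting schedule rather than of the C-slowing and retiming steps themselves; the function names and test data are illustrative.

```python
def fir_direct(xs, coeffs):
    """Reference FIR: one multiplier and one adder per tap, one output per cycle."""
    return [
        sum(c * xs[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
        for t in range(len(xs))
    ]


def fir_time_multiplexed(xs, coeffs):
    """FIR computed with a single multiplier and a single adder."""
    taps = len(coeffs)
    history = [0] * taps            # delay line holding the most recent samples
    outputs, cycles = [], 0
    for x in xs:                    # a new sample arrives every `taps` cycles
        history = [x] + history[:-1]
        acc = 0
        for k in range(taps):       # one clock cycle per tap
            product = coeffs[k] * history[k]   # the one shared multiplier
            acc = acc + product                # the one shared adder
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


xs = [1, 2, 3, 4, 5]
coeffs = [1, 0, -1, 2]
ys, cycles = fir_time_multiplexed(xs, coeffs)
print(ys == fir_direct(xs, coeffs), cycles)
```

Changing the inner loop to process two taps per cycle would model a schedule with two multipliers and two adders, which is the sense in which the method extends to any number of resources.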