FPGA Floorplanning

Upload: vijaymails

Post on 07-Apr-2018


  • 8/6/2019 FPGA Floorplanning

    Introduction to Floorplanning

    Why do Floorplanning?

    Floorplanning is the process of identifying structures that should be placed close

    together, and allocating space for them in such a manner as to meet the sometimes

    conflicting goals of available space (cost of the chip), required performance, and the

    desire to have everything close to everything else.

    Within the Xilinx chips it is often the case that the smallest area design is also the highest

    performance design. This flies in the face of many design methodologies, where area and

    speed are considered to be things that should be traded off against each other.

    This is probably because routing resources are limited, and the more routing resources

    a design uses, the slower it will operate. Optimizing for minimum area allows the design

    to use fewer resources, and also allows the sections of the design to be closer together.

    This leads to shorter interconnect distances, fewer routing resources being used, faster

    end-to-end signal paths, and even faster and more consistent place and route times.

    Done correctly, there are no negatives to Floorplanning.

    What negatives could there be? Well, if the Floorplanning is done with no regard for the

    architecture of the chip, then it is possible to actually do a worse job than the Xilinx placer

    section of the place and route software. It is also possible that there are constraints that

    are not well understood until placement is complete and routing commences. So the

    issue then is what constitutes "done correctly".

    As a general rule, data-path sections benefit most from Floorplanning, and random logic,

    state machines, and other non-structured logic can safely be left to the placer section of

    the place and route software.

    Data paths are typically the areas of your design where multiple bits are processed in

    parallel with each bit being modified the same way with maybe some influence from

    adjacent bits. Example structures that make up data paths are Adders, Subtractors,

    Counters, Registers, and Muxes.

    How to Floorplan a design

    Although there are no hard and fast rules to Floorplanning, this section outlines the basic

    structure for a Floorplanned design, and highlights the issues you need to consider when

    Floorplanning a design. As described above, Floorplanning has its greatest return when

    applied to data path elements. The Xilinx XC4000 devices, and all of the derivative

    families (the A, D, E, EX, H, L, XL, Spartan, and SpartanXL families) all have the

    following basic structure:

    o A rectangular array of Configurable Logic Blocks (CLBs). These

    logic blocks contain two main function generators and two flip-flops. The

    function generators can represent any group of gates that

    has no more than 4 inputs, one output, and no internal loops (which would

    implement latch-like behavior). The flip-flops are either rising or falling

    edge triggered, include a clock-enable function that is implemented with

    a re-circulation multiplexer from the Q output to the D input, and can

    have either an active high asynchronous reset or set function.

    Associated with each CLB are two tri-stateable buffers.

    o Segmented interconnect, including short interconnect for local

    signals and long-lines for spanning the width or height of the chip. In

    many of the devices, the horizontal long-lines can be split into a left and

    a right half, allowing up to twice as many lines, each spanning half the width of

    the chip.

    o The two tri-stateable buffers associated with each CLB are pre-

    connected to two of the horizontal long-lines.

    o Input and Output pins on all 4 sides of the array.

    o Pre-built Carry logic that is pre-connected vertically in each column of

    CLBs.

    To suit these characteristics, consistently implement all data path elements with a bit

    pitch of two bits per row, and always build them as vertical structures of one or

    more columns.

    The Xilinx FPGAs are biased to have data flow along horizontal interconnect, and to have

    arithmetic functions operate in vertical columns. The bias comes from the horizontal long

    lines with tri-stateable buffers, and the vertical pre-built and routed carry logic.

    The carry logic is also used to build fast counters, so although you may not initially think

    of a counter as an arithmetic function, it falls into the same pattern as adders,

    subtractors, and arithmetic comparisons, because of its use of the carry chain. This view

    can be clarified by thinking of a counter as an incrementor, followed by a holding register.

    The bit pitch of two bits per row is driven primarily by the structure of the carry logic, but

    is also the bit pitch that the tri-stateable buffers implement. What this means is that the

    natural structure of arithmetic functions in these devices implements 2 bits of a function

    (a two bit slice) in one row of CLBs, and for simple functions, in one column. A simple

    function such as a ten bit synchronous up-counter will therefore take 5 rows and 1

    column, a total of 5 CLBs.
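
    As a rough illustration of the two-bits-per-row rule, the following Python sketch
    estimates the CLB footprint of a simple data path element. The name
    datapath_footprint and its parameters are illustrative assumptions, not part of any
    Xilinx tool, and the estimate ignores any extra CLBs a particular macro may need
    (for example, a block for terminal count outputs).

```python
import math

BITS_PER_CLB_ROW = 2  # one two-bit slice per CLB row, as described in the text


def datapath_footprint(width_bits, columns=1):
    """Return (rows, columns, clbs) for a data path element.

    Hypothetical helper: assumes every row of every column holds one CLB
    and that each CLB row carries two bits of the function.
    """
    rows = math.ceil(width_bits / BITS_PER_CLB_ROW)
    return rows, columns, rows * columns


# The ten bit synchronous up-counter from the text: 5 rows, 1 column, 5 CLBs.
print(datapath_footprint(10))
```

    The same arithmetic applies to wider elements; only the column count changes when
    a function needs more than one CLB per two-bit slice.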

    Although the XC4000 devices and the A, D, E, H, and L derivatives allow the carry signal

    between CLBs to interconnect in both an up and down direction within a column, the

    more recent XC4000EX, XC4000XL, Spartan and SpartanXL devices only support the

    carry signals being routed up a column. For all devices, within a CLB, the carry routing is

    up, with regard to the two function generators. It is expected that this up only bias will

    exist in future products from Xilinx. To be compatible with all these products, you should

    only use the up direction for carry, and this bias then affects all other functions that are

    generated. For the example 10 bit counter described in the previous paragraph, the

    Floorplan will have bit 0 and 1 in the CLB at the bottom of the column of 5 CLBs, and the

    top CLB will have bits 8 and 9.

    Following Xilinx's standard, the two main function

    generators are shown on the left of diagrams, and are

    labeled F and G, and the two flip-flops are shown on the

    right and are labeled X and Y.

    For the example counter, in the CLB at the bottom of the

    five CLB group (the one with the RLOC=R4C0 attribute),

    the F function generator will be used to implement the

    logic that feeds the D pin of the X flip-flop, the output of

    which is the least significant bit of the counter, Q0.

    The G and Y sections of the same CLB implement bit 1 of

    the counter. The next CLB above (the one with the

    RLOC=R3C0 attribute) implements bit 2 and 3. This

    continues up the column, through to the top CLB which

    implements bits 8 and 9.
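
    The bit assignment described above can be sketched in Python. This is only an
    illustration: the function counter_bit_location is a hypothetical helper, and it
    assumes that the pattern given for bits 0 to 3 (even bits in F and X, odd bits in
    G and Y, two bits per CLB, bit 0 at the bottom) continues unchanged up the column.

```python
def counter_bit_location(bit, width_bits=10):
    """Map a counter bit to (RLOC, function generator, flip-flop).

    Assumes a single column (C0) with row 0 at the top, so bit 0 lands
    in the bottom CLB and the carry runs upward.
    """
    if not 0 <= bit < width_bits:
        raise ValueError("bit index out of range")
    rows = (width_bits + 1) // 2          # two bits per CLB row
    row = (rows - 1) - bit // 2           # bottom row holds bits 0 and 1
    function_generator, flip_flop = ("F", "X") if bit % 2 == 0 else ("G", "Y")
    return "R%dC0" % row, function_generator, flip_flop


for bit in range(10):
    print(bit, counter_bit_location(bit))
```

    For the ten bit counter this places bit 0 at R4C0 in F and X, bit 2 at R3C0, and
    bits 8 and 9 at R0C0, matching the description in the text.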

    When two or more functions of your design are Floorplanned in this way and placed side

    by side, with the signals that flow from one function to the next aligned on the same row,

    and in near or adjacent columns, the design will place and route much faster, and the

    result will perform faster than a design without Floorplanning that relies on

    the Xilinx place and route software to decide on placement. Of course, custom building

    each function section of your design with detailed Floorplanning for each function

    generator and flip-flop can be a complex, time consuming, and potentially error prone

    process.

    The Xilinx Place and Route software uses a hierarchical placement constraint system

    called relative location attributes. Each level of the hierarchy has an origin in the top left

    corner that has a relative location of row zero and column zero. As a constraint this is

    represented as R0C0. Rows are numbered from top to bottom, and columns are

    numbered from left to right. When a relative location attribute (RLOC) is assigned to a

    part of the hierarchy that is not a single CLB, then the underlying RLOCs are added to

    the attached attribute to calculate the RLOC value for each of the underlying RLOCs.

    This process continues throughout the hierarchy, resolving each CLB RLOC to a value

    that is relative to the RLOC at the top of the hierarchy. This process, and other issues

    related to how RLOCs are processed are discussed in full in the Xilinx "Libraries Guide"

    document, in the "Attributes, Constraints, and Carry Logic" chapter, in the "Relative

    Location (RLOC) Constraints" section. Although this section of Xilinx's documentation is

    quite complex, it is recommended that you review it to better understand how the RLOCs

    in the modules support Floorplanning.
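
    The way RLOC values accumulate through the hierarchy can be sketched as follows.
    This is a simplified model written for illustration only: parse_rloc and
    resolve_rloc are hypothetical helpers, only the plain RnCm form used in this page
    is handled, and the other rules covered in the Libraries Guide are ignored.

```python
import re


def parse_rloc(rloc):
    """Split an RLOC string such as 'R4C0' into (row, column)."""
    match = re.fullmatch(r"R(-?\d+)C(-?\d+)", rloc)
    if match is None:
        raise ValueError("not a plain RnCm value: %r" % rloc)
    return int(match.group(1)), int(match.group(2))


def resolve_rloc(*levels):
    """Add the RLOCs from the top of the hierarchy down to the CLB."""
    row = sum(parse_rloc(level)[0] for level in levels)
    column = sum(parse_rloc(level)[1] for level in levels)
    return "R%dC%d" % (row, column)


# A module placed at R1C2 that contains a CLB with RLOC=R4C0.
print(resolve_rloc("R1C2", "R4C0"))
```

    Each additional level of hierarchy simply contributes one more row and column
    offset to the sum.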

    [Figure: CLB diagram with the F and G function generators and the X and Y flip-flops: http://www.fliptronics.com/images/fgxy.gif]

    An Example design, with various levels of Floorplanning

    This section examines the results of Floorplanning, and compares the resulting structure,

    the place and route time, and the design performance. The example, while contrived, is

    typical of the types of logic that benefit from Floorplanning. The example design

    comprises four sixteen bit binary up counters, that all feed into a selection multiplexer.

    The output of the selection multiplexer is registered, and the output of this register is

    connected to the FPGA pins.

    There are two basic timing path categories that need to be analyzed. The first is the

    maximum delay in any of the counters, and the second is the maximum delay from any of

    the counters to the multiplexer output register. For the counter, the maximum delay will

    be from the clock to out time of the LSB flip-flop, through the logic that establishes the

    next counter value, to the D input of the MSB flip-flop, and meeting its setup time. The

    reciprocal of this maximum internal delay within the counter is the maximum clock rate at

    which the counter will count reliably.
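
    The reciprocal relationship between delay and clock rate is simple to check. The
    sketch below is illustrative only; max_clock_mhz is a hypothetical helper, and the
    13.1 ns input is one of the counter delays reported in the result tables later in
    this page.

```python
def max_clock_mhz(delay_ns):
    """Maximum clock rate in MHz for a worst-case path delay in nanoseconds."""
    if delay_ns <= 0:
        raise ValueError("delay must be positive")
    # 1 / (delay_ns * 1e-9 s) gives Hz; dividing by 1e6 leaves 1000 / delay_ns MHz.
    return 1000.0 / delay_ns


print(round(max_clock_mhz(13.1), 1))
```

    A 13.1 ns counter delay corresponds to a clock rate of about 76.3 MHz.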

    Seven different levels of Floorplanning are applied to this simple design, using the

    XC4005E, XC4010E, and XC4010XL as targets. The '-2' speed grade is used for all

    examples, and place and route programs used are as follows:

    1. XC4005E-2 PPR V5.2.1

    2. XC4010E-2 PPR V5.2.1

    3. XC4010E-2 PAR M1.4

    4. XC4010XL-2 PAR M1.4

    The combination of running the XC4010E devices with both place and route programs

    allows comparison of these programs on the XC4000E families. Running both the

    XC4010E and XC4010XL on the M1.4 program allows comparison of these two product

    families. While the goal is to show the value of Floorplanning, the program and product

    comparisons are interesting.

    The same seven levels of Floorplanning were applied to each of these four

    product/program combinations. The seven design styles have the following

    characteristics:

    1. The 4 counters are binary ripple counters (CB16CE), from the

    Xilinx unified library XC4000E, the multiplexer and output register are

    also taken from this library. There is no Floorplanning in this style, and

    the choice of a ripple counter, while available in the library, is a poor

    choice.

    2. The 4 counters are binary counters that use the built-in carry

    logic (CC16CE), from the Xilinx unified library XC4000E, the multiplexer

    and output register are also taken from this library. While there is no

    explicit Floorplanning in this style, the counters include internal

    Floorplanning, because the carry logic imposes a column structure on

    the counters.

    flops with the multiplexers. A four-to-one multiplexer requires all the gate resources of a

    CLB, so to build a 16 bit wide multiplexer with four inputs will require 16 CLBs. Strictly

    maintain a Floorplanning structure of two bits of data path implemented per row of

    structure. The 16 CLBs are Floorplanned to use two columns by eight rows, with bits 0

    and 1 on the row at the bottom, and bits 14 and 15 at the top. This exactly matches the

    bit position of the counters, except the counters have an additional block at the top, for

    the TC and CEO outputs. This is resolved by placing the counters with RLOC_ORIGINs

    on row 1, and the multiplexer on row 2.
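
    The one-row offset between the counter and multiplexer origins follows from
    aligning the bottom rows of the two blocks. The sketch below is a hypothetical
    illustration (aligned_origin_row is not a Xilinx function) and assumes that rows
    are numbered from the top and that each block occupies consecutive rows below its
    origin.

```python
def aligned_origin_row(ref_origin_row, ref_rows, rows):
    """Origin row that puts a block's bottom row level with a reference block."""
    ref_bottom_row = ref_origin_row + ref_rows - 1
    return ref_bottom_row - rows + 1


BITS = 16
counter_rows = BITS // 2 + 1   # eight two-bit rows plus the extra TC/CEO block on top
mux_rows = BITS // 2           # eight two-bit rows

# Counters have their origin on row 1, so the multiplexer origin falls on row 2.
print(aligned_origin_row(1, counter_rows, mux_rows))
```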

    At this point you may wonder what additional improvement could be made to style 6.

    Consider the routing from the left most counter to the multiplexer. It must pass through

    the other three counters to get to the multiplexer. Similarly, the output of counters two

    and three must also pass through the fourth counter to get to the multiplexer. Therefore,

    there is more routing congestion around counter four, although it has the shortest path to

    the multiplexer. The output of the first counter must traverse the furthest distance to get

    to the multiplexer. In synchronous designs like this, the slowest path out of a group of

    paths will be the limiting factor. For the counters to run at their fastest, they need to have

    their routing congestion minimized. For the paths from the four counters to the multiplexer

    to be minimized, the multiplexer and the four counters need to be placed so as to

    minimize the worst-case distance. Both of these goals are achieved in style 7 by placing

    the multiplexer and its output register in the middle of the structure, with two counters to

    its left, and two counters to its right.
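
    A crude way to see why the centered multiplexer helps is to compare worst-case
    column distances for the two layouts. The sketch below is an assumption-laden
    illustration: column indices are counted from the left starting at 0, each counter
    is taken to occupy one column, the multiplexer two, and the distance in columns to
    the farther multiplexer column stands in for routing distance. Real routing delay
    depends on much more than this.

```python
def worst_case_distance(counter_columns, mux_columns):
    """Largest column distance from any counter to the farther multiplexer column."""
    return max(
        max(abs(counter - mux) for mux in mux_columns)
        for counter in counter_columns
    )


# Style 6: four counters side by side, multiplexer in the two columns to their right.
style6 = worst_case_distance([0, 1, 2, 3], [4, 5])

# Style 7: two counters, then the multiplexer, then the other two counters.
style7 = worst_case_distance([0, 1, 4, 5], [2, 3])

print(style6, style7)
```

    Under these assumptions the worst-case distance falls from 5 columns to 3 when the
    multiplexer is moved to the middle, and no counter's output has to cross more than
    one other counter.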

    As can be seen from the following tables and diagrams, style 7 delivers the fastest

    counters, the fastest counter to multiplexer output register time, the fastest placement

    time, and the fastest routing time. Comparing the schematics for design styles 1 and 7

    shows that almost no additional effort is needed to create design 7's result. Selecting counters and

    multiplexers that are pre-Floorplanned, together with five placement attributes, is all that is

    required. (Some thought as to what the placement constraints should be is obviously

    also needed.)

    XC4005EPC84-2 Processed with PPR V5.2.1c

    Design  Counter     Max Frequency  Counter to MUX   Partition +          Routing   CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Placement Time (s)   Time (s)  Used
    1       17.1        58.4           11.8             4+28                 12        72
    2       13.1        76.3           10.8             6+15                 13        48
    3       13.4        74.6           11.7             6+14                 17        48
    4       13.1        76.3           14.4             7+12                 17        48
    5       14.3        69.9           14.5             6+12                 16        48
    6       13.3        75.1           9.4              3+11                 16        48
    7       13.1        76.3           8.9              3+11                 14        48

    XC4010EPC84-2 Processed with PPR V5.2.1c

    Design  Counter     Max Frequency  Counter to MUX   Partition +          Routing   CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Placement Time (s)   Time (s)  Used
    1       17.5        57.1           12.9             7+53                 32        88
    2       13.3        75.1           11.2             4+13                 12        48
    3       13.5        74.0           12.6             4+11                 15        48
    4       13.1        76.3           14.6             4+11                 17        48
    5       13.2        75.7           14.2             3+11                 14        48
    6       13.3        75.1           10.2             2+10                 16        48
    7       13.1        76.3           8.9              1+10                 15        48

    XC4010EPC84-2 Processed with M1.3.7 (PAR L4 D5) (A)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       21.9        45.6           19.4             65-7=58    574-65=509   55
    2       13.7        72.9           10.0             47-7=40    142-47=95    48
    3       13.8        72.4           10.3             38-8=30    170-38=132   48
    4       13.8        72.4           12.7             28-8=20    132-28=104   56
    5       13.7        72.9           13.1             28-8=20    128-28=100   56
    6       13.7        72.9           9.4              15-8=7     80-15=65     48
    7       13.7        72.9           8.9              14-8=6     75-14=61     48

    XC4010XLPC84-2 Processed with M1.3.7 (PAR L4 D5) (B)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       18.5        54.0           8.8              68-20=48   147-68=79    55
    2       11.6        86.2           7.0              53-21=32   134-53=81    48
    3       11.9        84.0           6.9              46-21=25   128-46=82    48
    4       12.1        82.6           10.6             34-22=12   95-34=61     56
    5       11.7        85.4           10.7             33-21=12   91-33=58     56
    6       11.9        84.0           6.8              25-20=5    64-25=39     48
    7       11.7        85.4           6.1              26-21=5    69-26=43     48

    XC4010XLPC84-2 Processed with M1.4.12 (MAP K, PAR L4 D5)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       18.2        54.9           11.3             64-20=44   185-64=121   83
    2       11.3        88.5           9.8              39-21=18   183-39=144   72
    3       11.8        84.7           10.6             33-20=13   108-33=75    72
    4       11.6        86.2           10.8             32-21=11   128-32=96    72
    5       11.7        85.4           11.0             32-21=11   116-32=84    72
    6       11.6        86.2           6.8              24-21=3    59-24=35     48
    7       11.7        85.4           6.1              24-20=4    61-24=37     48

    XC4010XLPC84-2 Processed with M1.4.12 (MAP K, PAR L5 D5)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       17.3        57.8           11.3             99-20=79   224-99=125   83
    2       11.7        85.4           9.9              58-21=37   229-58=171   72
    3       12.1        82.6           10.5             46-20=26   140-46=94    72
    4       11.6        86.2           11.1             44-21=23   117-44=73    72
    5       11.7        85.4           10.9             44-21=23   134-44=90    72
    6       12.1        82.6           6.7              27-21=6    60-27=33     48
    7       11.7        85.4           6.1              27-21=6    66-27=39     48

    XC4010XLPC84-2 Processed with M1.4.12 (PAR L4 D5)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       18.8        53.2           9.1              63-20=43   199-63=136   55
    2       12.0        83.3           7.7              45-20=25   132-45=87    48
    3       12.2        81.9           6.7              36-21=15   116-36=80    48
    4       11.9        84.0           10.3             30-20=10   97-30=67     56
    5       12.0        83.3           10.5             31-21=10   103-31=72    56
    6       11.6        86.2           6.8              24-20=4    58-24=34     48
    7       11.7        85.4           6.1              24-20=4    61-24=37     48

    XC4010XLPC84-2 Processed with M1.4.12 (PAR L5 D5)

    Design  Counter     Max Frequency  Counter to MUX   Placement  Routing      CLBs
    Style   Delay (ns)  (MHz)          REG Delay (ns)   Time (s)   Time (s)     Used
    1       18.1        55.2           7.7              105-21=84  257-105=152  55
    2       12.0        83.3           6.7              72-21=51   199-72=127   48
    3       11.8        84.7           6.8              55-21=34   138-55=83    48
    4       12.1        82.6           10.5             40-21=19   148-40=108   56
    5       12.1        82.6           10.6             40-20=20   102-40=62    56
    6       12.1        82.6           6.7              29-22=7    61-29=32     48
    7       11.7        85.4           6.1              27-21=6    66-27=39     48

    Interpreting the Floorplan Pictures

    The full manual has all the pictures for all 8 of the above tables of data. This page only

    has the pictures for the last table, which is the M1 PAR V1.4.12, with -L 5 and -D 5,

    which represent high effort in both placer and router.

    At the time of writing this page, the XC4000XL is Xilinx's leading FPGA family, and the

    M1 PAR version 1.4.12 is the current version of the place and route software.

    The color coding of the following Floorplans is as follows:

    o All the pictures are of XC4010XL devices, each of which is an array of 20 by

    20 CLBs. These are represented by small squares. If a square is empty, the CLB

    is not used.

    o Within each CLB, colored squares on the left are F & G function

    generators, colored squares on the right are the flip-flops, and a colored

    rectangle in the middle represents the H function generator.

    o If a square is colored blue, then it is being used.

    o If a square is colored yellow, then it is a function generator, and the

    carry logic is active.

    o If a square is colored magenta, then it is a function generator, and it

    is being used for single ported RAM.

    o If a square is colored red, then it is a function generator, and it is

    being used for dual ported RAM.

    o If a square is colored green, then it is a function generator, and it is

    being used for ROM.

    o If an I/O cell is colored red, then it is being used for a global clock

    buffer.

    o An "X" over an I/O cell indicates an I/O cell that is not bonded to a

    package pin.

    o An inward pointing arrow on an I/O cell indicates usage as an input.

    o An outward pointing arrow on an I/O cell indicates usage as an

    output.

    o If an I/O or CLB cell has a gray background, then it means that

    placement control was used on that location.

    XC4010XL-S1-F

    The 4 counters are binary ripple counters (CB16CE),

    from the Xilinx unified library XC4000E, the

    multiplexer and output register are also taken from this

    library. There is no Floorplanning in this style, and the

    choice of a ripple counter, while available in the library,

    is a poor choice.

    This is also what you will get from synthesis if it does

    not know about the carry logic in the XC4000 families.

    [Figure: XC4010XL-S1-F floorplan: http://www.fliptronics.com/images/xc4010xl-s1-F.gif]

    XC4010XL-S2-F

    The 4 counters are binary counters that use the built-in

    carry logic (CC16CE), from the Xilinx unified library

    XC4000E, the multiplexer and output register are also

    taken from this library. While there is no explicit

    Floorplanning in this style, the counters include internal

    Floorplanning, because the carry logic imposes a column

    structure on the counters.

    This is also what you will get from synthesis if it knows

    about carry logic, but you do not do any Floorplanning.

    While the performance for this style is not too bad for

    this example, when a chip is used at 50% or more, the

    lack of Floorplanning can seriously degrade

    performance, and routing times may become very long.

    XC4010XL-S3-F

    This style adds four RLOC_ORIGIN Floorplanning

    constraints to the style 2 design, placing the four

    counters in adjacent columns, and aligning the MSBs of

    the counters (and all other bits).

    The Floorplanning is shown by the gray background to

    the four columns that contain the counters. Since the

    multiplexer is not Floorplanned, it appears as the CLBs with

    logic in them but a white background.

    [Figures: XC4010XL-S2-F floorplan (http://www.fliptronics.com/images/xc4010xl-s2-F.gif); XC4010XL-S3-F floorplan (http://www.fliptronics.com/images/xc4010xl-s3-F.gif)]

    XC4010XL-S4-F

    This style replaces the un-Floorplanned output register of

    the previous styles with a Floorplanned register, and

    places it in the column to the right of the fourth counter.

    It is also aligned with regard to bit positions.

    Note that the multiplexer logic is still scattered all

    around the Floorplanned core. Although there is room in

    the Floorplanned output register CLBs to merge some of

    the multiplexer, the mapper in the current version of the

    M1 software will not do this.

    XC4010XL-S5-F

    This style is like style 4, except the output register is

    placed in the column to the right of the column used for

    the register in style 4.

    This opened up a column for the placer to move the

    multiplexer into. It looks like half of the 16 bits of

    multiplexer logic have been moved into this area, and

    half are still floating about. Merging the multiplexer into

    the Floorplanned output register CLBs has not happened.

    [Figures: XC4010XL-S4-F floorplan (http://www.fliptronics.com/images/xc4010xl-s4-F.gif); XC4010XL-S5-F floorplan (http://www.fliptronics.com/images/xc4010xl-s5-F.gif)]

    XC4010XL-S6-F

    This style uses a Floorplanned multiplexer and output

    register built by the FlibGen module generator, and places it

    in the two columns to the right of the fourth counter. The

    odd bit multiplexers and output register flip-flops are in

    one of these two columns, and the even bits are in the

    other column.

    XC4010XL-S7-F

    This style uses the same components of style 6, but the

    Floorplan has been changed. The first two columns

    contain the first two counters, the next two columns are

    the multiplexer and output register, and the last two

    columns contain the third and fourth counter.

    If you have read this page and found it useful, please send an email to [email protected]

    [Figures: XC4010XL-S6-F floorplan (http://www.fliptronics.com/images/xc4010xl-s6-F.gif); XC4010XL-S7-F floorplan (http://www.fliptronics.com/images/xc4010xl-s7-F.gif)]

    FlibGen module generator: http://www.fliptronics.com/flibgen.html