The University of Texas at AustinFoil # 1 The University of Texas at AustinEE 382M-8 VLSI-2 Page 1
EE-382M-8
VLSI–II
Early Design Planning: Back End
Mark McDermott
Backend EDP Flow
• The project activities will include:
– Determining the standard cell and custom library elements needed to completely do the design with APR tools.
– Detailed floor-plan of the block level components.
– A reasonably detailed top-level floorplan using the cluster abstracts.
– Approximate clock routing at the top level
– Approximate Power-GND routing at the top level
EDP and Layout in the Design Flow
[Figure: design flow with stages labeled Concept, Architecture, uArchitecture, Logic, Circuits, Layout, Si Debug and Production, with EDP marked alongside them; phases labeled Technology Readiness, Front End Development, Backend Design, Execution and Silicon Ramp]
EDP encompasses planning from architecture to the layout.
Standard Cell Library Effort
• Will be using a very minimal standard cell library for the project: ~80+ cells
– Basic logic gates and buffers
– 1 set-reset flip-flop
• “CMOS65_SubVt.lib” file was derived using a scaled 65nm .lib file
– Need to validate the scaled numbers with HSPICE simulations.
– Need to validate power spreadsheet numbers using HSPICE:
  • S-D leakage currents
  • Intrinsic power
– Need to validate area spreadsheet numbers
Block Floorplanning Effort
• Objectives:
– Minimize area
– Determine best shape of the block
– Minimize total wire length
• Each team will do a detailed floorplan of their respective blocks. The output will be a spreadsheet analysis showing the contribution from each of the following:
– Power grid
– Clocking
– Signal Routing
– Datapath area
– Random logic area
– White space
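
The spreadsheet analysis listed above can be prototyped in a few lines before the real numbers exist. The following Python sketch is illustrative only: the function name `area_breakdown`, the category keys and every area value are hypothetical, chosen to mirror the six contributors named on this slide.

```python
def area_breakdown(areas_um2):
    """Return each contributor's share of the total block area, in percent."""
    total = sum(areas_um2.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {name: 100.0 * area / total for name, area in areas_um2.items()}


# Hypothetical areas in square microns, for illustration only.
example = {
    "power_grid": 1200.0,
    "clocking": 800.0,
    "signal_routing": 3000.0,
    "datapath": 9000.0,
    "random_logic": 4000.0,
    "white_space": 2000.0,
}
shares = area_breakdown(example)
for name, pct in shares.items():
    print(f"{name:15s} {pct:5.1f}%")
```

Tracking white space as its own row makes it easy to see how much of the block is not doing useful work as the floorplan evolves.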
Integration Effort
• The integration team will be responsible for:
– Doing a floor plan of the top level of the chip
– Characterizing the top-level routing delays and determining the assertions and constraints for each cluster. They will be working with each cluster to optimize the constraints.
– Designing the clock routing structure:
– Determining the clock generation implementation (block diagrams)
– Determining the clock regeneration circuitry (block diagrams)
– Determining the reset logic.
– Designing the power grid.
– Determining the power estimation for the global clock and signal routing.
– Generating the power budget for each cluster.
– Generating the area budget for each cluster.
Layout Implementation Options
SPARC-T1
Layout Density & Die Size = Performance
• Higher density layout leads to smaller block sizes
• Smaller block sizes lead to shorter wires
• Shorter wires can lead to higher frequency
• Shorter wires can also lead to higher IPC by requiring fewer transmission pipe stages
[Figure: schematic of a path through blocks A, B and C, with two floorplans (Layout #1 and Layout #2) that place block B (B’) differently]
The layout of Block B affects the timing of the path from A to C
Layout Implementation Options
• Synthesis – Random Logic Macro (RLM)
– Cell layout comes from a shared cell library
– Automated cell selection and placement
– Automated routing between cells
• Structured Custom (SC/SDP)
– Cell layout comes from a shared cell library
– Manual cell selection and placement
– Automated routing between cells
• Custom Design (CD)
– Cell layout is unique for each application
– Manual cell selection and placement
– Manual routing between cells
Increasing design effort (and density) from RLM to SC to CD
Layout Implementation Options
                     CD   SC   RLM
RTL Coding           M    M    M
Logic Minimization   M    M    A
Cell Placement       M    M    A
Device Sizing        M    A    A
Layout               M    A    A
A = Automatic
M = Manual

              CD      SC      RLM
Timing        Best    Better  Worst
Density       Best    Better  Worst
Design Time   Worst   Better  Best
• RLM saves time in circuit design and layout
• SC saves time in layout.
• RLM and SDP make revisions easier.
Datapath and Block Floorplanning Procedures
MIPS R10K
Datapath and Block Floorplanning Procedure
• Step 1 - Identify feedthrus for RLM or SC/DP block
• Step 2 - Look for opportunities for track sharing
• Step 3 - Define the bitpitch of the block
• Step 4 - Review the metal plan within the cell
• Step 5 - Review and plan the clock routing and placement
• Step 6 - Plan the critical cell placement locations
• Step 7 - Estimate the area of the cells and the block
• Step 8 - Review the power grid
Feed-through or Over-the-cell (OTC) Routes
• Metal tracks routed over an RLM, datapath or custom block
• The block is neither the driver nor a receiver of the signals
• Feedthrus use up metal tracks, which impacts the internal signals of the block
• Carefully review datapath connectivity to account for them
[Figure: Bypass, ALU 0, ALU 1 and ALU 2 blocks with Sources and Results buses running between a Driver and a Receiver; the routes that cross ALU 0 are marked as feedthrus for ALU0]
Step 2: Track Sharing
• Minimizes the number of unique tracks in layout by opportunistically sharing tracks where possible
• Often allows for the smallest possible bitpitch
• Allows for metal layers to be more efficiently utilized
• Can help improve performance by shortening distances
• Should always be explored to improve layout efficiency and performance
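
One common way to reason about track sharing is a greedy "left-edge" assignment: nets whose extents along the routing direction do not overlap can be placed on the same track. The Python sketch below illustrates that idea only. It is not the course's procedure; `assign_tracks`, the net names and the coordinates are hypothetical, and real routing must also respect spacing rules, vias and layer choices.

```python
def assign_tracks(spans):
    """Greedy left-edge track assignment.

    spans maps a net name to its (start, end) extent along the routing
    direction. Returns a dict mapping each net to a track index.
    """
    track_end = []   # right-most occupied coordinate on each track
    assignment = {}
    # Visit nets from left to right by starting coordinate.
    for net, (start, end) in sorted(spans.items(), key=lambda kv: kv[1][0]):
        for track, last_end in enumerate(track_end):
            if start > last_end:          # no overlap: reuse this track
                track_end[track] = end
                assignment[net] = track
                break
        else:                             # every existing track is occupied
            track_end.append(end)
            assignment[net] = len(track_end) - 1
    return assignment


# Hypothetical spans in arbitrary units along the datapath.
spans = {
    "src_a": (0, 40),
    "bypass": (30, 60),
    "result_a": (50, 90),
    "flag": (95, 120),
}
tracks = assign_tracks(spans)
tracks_used = len(set(tracks.values()))
print(tracks, tracks_used)
```

In this example four nets fit in two tracks instead of four, because `src_a`, `result_a` and `flag` never occupy the same stretch of the track.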
Step 2: Track Sharing
First, check outside your block to see if there are any candidates for track sharing
[Figure: Bypass, ALU 0, ALU 1 and ALU 2 blocks with Sources and Results buses running between a Driver and a Receiver]
Step 2: Track Sharing
Next, check inside your block to see if there are any candidates for track sharing
[Figure: buses LRBL<11:0>, RRBL<11:0>, IE_BYC_DATA<11:0> and IE_RF_DATA<11:0> routed in Metal 2 and Metal 4]
Step 2: Track Sharing Example
Bit Pitch Defining Width of Chip
AMD K5
Step 3: Define the Bitpitch
• Fixed cell width chosen to allow easy assembly
• Most often determined by metal usage within the datapath
• Integration efficiency would prefer one bitpitch per project
• Architectures lend themselves to more unique bit pitches
[Figure: bit slices A<4> down to A<0> stacked at a fixed bitpitch; each slice carries the tracks Vdd, Sig0 through Sig5, and Vss (shown for bits <4> and <1>)]
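
Since the bitpitch is most often set by metal usage, a first estimate is the worst-case number of tracks per bit across the stack multiplied by the track pitch. The Python sketch below assumes a uniform track pitch and counts the Vdd and Vss rails as one track each, as in the figure above (Vdd + 6 signals + Vss = 8 tracks). The function name, block names, track counts and the 0.5 µm pitch are all hypothetical.

```python
def required_bitpitch_um(tracks_per_bit_by_block, track_pitch_um):
    """Bitpitch needed so the most track-hungry block in the stack fits.

    tracks_per_bit_by_block: tracks each block needs per bit slice,
    including power rails (an assumption of this sketch).
    """
    worst_case_tracks = max(tracks_per_bit_by_block.values())
    return worst_case_tracks * track_pitch_um


# Hypothetical per-bit track counts for three blocks in one stack.
stack = {"alu0": 8, "shifter": 6, "regfile": 7}
pitch = required_bitpitch_um(stack, track_pitch_um=0.5)
print(pitch)
```

Every block in the stack then adopts this single value, even the blocks that would fit in a smaller pitch on their own.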
Step 3: Define the Bitpitch
Ensure all blocks in a datapath stack follow the same bitpitch
[Figure: datapath stacks built from the blocks Bypass Cache, Integer Register File, ALU 0, ALU 1, Arith Flags, AGEN-LD / STA, Shifter, WB Mux, Bit Ops and System Uops; dimensions labeled Bitpitch #1 (X µ) and Bitpitch #2 (Y µ)]
Bit Pitch Example: 3:2 Adder Bit Cell
[Figure: layout of a 3:2 adder bit cell with a 7.56u bitpitch; metal layers labeled M1, M4, and M3 & M1]
Bit Pitch Example: 4 Bit Cells stacked
[Figure: four bit cells stacked at a 7.56u bitpitch, labeled BIT - 0, BIT - 1, BIT - 2 and BIT - 4]
Bit Pitch Example: Tiled Datapath
Bit Pitch Example: Swizzle
Don’t mix and match bit pitches; a mismatch forces a swizzle channel between the blocks.
As buses get wider and the number of tracks per bit gets higher, the cost of swizzle channels grows.
[Figure: layout with a Swizzle Channel marked]
Step 3: Define the Bitpitch
• Wider bit pitches allow more upper level metal usage
• Narrower bit pitches allow shorter routes for orthogonal signals
• Balancing these conflicting objectives can be difficult
• Understand your local constraints and be aware of the tradeoffs
Metal Planning
• Metal layer, width, spacing and shielding are negotiable
– “Negotiable” means you have to plead your case to the integration leaders
• All of these impose a physical constraint for layout
• For your first attempt at convergence
– M1, M2: Local routing
– M3, M4, M5, M6: Data and control
– M7, M8: Power, Ground, Clock, Reset, etc.
– Assume all nets are routed in M1 & M2 within your block
– Assume your only shielding is on clocks and reset
– Assume the routes are minimum
Metal Flow Planning
Avoid bi-directional dataflow
[Figure: BAD and GOOD examples of Data and Cntl flow directions through a block]
Shielding
• Intentionally routing signals to control the effective line-to-line capacitance seen during switching.
• Requires designers to constrain the physical assembly done by routing tools or physical design specialists (PDSs).
• Falls into one of three categories:
– Physical shielding - signals are routed next to a power rail
– Logical shielding - signals are routed next to logically related signals
– Temporal shielding - signals are routed next to temporally distinct signals
Miller Coupling Factor
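[Figure: five waveform panels, each showing a switching net B between its neighbors A and C]
MCF = 2.0 Both against (both neighbors switch opposite to B)
MCF = 1.5 One against, one quiet
MCF = 1.0 Both quiet
MCF = 0.5 One with, one quiet
MCF = 0.0 Both with (both neighbors switch in the same direction as B)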
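
The five cases on this slide are consistent with each neighbor contributing 0.0 when it switches with the victim, 0.5 when it is quiet, and 1.0 when it switches against the victim. That per-neighbor decomposition is a reading of the slide's values, not something the slide states, and the names in this Python sketch are hypothetical.

```python
# Assumed per-neighbor contribution, inferred from the slide's five cases.
CONTRIBUTION = {"with": 0.0, "quiet": 0.5, "against": 1.0}


def miller_coupling_factor(left, right):
    """MCF for a switching net given the activity of its two neighbors.

    left and right are each one of "with", "quiet" or "against".
    """
    return CONTRIBUTION[left] + CONTRIBUTION[right]


for left, right in [("against", "against"), ("against", "quiet"),
                    ("quiet", "quiet"), ("with", "quiet"), ("with", "with")]:
    print(left, right, miller_coupling_factor(left, right))
```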
No Shielding
• Signals are routed next to any neighboring signals
• Neighbors can slow down (max delay) or speed up (min delay) signal transitions through line-to-line coupling
• Variation can create design problems
• Most signals will not be shielded
[Figure: Sig A, Sig B and Sig C routed side by side, with waveforms for A, B and C]
No Shield: Max MCF 2.0, Min MCF 0.0
Physical Shielding
• Signals are routed next to at least one power rail
• Helps both min delay and max delay
• Can be expensive in terms of metal usage
• Typically limited to most critical nets and clocks
Half Shield (Vss, Sig A, Sig B): Max MCF 1.5, Min MCF 0.5
Full Shield (Vss, Sig A, Vss): Max MCF 1.0, Min MCF 1.0
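
The Max/Min values for the unshielded, half-shielded and fully shielded cases can be reproduced by treating a power rail as a neighbor that is always quiet and an unconstrained signal as a neighbor that may do anything. The Python sketch below encodes that assumption; the per-neighbor ranges are inferred from the slide's numbers, and `mcf_range` is a hypothetical helper.

```python
# Assumed (min, max) MCF contribution of one neighbor.
# A rail never switches; an unconstrained signal may switch with or against.
NEIGHBOR_RANGE = {"rail": (0.5, 0.5), "signal": (0.0, 1.0)}


def mcf_range(left, right):
    """Return (min MCF, max MCF) for a net between the two given neighbors."""
    lo = NEIGHBOR_RANGE[left][0] + NEIGHBOR_RANGE[right][0]
    hi = NEIGHBOR_RANGE[left][1] + NEIGHBOR_RANGE[right][1]
    return lo, hi


print("no shield  ", mcf_range("signal", "signal"))
print("half shield", mcf_range("rail", "signal"))
print("full shield", mcf_range("rail", "rail"))
```

The spread between min and max is what shielding buys: a full shield removes the variation entirely, at the cost of two extra tracks.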
Logical Shielding
• Signals are routed next to mutually exclusive neighbors
• Also helps min delay and max delay
• Results comparable to physical shielding, at lower cost
• Encouraged in mux structures and arrays
[Figure: select lines Sel A, Sel B and Sel C routed side by side, with waveforms for A, B and C]
Max MCF 1.5
Min MCF 1.0
Temporal Shielding
• Signals are routed next to signals that limit aggressors
• Can help max delay or min delay or both
• Lower cost than physical shielding, but more design effort
• Encouraged wherever possible but tricky
[Figure: Sig A, Sig B and Sig C routed side by side, with waveforms for A, B and C shown relative to Ck]
Max MCF 1.0
Min MCF 0.0
Shielding Gotcha
• Tools may rely on the designer to override the default coupling assumptions
[Figure: two elements labeled L, each with a clock Ck, and neighboring nets A and B]
Max MCF 2.0
Min MCF 0.0
If you need temporal shielding to make your circuit meet timing, your circuit doesn’t meet timing. Do not rely on it.
Variations of Clock Tree distribution networks
Tapered H-Tree
Target: Metallization and Gate topology uniformity
Clock Routing
• Watch out for the clock, it’s your most critical net
• Make sure the physical design treats it accordingly
• Help reduce clock power by eliminating unnecessary load
• Make sure the clock has enough via coverage
• Leave room for decoupling capacitors and upsizing
• Don’t forget to account for clock routing overhead (full shield) in your metal planning
Clock Routing
Avoid unnecessary clock load to save active power
[Figure: BAD and GOOD clock routing examples; the BAD example is marked UNNECESSARY LOAD]
Power/Clock Grid
• Clock grid is interleaved between VDD and VSS on metal6
[Figure: two-port array floorplan showing Bitcell Array blocks, Port0 and Port1 Decoders, Input Data Latches, Read/Write Ckts and Output Latches, with LCBs placed alongside each]
Clock Routing
Make sure there are enough vias to get power through the clock network
[Figure: two clock routes compared, one marked INSUFFICIENT VIA COVERAGE and one marked SUFFICIENT VIA COVERAGE]
Clock Routing
Remember to count clocks as ~5-7 tracks in your wire planning!
[Figure: Clock wire routed between Vdd and Vss shields; relative dimensions labeled 1x, 2x, 1x and 1.5x, 1.5x]
Be careful with gated clocks. Fine-grain clock gating tends to drastically increase the number of unique clocks, significantly increasing the metal usage. No tools catch this before layout.
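
Because no tool flags this before layout, it helps to keep a simple track budget that charges each unique clock at its shielded cost. The Python sketch below assumes 6 tracks per clock (the midpoint of the ~5-7 range above); the function name and all net counts are hypothetical.

```python
def tracks_needed(signal_nets, unique_clocks, feedthrus=0, tracks_per_clock=6):
    """Rough routing-track demand for a block.

    Each unique clock is charged tracks_per_clock tracks to cover the
    wire plus its shielding (assumed value; the slide gives ~5-7).
    """
    return signal_nets + feedthrus + unique_clocks * tracks_per_clock


# Hypothetical block: 40 signal nets and 4 feedthrus.
single_clock = tracks_needed(40, unique_clocks=1, feedthrus=4)
gated_clocks = tracks_needed(40, unique_clocks=4, feedthrus=4)
print(single_clock, gated_clocks)
```

Going from one clock to four gated clocks raises the demand from 50 to 68 tracks in this example, which is the kind of growth that fine-grain gating can hide until layout.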
Cell Placement
• Start with the critical path!
– Place cells to limit the wire load on the critical path
– Move less critical blocks out of the way
• Place clock generators to limit clock wire load
– Again, place most critical clock LCBs first if area is tight
– Ideally there should be minimal side loads
• Consider track sharing opportunities when placing cells
– Cell placement can enable or disable track sharing
– Optimum placement generally follows data flow
Cell Placement
[Figure: cell placement around an LCB, annotated "Short critical path" and "No side load"]
Cell Placement and Routing
Area Estimation
• All modules have an area budget in the floorplan
• That budget is only an educated guess
• Some guesses are high, and some are low
• You will need to refine these guesses by estimating the area of your modules more accurately
• Doing so reduces the number of late surprises in the design and also reduces post-layout effort, because the design converges with accurate parasitics
Area Estimation
• Custom cell area can be set in one of three ways
– Device limited layout means the device sizes set the cell area
– Metal limited layout means the wires set the cell area
– Pitch-matching means the cell area is set to match another cell
• Your first job is to figure out which kind your cell is
– Datapaths are metal limited in one direction (bitpitch)
– Arrays often are metal limited in both directions
– Control blocks often match a datapath or array
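
One way to make this classification concrete is to estimate, for each dimension of the cell, the size demanded by the devices, by the wires, and by any cell it must match, and then take the largest. The Python sketch below is an assumption about how to apply the three categories above, not a method given on the slide; `limiting_factor` and all dimensions are hypothetical.

```python
def limiting_factor(device_um, metal_um, matched_um=None):
    """Return (limiter, size) for one dimension of a custom cell.

    device_um: size the transistors need
    metal_um: size the routing tracks need
    matched_um: size imposed by a neighboring cell, if any
    """
    candidates = {"device": device_um, "metal": metal_um}
    if matched_um is not None:
        candidates["pitch-match"] = matched_um
    # The largest requirement sets the dimension.
    limiter = max(candidates, key=candidates.get)
    return limiter, candidates[limiter]


print(limiting_factor(5.0, 4.0))        # devices dominate
print(limiting_factor(3.0, 4.0))        # wires dominate
print(limiting_factor(3.0, 4.0, 7.56))  # must match a 7.56u bitpitch
```

Running the check separately for width and height matters, since a datapath cell can be metal limited in the bitpitch direction and device limited in the other.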
Die Size Estimation
Power Grid
• Delivers current from the C4 bumps to the transistors
• Designed to deliver typical current density to the devices
• Increasing current density by arraying large devices can cause you to exceed the power grid’s nominal design
• Doing this can cause performance and noise problems
Power Grid
Think of the grid as a straw between the C4 and the devices. Too many devices sucking through the same straw or too narrow a straw can cause devices to starve and the supply to dip or crater!
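
The straw analogy maps onto a first-order static IR-drop estimate: the dip is the total current drawn through a shared piece of grid times that grid's resistance. The Python sketch below is only that approximation. It assumes all drivers draw current at the same instant through one lumped resistance and ignores inductance and decoupling capacitance; the function name and all values are hypothetical.

```python
def supply_dip_v(n_drivers, current_per_driver_a, grid_resistance_ohm):
    """Static IR drop (volts) when n_drivers switch through a shared grid path."""
    total_current_a = n_drivers * current_per_driver_a   # more devices on the straw
    return total_current_a * grid_resistance_ohm         # narrower straw = higher R


# Hypothetical: 32 arrayed drivers at 1 mA each through 2 ohms of grid.
dip = supply_dip_v(32, 1e-3, 2.0)
print(f"{dip * 1000:.0f} mV")
```

Either lever in the analogy shows up directly: halving the number of simultaneously switching drivers or halving the grid resistance halves the estimated dip.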
SAMPLE Power/Ground GRID
Shielding takes up significant routing resources.
Global M6 routes over the array should have minimal coupling noise to array bitlines.
[Figure: sample grid with VSS, VDD and VSS rails and Sig wires separated by Vss shields (Full Shielding, MCF = 1.0); dimensions are labeled in λ, including a 48λ span and widths and spaces of λ, 2λ and 4λ]
* Where λ is minimum critical dimension for width/space
Power Grid
[Figure: SCHEMATIC VIEW of a cell with input A<31:0> and output OUT<31:0>; CELL LAYOUT VIEW; RELATIVE CELL PLACEMENT with bit cells stacked from Bit 31 down to Bit 0]
Power Grid
[Figure: SCHEMATIC VIEW of a cell with input A<31:0> and output OUT<31:0>; CELL LAYOUT VIEW; RELATIVE CELL PLACEMENT with bit cells stacked from Bit 31 down to Bit 0; waveforms for Out, Current, Vdd and Vss]
When large, arrayed drivers pull on the same rail, supply bounce can occur, degrading performance and causing supply offset noise
Power Grid
• Be very careful arraying large drivers
• Follow the % power guidelines for the power grid
• Try to keep temporal relationships between arrayed drivers
• Consider the physical impact on the grid by your design
• Be prepared to make the grid more robust to compensate for marginal grids
Summary
• Early design planning and layout can have a significant impact on processor design
– Die size, profit & power are impacted by layout density
– Schedule is impacted by implementation choices
• Floorplanning also significantly impacts circuit performance
– Shielding can help timing and noise sensitive circuits
– Carefully floorplanning critical paths can help reduce wire loads
– Reducing clock routing can reduce clock skew and clock power
Backup
Wire and Resistance Calculator
ALPHA 21364
PPC 603