The University of Texas at AustinFoil # 1 The University of Texas at AustinEE 382M-8 VLSI-2 Page 1
EE-382M
VLSI–II
Early Design Planning: Front End
Mark McDermott
EDP Objectives
• Get designers thinking about physical implementation while doing the architecture design.
• Give designers a procedure to floorplan for high performance circuits.
• Help designers avoid pitfalls that can cause die size growth, timing issues and power distribution problems.
• Provide a starting point for layout by setting various constraints such as block size, feedthrus, power and clock routing.
EDP and the Design Flow
[Figure: EDP in the design flow. Design stages shown: Concept, Architecture, uArchitecture, Logic, Circuits, Layout, Si Debug, Production. Project phases shown: Technology Readiness, Front End Development, Backend Design, Execution, Silicon Ramp.]
EDP encompasses planning from architecture to the layout.
Chip Hierarchy
[Figure: logical and physical chip hierarchy with levels Chip, Cluster, Unit, Sub-unit, and Cells; the leaf blocks are drawn from the RLM lib, SDP lib, and Arrays.]
Basic Building Blocks
• There are 3 types of building blocks used in the implementation of a VLSI chip:
– RLM: Random Logic Macros
  • Typically synthesized using a standard cell library
  • Layout is done using Automatic Place & Route (APR) tools
– SDP: Structured Data Paths
  • Typically designed using DP libraries.
  • Layout is generated using tiling engines.
  • Routing is done manually or with automated routers.
– Arrays: Memory, Register Files, CAMs, etc.
  • Can be designed using memory generators. High performance memories are typically done manually.
  • Memory generators will produce layout. Custom designed memories will be done manually.
Path from RTL to structural netlist
[Figure: three design flows lead from the RTL database (schematic or gate-level RTL) to an automatically generated low-level netlist built on the cell library. LEC proves equivalence of RTL and schematics.]
• RLM: “I want to design control logic…”
– Use RLM library
– Create with logic synthesis
– May “tweak” output by hand
• Structured Datapath (SDP): “I want to design datapath logic…”
– Use any existing cell from the library
– Create with text editor or schematic capture
• Custom: “I want to design an array, complex dynamic gate, etc…”
– Create new layout cells
– Create new schematics
– Use new layout cells and schematics in the “Datapath” flow
Front End EDP Flow
• The front-end activities will include:
– Determining the critical timing paths and setting the component constraints at the top level and the component level.
  • If the critical path exceeds the timing budget, the logic will have to be redesigned. Timing will be negotiated among all clusters and the top-level integration team. NOTE: We will NOT re-pipeline the SPARC-T1 Core.
– Doing a detailed power estimation to determine the power grid requirements.
– Determining the clocking requirements and designing the clock distribution and regeneration components.
Backend EDP Flow
• The project activities will include:
– Determining the standard cell and custom library elements needed to completely do the design with APR tools.
– Detailed floorplan of the block-level components.
– A reasonably detailed top-level floorplan using the cluster abstracts.
– Approximate clock routing at the top level
– Approximate Power-GND routing at the top level
Determining Critical Speed Paths
• Random Logic Macro Level:
– The primary mechanism for determining the speed paths in synthesized logic will be the timing tool in Design Compiler from Synopsys.
– We will still have to manually inspect the synthesis results to confirm that speed paths are real and not an artifact of poor synthesis scripts.
• Structured Data Paths
– These paths are determined by a combination of HSPICE and a standard timing tool like PrimeTime from Synopsys.
– For the class project we will rely primarily on HSPICE since we don’t have a datapath library.
• Memory
– Speed-path analysis in custom memory design is done entirely with HSPICE.
– For the class project we will be estimating the delays through the memories and building ATRAT files for Global Timing.
Speedpath Analysis
• The frequency of any given processor will be determined by the slowest speedpath.
• In synchronous (i.e. clocked) processors, this is defined as the time necessary to complete the logic in each pipe stage.
• Speedpath components
– State element launch time
– Logic delay
– Wire (RC) delay
– State element setup time
– Clock Uncertainty
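As a rough illustration of how the speedpath components set the clock frequency, the sketch below sums the five components for each path and takes the slowest path as the cycle-time limit. The path names and all delay values are hypothetical numbers (in ns) chosen for illustration; they do not come from the class project.

```python
# Minimal sketch: cycle time as the sum of speedpath components.
# All path names and delay values below are hypothetical (units: ns).

def min_cycle_time(launch, logic, wire, setup, clock_uncertainty):
    """Sum the speedpath components; all arguments share one time unit."""
    return launch + logic + wire + setup + clock_uncertainty

def max_frequency_mhz(cycle_time_ns):
    """Convert a cycle time in ns to a frequency in MHz."""
    return 1000.0 / cycle_time_ns

paths = {
    "fetch":  dict(launch=0.10, logic=0.55, wire=0.15, setup=0.12, clock_uncertainty=0.08),
    "decode": dict(launch=0.10, logic=0.40, wire=0.10, setup=0.12, clock_uncertainty=0.08),
}

cycle_times = {name: min_cycle_time(**parts) for name, parts in paths.items()}

# The slowest speedpath limits the frequency of the whole design.
slowest = max(cycle_times, key=cycle_times.get)
fmax = max_frequency_mhz(cycle_times[slowest])

print(f"slowest path: {slowest}, cycle time {cycle_times[slowest]:.2f} ns, fmax {fmax:.0f} MHz")
```

Shortening any path other than the slowest one does not raise the achievable frequency, which is why the analysis focuses on the critical speedpath first.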
Speedpath Analysis
• Look for State Elements in Verilog as endpoints to each speedpath
– Flops
– Latches
– Memory Arrays
• The easiest approach is to follow the clock signal
– always @(posedge clk or posedge rst) begin
– Note that logic can be embedded in the always @ statement
– Beware of implicit flip-flops in memory arrays.
• Note that speedpaths can traverse many levels of hierarchy and/or many different modules
• Different Verilog constructs will translate into different types of logic gates.
Speedpath example #1: Verilog Model
always @(posedge clk or posedge rst) begin
  if (rst)
    id_insn <= #1 {`OR32_NOP, 26'h041_0000};
  else if (flushpipe)
    id_insn <= #1 {`OR32_NOP, 26'h041_0000};
  else if (!id_freeze) begin
    id_insn <= #1 if_insn;
  end
end
…
always @(posedge clk or posedge rst) begin
  if (rst)
    shrot_op <= #1 `SHROTOP_NOP;
  else if (!ex_freeze & id_freeze | flushpipe)
    shrot_op <= #1 `SHROTOP_NOP;
  else if (!ex_freeze) begin
    shrot_op <= #1 id_insn[`SHROTOP_POS];
  end
end
Example #2: Synthesized Critical Path (Reg-to-Reg)
Startpoint: ctl/visctl/sub_dff/q_reg[0]
Endpoint: dp/rs2_rd_dff/q_reg[31]
Arrival Time: 0.8440
Setup Time: 0.1279
-----------------------------------
Slack: 0.0281
Example #3: Synthesized Critical Path (Reg-to-Reg)
Startpoint: ctl/check_ecc_dff/q_reg[0]
Endpoint: ctl/possible_ue_dff/q_reg[0]
Arrival Time: 0.5906
Setup Time: 0.1285
-----------------------------------
Slack: 0.2808
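The slack figures in Examples #2 and #3 can be reproduced with a small sketch, assuming the usual reg-to-reg setup relation slack = clock period - arrival time - setup time. The clock period is not stated in the reports; the value of 1.0 used here is an assumption inferred from the fact that arrival, setup, and slack sum to about 1.0 in both examples, and any clock uncertainty is assumed to be zero or already folded into the reported numbers.

```python
# Minimal sketch: recompute setup slack for the two reported paths.
# CLOCK_PERIOD is an assumption (not stated in the timing reports).

CLOCK_PERIOD = 1.0

def setup_slack(clock_period, arrival_time, setup_time):
    """Setup slack for a reg-to-reg path; positive means timing is met."""
    return clock_period - arrival_time - setup_time

# (arrival time, setup time) copied from Examples #2 and #3.
reports = {
    "Example #2": (0.8440, 0.1279),
    "Example #3": (0.5906, 0.1285),
}

slacks = {
    name: setup_slack(CLOCK_PERIOD, arrival, setup)
    for name, (arrival, setup) in reports.items()
}

for name, slack in slacks.items():
    status = "met" if slack >= 0 else "VIOLATED"
    print(f"{name}: slack = {slack:.4f} ({status})")
```

Under these assumptions Example #2 matches the reported slack, while Example #3 comes out one unit higher in the last digit than the reported 0.2808, which is consistent with rounding in the printed report values.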
RLM and SDP Power Estimation
• The power estimates for the RLM and SDP blocks will be done using an Excel spreadsheet instead of the power derived from Design Compiler.
• The spreadsheet comprehends the following contributions to power:
– Logic gate intrinsic power
– Gate capacitance power
– Gate leakage power
– Interconnect wiring capacitance power
– Source-drain leakage power
– Block Activity factors
– Signal switching factors
– Glitching power
• Line items in the spreadsheet map directly to components in the .lib file.
– Entry will be done by extracting gate usage information from the synthesis process.
Activity Factor vs. Switching Factor
• Activity Factor represents how often a specific block is active.
– Represented as a percentage of time.
– For example, an instruction fetch unit is active 80-90% of the time.
– A trap unit would be active 2% of the time.
• Switching Factor is also represented as a percentage and indicates how often the internal nodes of a specific block toggle.
– A function of the type of gate.
  • For example, inverters switch all the time
  • 4-input NAND gates switch considerably less
  • Complex gates have even lower switching factors.
– Typical RLM blocks have switching factors of about 15-25% depending on the mix of logic.
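To make the distinction concrete, the sketch below combines the two factors into a single effective α and applies the standard dynamic power form α · C · V² · f. Multiplying the activity factor by the switching factor is an assumption about how the factors combine; the slides do not say how the spreadsheet does it. The block names, capacitance, supply voltage, and frequency are hypothetical values.

```python
# Minimal sketch: effective switching activity of a block.
# Assumption: effective alpha = activity factor * switching factor.
# All numeric values below are hypothetical.

def effective_alpha(activity_factor, switching_factor):
    """Fraction of node capacitance toggling per cycle, on average."""
    return activity_factor * switching_factor

def dynamic_power_watts(alpha, c_total_farads, v_supply, freq_hz):
    """Standard dynamic power form: alpha * C * V^2 * f."""
    return alpha * c_total_farads * v_supply ** 2 * freq_hz

C_TOTAL = 100e-12   # hypothetical switched capacitance per block (F)
V_SUPPLY = 1.0      # hypothetical supply voltage (V)
FREQ = 1e9          # hypothetical clock frequency (Hz)

blocks = {
    "instruction_fetch": dict(activity=0.85, switching=0.20),
    "trap_unit":         dict(activity=0.02, switching=0.20),
}

power = {}
for name, f in blocks.items():
    alpha = effective_alpha(f["activity"], f["switching"])
    power[name] = dynamic_power_watts(alpha, C_TOTAL, V_SUPPLY, FREQ)
    print(f"{name}: alpha = {alpha:.3f}, dynamic power = {power[name] * 1e3:.2f} mW")
```

With identical logic mixes, the rarely active trap unit dissipates a small fraction of the fetch unit's dynamic power, which is why both factors are tracked separately per block.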
RLM and SDP Power Estimation Spreadsheet
Memory Power Estimation
• Most power dissipation for an array occurs in bitlines and sense amplifiers
• Calculate total bitline capacitance: {Metal2 bitline cap} + {junction cap} × {number of bitcells}
• Calculate sense node capacitive load to include in power dissipation
• For power dissipation, we use the approximation:
Pdiss = α * Ctotal * (Vsupply)^2 * frequency
where α is the “Activity Factor”, 0 < α < 1
• Memory cells can contribute significant D.C. power due to leakage from many cells in standby; be sure to take this into account:
Pstatic = Ileakage * VDD
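The memory power approximation can be sketched as follows. The bitline capacitance formula is read literally as the Metal2 wire capacitance of one bitline plus the junction capacitance of each attached bitcell, which is an interpretation of the slide rather than a confirmed definition. Total leakage is assumed to be the per-cell leakage times the number of cells. The array dimensions and all electrical values are hypothetical.

```python
# Minimal sketch of the memory power approximation.
# Assumptions: bitline cap = metal2 wire cap + junction cap * cells on the bitline;
# total leakage = per-cell leakage * number of cells. All values are hypothetical.

def bitline_capacitance(metal2_cap, junction_cap, num_bitcells):
    """Capacitance of one bitline (F)."""
    return metal2_cap + junction_cap * num_bitcells

def dynamic_power(alpha, c_total, v_supply, frequency):
    """Pdiss = alpha * Ctotal * Vsupply^2 * frequency (W)."""
    return alpha * c_total * v_supply ** 2 * frequency

def static_power(i_leak_per_cell, num_cells, vdd):
    """Pstatic = Ileakage * VDD, with Ileakage summed over all cells (W)."""
    return i_leak_per_cell * num_cells * vdd

ROWS, COLS = 128, 64          # hypothetical array: 128 entries x 64 bitlines
c_bitline = bitline_capacitance(metal2_cap=20e-15, junction_cap=0.5e-15, num_bitcells=ROWS)
c_sense = 5e-15               # hypothetical sense-node load per bitline (F)
c_total = COLS * (c_bitline + c_sense)

p_dyn = dynamic_power(alpha=0.5, c_total=c_total, v_supply=1.0, frequency=1e9)
p_static = static_power(i_leak_per_cell=10e-9, num_cells=ROWS * COLS, vdd=1.0)

print(f"bitline cap: {c_bitline * 1e15:.1f} fF")
print(f"dynamic power: {p_dyn * 1e3:.3f} mW, static power: {p_static * 1e6:.2f} uW")
```

Because the leakage term scales with the total cell count rather than with activity, it can become a noticeable share of the total for large arrays that are rarely accessed.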
Total Power Calculations
RLM/SDP Block Size Estimation
• The block area estimates are done using the same spreadsheet as the power estimation.
• The spreadsheet comprehends the following:
– Area utilization factors for each gate type
– Block utilization factors
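One plausible way to combine the two kinds of utilization factor is sketched below: each gate type's raw cell area is divided by its own utilization factor, and the sum is then divided by the block utilization factor. This formula is an assumption for illustration; the actual spreadsheet may combine the factors differently. The gate names, counts, cell areas, and utilization values are all hypothetical.

```python
# Minimal sketch of a block area estimate from gate usage.
# Assumption: area = sum(count * cell_area / gate_util) / block_util.
# Gate types, counts, areas (um^2) and utilization factors are hypothetical.

def block_area(gates, block_utilization):
    """Estimate block area in um^2 from per-gate usage and utilization factors."""
    placed = sum(g["count"] * g["cell_area"] / g["utilization"] for g in gates.values())
    return placed / block_utilization

gates = {
    "inv":   dict(count=1000, cell_area=1.0, utilization=0.80),
    "nand2": dict(count=400,  cell_area=1.5, utilization=0.75),
}

area = block_area(gates, block_utilization=0.5)
print(f"estimated block area: {area:.0f} um^2")
```

The gate counts would come from the synthesis results, in the same way as the gate usage information that feeds the power estimate.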
RLM/SDP Block Size Estimation Spreadsheet
Memory Array Area Estimation
• Cell Area
– 1T, 4T, and 6T cell area is heavily dependent on technology
  • Need an actual layout study to determine area
– Multiported cells are wire limited and can be easily calculated
  • Cell Height is a function of {MV_Pitch*(Wordlines + Shields)}
  • Cell Width is a function of {MH_Pitch*(Bitlines + Datalines + Shields)}
• Local Bitline Receivers and Dataline Drivers
– Height of array is increased by local bitline receivers
  • NumReadPorts*NumEntries/CellPerLBL
– Height of array is increased by local dataline drivers
  • NumWritePorts*NumEntries/CellPerLBL
Memory Array Area Estimation
• Decoder & Wordline Repeaters
– Width of array is increased by the decoder
  • Decoder width is a function of number of ports
  • 20% of total array width is a reasonable estimate
– Width of array is increased by wordline repeaters
  • Typically no more than 32 cells on a single wordline
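The array area rules above can be pulled together in a rough sketch for a wire-limited multiported array. It treats "is a function of" as a plain product, reads the receiver and driver expressions as a count of extra rows, and assumes a fixed height per row and a fixed width per wordline repeater. Those readings, the helper names, and every numeric value are assumptions for illustration only.

```python
# Minimal sketch of a multiported array area estimate.
# Assumptions: cell dimensions are pitch * track count; receiver/driver
# expressions give a count of extra rows; decoder is 20% of total width.
# All pitches, counts and row/repeater sizes (um) are hypothetical.
import math

def cell_height(mv_pitch, wordlines, shields):
    return mv_pitch * (wordlines + shields)

def cell_width(mh_pitch, bitlines, datalines, shields):
    return mh_pitch * (bitlines + datalines + shields)

def extra_rows(num_ports, num_entries, cells_per_lbl):
    """Rows of local bitline receivers (read ports) or dataline drivers (write ports)."""
    return num_ports * num_entries / cells_per_lbl

def num_wordline_repeaters(cells_per_row, max_cells_per_wordline=32):
    """Repeaters needed so no wordline segment drives more than the limit."""
    return max(0, math.ceil(cells_per_row / max_cells_per_wordline) - 1)

ENTRIES, BITS, CELLS_PER_LBL = 64, 64, 16
READ_PORTS, WRITE_PORTS = 2, 1
ROW_HEIGHT, REPEATER_WIDTH = 1.5, 2.0   # hypothetical sizes (um)

h_cell = cell_height(mv_pitch=0.2, wordlines=3, shields=1)
w_cell = cell_width(mh_pitch=0.2, bitlines=2, datalines=2, shields=1)

rows = extra_rows(READ_PORTS, ENTRIES, CELLS_PER_LBL) + extra_rows(WRITE_PORTS, ENTRIES, CELLS_PER_LBL)
height = ENTRIES * h_cell + rows * ROW_HEIGHT

core_width = BITS * w_cell + num_wordline_repeaters(BITS) * REPEATER_WIDTH
width = core_width / (1 - 0.20)         # decoder taken as 20% of total width

print(f"array: {width:.1f} um x {height:.1f} um = {width * height:.0f} um^2")
```

For 1T, 4T, and 6T cells this wire-pitch approach does not apply, since those cell areas depend on the technology and need a layout study.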
Total Area Calculation
• The block area estimates are determined by summing the RLM/SDP area calculations and the memory area calculation.