
The University of Texas at Austin, EE 382M-8 VLSI-2

EE-382M

VLSI–II

Early Design Planning: Front End

Mark McDermott


EDP Objectives

• Get designers thinking about physical implementation while doing the architecture design.

• Give designers a procedure to floorplan for high performance circuits.

• Help designers avoid pitfalls that can cause die size growth, timing issues and power distribution problems.

• Provide a starting point for layout by setting various constraints such as block size, feedthrus, power and clock routing.


EDP and the Design Flow

[Figure: design-flow diagram. Stages: Concept, Architecture, uArchitecture, Logic, Circuits, Layout, Si Debug, Production. Phases: Technology Readiness, EDP, Front End Development, Backend Design, Execution, Silicon Ramp.]

EDP encompasses planning from architecture to the layout.


Chip Hierarchy

[Figure: chip hierarchy in logical and physical views. Levels: Chip, Cluster, Unit, Sub-unit, Cells. Building blocks: RLM lib, SDP lib, Arrays.]


Basic Building Blocks

• There are 3 types of building blocks used in the implementation of a VLSI chip:
  – RLM: Random Logic Macros
    • Typically synthesized using a standard cell library.
    • Layout is done using Automatic Place & Route (APR) tools.
  – SDP: Structured Data Paths
    • Typically designed using DP libraries.
    • Layout is generated using tiling engines.
    • Routing is done manually or with automated routers.
  – Arrays: Memory, Register Files, CAMs, etc.
    • Can be designed using memory generators. High performance memories are typically done manually.
    • Memory generators will produce layout. Custom designed memories will be done manually.


Path from RTL to structural netlist

[Figure: the RTL Database (schematic or gate-level RTL) feeds three design paths that draw on the Cell Library and end in an Automatically Generated Low-Level Netlist.]

• RLM: “I want to design control logic…”
  – Use RLM library
  – Create with logic synthesis
  – May “tweak” output by hand
• Structured Datapath (SDP): “I want to design datapath logic…”
  – Use any existing cell from the library
  – Create with text editor or schematic capture
• Custom: “I want to design an array, complex dynamic gate, etc…”
  – Create new layout cells
  – Create new schematics
  – Use new layout cells and schematics in the “Datapath” (SDP) flow
• LEC proves equivalence of RTL and schematics.


Front End EDP Flow

• The front-end activities will include:
  – Determining the critical timing paths and setting the component constraints at the top level and the component level.
    • If the critical path exceeds the timing budget, the logic will have to be redesigned. Timing will be negotiated among all clusters and the top-level integration team. NOTE: We will NOT re-pipeline the SPARC-T1 Core.
  – Doing a detailed power estimation to determine the power grid requirements.
  – Determining the clocking requirements and designing the clock distribution and regeneration components.


Backend EDP Flow

• The project activities will include:
  – Determining the standard cell and custom library elements needed to completely do the design with APR tools.
  – Detailed floorplan of the block-level components.
  – A reasonably detailed top-level floorplan using the cluster abstracts.
  – Approximate clock routing at the top level.
  – Approximate Power-GND routing at the top level.


Determining Critical Speed Paths

• Random Logic Macro Level:
  – The primary mechanism for determining the speed paths in synthesized logic will be using the timing tool in Design Compiler from Synopsys.
  – Will still have to manually inspect the synthesis results to confirm that speed paths are real and not an artifact of poor synthesis scripts.
• Structured Data Paths
  – These paths are determined by a combination of HSPICE and a standard timing tool like PrimeTime from Synopsys.
  – For the class project we will rely primarily on HSPICE since we don’t have a datapath library.
• Memory
  – Speed path analysis in custom memory design is done entirely with HSPICE.
  – For the class project we will be estimating the delays through the memories and building ATRAT files for Global Timing.


Speedpath Analysis

• The frequency of any given processor will be determined by the slowest speedpath.

• In synchronous (i.e. clocked) processors, this is defined as the time necessary to complete the logic in each pipe stage.

• Speedpath components
  – State element launch time
  – Logic delay
  – Wire (RC) delay
  – State element setup time
  – Clock uncertainty
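
The component list above can be turned into a quick cycle-time budget. The sketch below is only an illustration: the path names and all delay values (in ns) are made-up placeholders, and it assumes the five components simply add up to the minimum cycle time of a pipe stage.

```python
from dataclasses import dataclass


@dataclass
class Speedpath:
    """One pipe-stage path; all delays in ns (illustrative values only)."""
    name: str
    launch_ns: float       # state element launch time
    logic_ns: float        # logic delay
    wire_ns: float         # wire (RC) delay
    setup_ns: float        # state element setup time
    uncertainty_ns: float  # clock uncertainty

    def min_cycle_ns(self) -> float:
        # Assumption: the components add linearly.
        return (self.launch_ns + self.logic_ns + self.wire_ns
                + self.setup_ns + self.uncertainty_ns)


def slowest_path(paths):
    """Return the slowest path and the frequency (MHz) it allows."""
    worst = max(paths, key=lambda p: p.min_cycle_ns())
    return worst, 1000.0 / worst.min_cycle_ns()


# Hypothetical paths, not taken from any real design.
paths = [
    Speedpath("fetch", 0.10, 0.50, 0.15, 0.10, 0.05),
    Speedpath("execute", 0.10, 0.60, 0.15, 0.10, 0.05),
]
worst, fmax_mhz = slowest_path(paths)
print(f"slowest path: {worst.name}, fmax ~ {fmax_mhz:.0f} MHz")
```

The slowest path sets the frequency for the whole design, so shaving delay from any other path does not help until the worst one is fixed.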


Speedpath Analysis

• Look for state elements in Verilog as endpoints to each speedpath
  – Flops
  – Latches
  – Memory Arrays
• Easiest thing is to follow the clock signal
  – always @(posedge clk or posedge rst) begin
  – Note that logic can be embedded in the always @ statement
  – Beware of implicit flip-flops in memory arrays.
• Note that speedpaths can traverse many levels of hierarchy and/or many different modules
• Different Verilog constructs will translate into different types of logic gates.


Speedpath example #1: Verilog Model

    always @(posedge clk or posedge rst) begin
      if (rst)
        id_insn <= #1 {`OR32_NOP, 26'h041_0000};
      else if (flushpipe)
        id_insn <= #1 {`OR32_NOP, 26'h041_0000};
      else if (!id_freeze) begin
        id_insn <= #1 if_insn;
      end
    end
    …
    always @(posedge clk or posedge rst) begin
      if (rst)
        shrot_op <= #1 `SHROTOP_NOP;
      else if (!ex_freeze & id_freeze | flushpipe)
        shrot_op <= #1 `SHROTOP_NOP;
      else if (!ex_freeze) begin
        shrot_op <= #1 id_insn[`SHROTOP_POS];
      end
    end


Example #2: Synthesized Critical Path (Reg-to-Reg)

Startpoint: ctl/visctl/sub_dff/q_reg[0]
Endpoint: dp/rs2_rd_dff/q_reg[31]

Arrival Time: 0.8440
Setup Time: 0.1279
-----------------------------------
Slack: 0.0281


Example #3: Synthesized Critical Path (Reg-to-Reg)

Startpoint: ctl/check_ecc_dff/q_reg[0]
Endpoint: ctl/possible_ue_dff/q_reg[0]

Arrival Time: 0.5906
Setup Time: 0.1285
-----------------------------------
Slack: 0.2808
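
The slack numbers in Examples #2 and #3 can be cross-checked with a few lines of arithmetic. The sketch below assumes a clock period of 1.0 (in the same time units as the reports) and no separate clock-uncertainty term. Neither is stated on the slides; the period is inferred because it reproduces the reported slack of Example #2 and matches Example #3 to within 0.0001.

```python
def setup_slack(clock_period, arrival_time, setup_time, uncertainty=0.0):
    """Setup slack for a reg-to-reg path.

    required time = period - setup - uncertainty
    slack         = required time - arrival time
    """
    required_time = clock_period - setup_time - uncertainty
    return required_time - arrival_time


# Assumed clock period; the slides do not state it.
CLOCK_PERIOD = 1.0

# Arrival and setup values copied from Examples #2 and #3.
slack_ex2 = setup_slack(CLOCK_PERIOD, arrival_time=0.8440, setup_time=0.1279)
slack_ex3 = setup_slack(CLOCK_PERIOD, arrival_time=0.5906, setup_time=0.1285)

print(f"Example #2 slack: {slack_ex2:.4f}")
print(f"Example #3 slack: {slack_ex3:.4f}")
```

A positive slack means the path meets timing with margin; the path in Example #2 is much closer to the limit than the one in Example #3.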


RLM and SDP Power Estimation

• The power estimates for the RLM and SDP blocks will be done using an Excel spreadsheet instead of the power derived from Design Compiler.

• The spreadsheet comprehends the following contributions to power:
  – Logic gate intrinsic power
  – Gate capacitance power
  – Gate leakage power
  – Interconnect wiring capacitance power
  – Source-drain leakage power
  – Block activity factors
  – Signal switching factors
  – Glitching power
• Line items in the spreadsheet map directly to components in the .lib file.
  – Entry will be done by extracting gate usage information from the synthesis process.


Activity Factor vs. Switching Factor

• Activity Factor represents how often a specific block is active.
  – Represented as a percentage of time.
  – For example, an instruction fetch unit is active 80-90% of the time.
  – A trap unit would be active 2% of the time.
• Switching Factor is also represented as a percentage and indicates how often the internal nodes of a specific block toggle.
  – A function of the type of gate.
    • For example, inverters switch all the time.
    • 4-input NAND gates switch considerably less.
    • Complex gates have even lower switching factors.
  – Typical RLM blocks have switching factors of about 15-25% depending on the mix of logic.
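
To see how the two factors interact, the sketch below multiplies them into a single scaling term on the switched-capacitance power. This is one plausible way to combine them, not the formula of the course spreadsheet (which is not shown in these slides). The function name, capacitance, supply voltage and frequency are all hypothetical.

```python
def block_dynamic_power_w(activity, switching, cap_f, vdd_v, freq_hz):
    """Rough dynamic power of a block, in watts.

    activity  : fraction of time the block is active (0..1)
    switching : fraction of internal nodes toggling when active (0..1)
    Assumption: both factors simply scale C * V^2 * f.
    """
    if not (0.0 <= activity <= 1.0 and 0.0 <= switching <= 1.0):
        raise ValueError("activity and switching must be within 0..1")
    return activity * switching * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 100 pF of switched capacitance, 1.0 V, 1 GHz,
# with a 20% switching factor (inside the 15-25% range quoted for RLMs).
ifu_w = block_dynamic_power_w(0.85, 0.20, 100e-12, 1.0, 1e9)   # fetch unit
trap_w = block_dynamic_power_w(0.02, 0.20, 100e-12, 1.0, 1e9)  # trap unit

print(f"fetch unit: {ifu_w * 1e3:.1f} mW, trap unit: {trap_w * 1e3:.2f} mW")
```

With identical logic, the rarely used trap unit contributes far less dynamic power than the fetch unit, which is why both factors need to be tracked separately.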


RLM and SDP Power Estimation Spreadsheet


Memory Power Estimation

• Most power dissipation for an array occurs in bitlines and sense amplifiers

• Calculate total bitline capacitance:

    {Metal2 bitline cap} + {junction cap} × {number of bitcells}

• Calculate sense node capacitive load to include in power dissipation

• For power dissipation, we use the approximation:

    Pdiss = α × Ctotal × (Vsupply)² × frequency

  where α is the “Activity Factor”, 0 < α < 1.

• Memory cells can contribute significant D.C. power due to leakage from many cells in standby; be sure to take this into account:

    Pstatic = Ileakage × VDD
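
The two power formulas above can be sketched as a small calculation. Every numeric value below is a placeholder, not process data. The code assumes the Metal2 capacitance is given for the whole bitline and the junction capacitance per bitcell (following the expression as written), and that Ctotal is the bitline plus sense-node capacitance summed over all bitlines.

```python
def bitline_cap_f(metal2_cap_f, junction_cap_per_cell_f, num_bitcells):
    """{Metal2 bitline cap} + {junction cap} x {number of bitcells}."""
    return metal2_cap_f + junction_cap_per_cell_f * num_bitcells


def dynamic_power_w(alpha, c_total_f, v_supply, freq_hz):
    """Pdiss = alpha * Ctotal * Vsupply^2 * frequency."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("activity factor must satisfy 0 < alpha < 1")
    return alpha * c_total_f * v_supply ** 2 * freq_hz


def static_power_w(i_leakage_a, vdd):
    """Pstatic = Ileakage * VDD."""
    return i_leakage_a * vdd


# Hypothetical array: 128 cells per bitline, 64 bitlines.
c_bitline = bitline_cap_f(20e-15, 0.5e-15, 128)    # 84 fF per bitline
c_sense = 6e-15                                    # assumed sense-node load
c_total = 64 * (c_bitline + c_sense)               # all bitlines

p_dyn = dynamic_power_w(0.5, c_total, 1.0, 1e9)
p_static = static_power_w(10e-9 * 128 * 64, 1.0)   # 10 nA per cell, assumed

print(f"dynamic: {p_dyn * 1e3:.2f} mW, static: {p_static * 1e6:.1f} uW")
```

Swapping in real capacitances and leakage currents from the target technology is what turns this from a sketch into an estimate.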


Total Power Calculations


RLM/SDP Block Size Estimation

• The block area estimations are done using the same spreadsheet as the power estimation.
• The spreadsheet comprehends the following:
  – Area utilization factors for each gate type
  – Block utilization factors
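
A minimal sketch of how the two utilization factors might enter an area estimate is shown below. The slides do not give the spreadsheet formulas, so this assumes each gate's cell area is divided by its per-gate-type utilization and the sum is then divided by the block utilization. The gate counts, cell areas and factors are invented for illustration.

```python
def block_area_um2(gates, block_utilization):
    """Estimate block area in square microns.

    gates: iterable of (count, cell_area_um2, gate_utilization) tuples.
    Assumption: utilization factors divide the raw cell area.
    """
    if not 0.0 < block_utilization <= 1.0:
        raise ValueError("block utilization must be in (0, 1]")
    placed = sum(count * area / util for count, area, util in gates)
    return placed / block_utilization


# Hypothetical gate mix extracted from a synthesis report.
gates = [
    (1000, 1.0, 0.8),  # e.g. inverters
    (500, 2.0, 0.8),   # e.g. 4-input NANDs
]
area = block_area_um2(gates, block_utilization=0.5)
print(f"estimated block area: {area:.0f} um^2")
```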


RLM/SDP Block Size Estimation Spreadsheet


Memory Array Area Estimation

• Cell Area
  – 1T, 4T, and 6T cell areas are heavily dependent on technology
    • Need an actual layout study to determine area
  – Multiported cells are wire limited and can be easily calculated
    • Cell Height is a function of {MV_Pitch*(Wordlines + Shields)}
    • Cell Width is a function of {MH_Pitch*(Bitlines + Datalines + Shields)}
• Local Bitline Receivers and Dataline Drivers
  – Height of array is increased by local bitline receivers
    • NumReadPorts*NumEntries/CellPerLBL
  – Height of array is increased by local dataline drivers
    • NumWritePorts*NumEntries/CellPerLBL


Memory Array Area Estimation

• Decoder & Wordline Repeaters
  – Width of array is increased by the decoder
    • Decoder width is a function of number of ports
    • 20% of total array width is a reasonable estimate
  – Width of array is increased by wordline repeaters
    • Typically no more than 32 cells on a single wordline
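
The wire-limited estimates for a multiported array can be sketched as follows. The slides only say the cell dimensions are "a function of" the pitch expressions, so treating them as equalities is an assumption, as are all pitches, port counts and array sizes below. The 20% decoder share and the 32-cell wordline limit come from the slides.

```python
import math


def cell_height(mv_pitch, wordlines, shields):
    # Assumed equality: height = MV_Pitch * (Wordlines + Shields)
    return mv_pitch * (wordlines + shields)


def cell_width(mh_pitch, bitlines, datalines, shields):
    # Assumed equality: width = MH_Pitch * (Bitlines + Datalines + Shields)
    return mh_pitch * (bitlines + datalines + shields)


def local_circuit_count(num_ports, num_entries, cells_per_lbl):
    # NumPorts * NumEntries / CellPerLBL, rounded up to whole circuits
    return math.ceil(num_ports * num_entries / cells_per_lbl)


def width_with_decoder(core_width, decoder_fraction=0.20):
    # Decoder takes ~20% of the TOTAL width, so scale the core width up.
    return core_width / (1.0 - decoder_fraction)


def wordline_segments(cells_per_row, max_cells=32):
    # Repeaters split a wordline into segments of at most 32 cells.
    return math.ceil(cells_per_row / max_cells)


# Hypothetical 64-entry x 64-bit register file, 2 read ports, 1 write port.
h = cell_height(mv_pitch=0.2, wordlines=3, shields=1)              # um
w = cell_width(mh_pitch=0.2, bitlines=2, datalines=1, shields=1)   # um
receivers = local_circuit_count(2, 64, cells_per_lbl=16)
drivers = local_circuit_count(1, 64, cells_per_lbl=16)
total_width = width_with_decoder(64 * w)
segments = wordline_segments(64)

print(f"cell {w:.2f} x {h:.2f} um, {receivers} receivers, {drivers} drivers")
print(f"array width incl. decoder: {total_width:.1f} um, {segments} WL segments")
```

Converting the receiver, driver and repeater counts into added height or width still needs the physical size of each circuit, which would come from a layout study.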


Total Area Calculation

• The block area estimates are determined by summing the RLM/SDP area calculations and the memory area calculation.
