tkt-9627 digital and computer systems seminar


Page 1: TKT-9627 Digital and Computer Systems Seminar

JET 2006 TKT-9627 Digital and Computer Systems Seminar 1

TKT-9627 Digital and Computer Systems Seminar

Closing the Gap Between ASIC & Custom


Course content

Literature: David Chinnery and Kurt Keutzer: “Closing the Gap Between ASIC & Custom”, Kluwer 2002, 407 p., ISBN 1402071132
– Tools and Techniques for High-Performance ASIC Design

Topics
– Improving performance through microarchitecture
– Timing-driven floorplanning
– Controlling and exploiting clock skew
– High performance latch-based design in an ASIC methodology
– Automatically identifying and synthesizing complex logic gates
– Automatic cell sizing to increase performance and reduce power
– Controlling process variation


Course Info and Requirements

Time and place: Wed 14-16, TC165
Instructors: Jouni Tomberg and Olli Vainio

Requirements:
– Presentation of the course book topic
– Active participation (>60%)
– Exam


Table of Contents (Book chapters)

1. Introduction and Overview of the Book (32 p.)

Contributing Factors
2. Improving Performance through Microarchitecture (24 p.)
3. Reducing the Timing Overhead (44 p.)
4. High Speed Logic, Circuits, Libraries and Layout (44 p.)
5. Finding Peak Performance in a Process (24 p.)


Table of Contents …

Design Techniques
6. Physical Prototyping Plans for High Performance (19 p.)
7. Automatic Replacement of Flip-Flops by Latches in ASICs (22 p.)
8. Useful-Skew Clock Synthesis Boosts ASIC Performance (16 p.)
9. Faster and Lower Power Cell-Based Designs with Transistor-Level Cell Sizing (16 p.)
10. Design Optimization with Automated Flex-Cell Creation (28 p.)
11. Exploiting Structure and Managing Wires to Increase Density and Performance (20 p.)
12. Semi-Custom Methods in a High-Performance Microprocessor Design (16 p.)
13. Controlling Uncertainty in High Frequency Designs (18 p.)
14. Increasing Circuit Performance through Statistical Design Techniques (22 p.)


Table of Contents …

Design Examples
15. Achieving 550MHz in a Standard Cell ASIC Methodology (16 p.)
16. The iCORE 520MHz Synthesizable CPU Core (22 p.)
17. Creating Synthesizable ARM Processor with Near Custom Performance (25 p.)


Introduction (Book chapter 1)

Why are custom circuits so much faster?
– Routinely 3x to 8x faster when fabricated in the same process generation.
– The first aim is to explain this disparity in performance.
– The second aim is to understand practical ways in which the performance gap can be bridged.

Who should care?
– ASIC and ASSP designers seeking high performance
– Custom designers seeking higher productivity
– EDA tool developers and researchers

Definitions in this book
– ASIC design methodology, standard cell library based design, netlist handoff
– Custom-design methodology, layout handoff


A Quick Comparison …

• Process technologies vary in a number of ways: the channel length, interconnect density and material, etc.


…Based on These Figures

• Most of the custom designs use dynamic logic on critical paths and more pipeline stages to achieve higher speeds.


Adding Up the Numbers


Improving Areas

– Microarchitecture
– Timing overhead: clock tree design and registers
– Logic style
– Logic design
– Cell design and wire sizing
– Layout: Floorplanning and placement to manage wires
– Process variation and improvement


Microarchitecture (1)

Organization of functional units: number, hierarchy, interfaces, pipeline, number of computational clock cycles, logic for branch prediction and data forwarding

What’s the problem?
– Pipelining in ASICs is limited by the larger timing overhead for the registers in the pipeline.
– Custom designs may show superior logic-level design of regular structures such as adders and multipliers, achieving fewer levels of logic and combining logic with registers.
– In custom designs the pipeline stage balancing can be done more effectively.
– ASIC microprocessors tend to have simpler implementations of speculative execution to reduce the design time.
– It is difficult to estimate the precise performance improvement from microarchitectural changes.
– The overheads for pipelining are the register delays, the larger impact of clock skew and clock jitter, and unbalanced pipeline stages, leading to about 30% overhead for an ASIC design.
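
The effect of per-stage overhead on pipelining can be sketched with a simple model. This is a minimal sketch, not the book's analysis: the function names, the arbitrary delay units and the example numbers are all assumptions chosen for illustration.

```python
def pipelined_cycle_time(logic_delay, stages, overhead_per_stage, imbalance=0.0):
    """Cycle time of a pipeline under a simple (assumed) model.

    logic_delay:        total combinational delay of the unpipelined path
    overhead_per_stage: register delay + setup + skew + jitter per stage
    imbalance:          fraction by which the slowest stage exceeds the
                        ideal share logic_delay / stages
    """
    ideal_stage = logic_delay / stages
    slowest_stage = ideal_stage * (1.0 + imbalance)
    return slowest_stage + overhead_per_stage


def pipelining_speedup(logic_delay, stages, overhead_per_stage, imbalance=0.0):
    """Clock-rate gain of the pipelined design over a single-stage design."""
    unpipelined = logic_delay + overhead_per_stage
    return unpipelined / pipelined_cycle_time(
        logic_delay, stages, overhead_per_stage, imbalance
    )


# Hypothetical numbers: 100 units of logic split into 5 stages.
asic_cycle = pipelined_cycle_time(100, 5, overhead_per_stage=10)   # 30.0
custom_cycle = pipelined_cycle_time(100, 5, overhead_per_stage=4)  # 24.0
print(asic_cycle, custom_cycle, asic_cycle / custom_cycle)
```

With these assumed values the overhead is one third of the ASIC cycle, which is in the range of the roughly 30% quoted above; the larger the overhead, the less each additional pipeline stage buys.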


Microarchitecture (2)

What can be done?
– Use the best pipelining practices.
– ASICs are unable to have the same tight control of the combinational delay and the timing overheads. Thus the estimated gap is a factor of 1.3x between the best ASIC and custom implementations.

In the book
– Chapter 2 provides a tutorial introduction to the microarchitecture of ASICs.
– The design examples in chapters 15-17 show how to develop efficient microarchitectures for ASICs.


Timing Overhead: Clock Tree Design and Registers (1)

The timing overhead is the additional delay associated with pipeline registers and the arrival of the clock edge.
– It consists of the delay through the registers, the setup time of the registers, the clock skew between the arrival of the same clock edge at different points of the chip, and the jitter between the arrival of consecutive clock edges at the same point of the chip.

What’s the problem?
– ASICs have larger clock skew and clock jitter and slower registers than custom designs.
– ASIC designers try to avoid races and increase tolerance to noise, as they have far less control of variation. Thus they design the circuitry to work for the range of possible conditions.
– ASICs primarily use edge-triggered flip-flops, and high speed registers are not supported in the cell libraries.
– In custom designs, tight control of the layout allows hold time violations and noise to be avoided. Custom designs may also run long wires with shielding wires or use low-swing signaling to reduce the effects of noise.
– ASICs don’t use level-sensitive latches because there is a larger window for hold time violations. ASICs cannot use multi-phase clocking schemes with time borrowing in skew-tolerant domino logic for a variety of tool-flow related reasons.
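
The four components of the timing overhead listed at the top of this slide add directly to the combinational delay when edge-triggered flip-flops are used. The sketch below shows that sum; the class name, field names and all numeric values are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class TimingOverhead:
    clk_to_q: float  # delay through the register
    setup: float     # setup time of the register
    skew: float      # same clock edge arriving at different points of the chip
    jitter: float    # consecutive clock edges arriving at the same point

    def total(self):
        return self.clk_to_q + self.setup + self.skew + self.jitter


def min_clock_period(logic_delay, overhead):
    """Minimum period for a flip-flop based stage (assumed simple model)."""
    return logic_delay + overhead.total()


def overhead_fraction(logic_delay, overhead):
    """Share of the clock period that does no useful logic work."""
    return overhead.total() / min_clock_period(logic_delay, overhead)


# Hypothetical values in arbitrary delay units, same logic in both cases.
asic = TimingOverhead(clk_to_q=3, setup=2, skew=4, jitter=1)
custom = TimingOverhead(clk_to_q=2, setup=1, skew=1, jitter=1)
print(min_clock_period(30, asic), overhead_fraction(30, asic))
print(min_clock_period(30, custom), overhead_fraction(30, custom))
```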


Timing Overhead: Clock Tree Design and Registers (2)

What can be done?
– Current EDA tools can verify that hold times are not violated and insert delay elements to avoid races on short paths.
– Slack passing is possible with level-sensitive latches or with cycle stealing, by carefully scheduling the arrival of the clock edges at different registers.
– Multi-phase clocking is not a viable solution in deep submicron, because of signal integrity issues and the increasing difficulty of distributing several clocks across the chip.
– As ASICs have large clock skew, latches have substantial benefits for reducing the clock period.
– Level-sensitive latches reduce the impact of inaccurate wire load models and of process variation. Because of slack passing, the clock period is not limited by the delay of the slowest pipeline stage.
– The clock skew can be reduced by using better clock tree synthesis tools or by manual design.
– Latches and typical ASIC clocking techniques are still slower than custom techniques, leaving custom designs about 1.1x faster.
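
Slack passing can be illustrated with an idealized model. The sketch below is an assumption-laden simplification, not the method of chapter 7: it ignores setup time, latch delay, skew and hold constraints, and it treats `borrow_window` as the time a latch stays transparent after its opening edge.

```python
def flip_flop_period(stage_delays, overhead=0.0):
    """With edge-triggered registers the slowest stage sets the period."""
    return max(stage_delays) + overhead


def latch_pipeline_feasible(stage_delays, period, borrow_window):
    """Check whether a latch-based pipeline meets timing at a given period.

    A stage that finishes late may borrow up to borrow_window from the
    next stage; data that arrives early waits for the latch to open.
    The last stage must finish without borrowing.
    """
    departure = 0.0  # time after the opening edge at which data leaves a latch
    for delay in stage_delays:
        arrival = departure + delay - period  # relative to next opening edge
        if arrival > borrow_window:
            return False  # data arrives after the latch has closed
        departure = max(0.0, arrival)
    return departure == 0.0


delays = [12, 8, 11, 9]  # hypothetical stage delays, arbitrary units
print(flip_flop_period(delays))                                       # 12
print(latch_pipeline_feasible(delays, period=10, borrow_window=5))    # True
print(latch_pipeline_feasible(delays, period=9.5, borrow_window=4.75))  # False
```

In this example the flip-flop design is limited to a period of 12 by its slowest stage, while the latch-based design runs at 10 because the long stages borrow time from the short ones that follow them.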


Timing Overhead: Clock Tree Design and Registers (3)

In the book
– Chapter 3 gives a tutorial review of clock related timing issues.
– Chapter 7 describes a prototype tool that automatically converts a gate netlist with flip-flops to use latches, leading to 10-20% speed improvements. It also discusses the timing overhead in the Xtensa microprocessor.
– Chapter 8 considers issues associated with clock-tree synthesis and the use of carefully adjusted clock skew in clock-tree design for cycle stealing.
– Chapter 15 discusses a very high speed ASIC design, where reduction of the timing overhead was essential. Clock trees were manually routed and both latches and high speed pulsed flip-flops were used.


Logic Style (1)

Dynamic logic can be used to speed up critical paths within the circuitry by reducing gate delays.
– It is significantly faster than static CMOS logic and has smaller area, but is much more sensitive to noise and consumes more power.

What’s the problem?
– To be useful in an ASIC design methodology a logic style must be robust in a variety of circuit conditions, and supported by tools for static timing analysis and manufacturing testing.
– Dynamic logic requires careful design of the power and clock distribution.
– For these reasons dynamic logic libraries are not available, and dynamic logic is not supported for ASIC designs.


Logic Style (2)

What can be done?
– There has been some progress in dynamic logic circuit synthesis, but it is not yet commercially available, and it is not likely that the methodological obstacles will be overcome to enable dynamic logic synthesis in ASIC designs.
– However, the gap between static CMOS and dynamic logic can be reduced by using custom designed static logic with pulsed inputs. In this case the dynamic logic may only be 1.2x faster than highly optimized static logic.

In the book
– Chapter 4 quantifies the performance improvement from using dynamic logic and includes an example of a high speed 64 bit adder using static logic. The limitations on future use of domino logic are also examined and some alternative high speed logic styles are discussed.


Logic Design (1)

Logic design describes the topology or interconnectivity of the gates.
– It covers, for example, the choice of adder algorithm (ripple carry vs. carry look-ahead). In contrast, the choice to pipeline the multiplier is a microarchitectural decision.

What’s the problem?
– For random logic both ASIC and custom designers are likely to use logic synthesis, but ASIC designers are typically not as aware of logical design alternatives as custom designers.
– For example, Wallace tree multipliers have a triangular layout. If not carefully constrained, layout tools will try to fit the gates into a rectangular region, which is sub-optimal and leads to longer wire lengths.
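
To make the ripple carry vs. carry look-ahead contrast concrete, the sketch below models both at the bit level and counts the sequential steps of the carry network. A Kogge-Stone parallel-prefix adder is used here as a stand-in for the look-ahead family; the slides do not name a specific structure, and the step counts are levels of carry logic, not gate delays.

```python
def ripple_carry_add(a, b, width):
    """Bit-serial carry chain; returns (sum mod 2**width, carry stages)."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, width  # the carry passes through one stage per bit


def kogge_stone_add(a, b, width):
    """Parallel-prefix adder; returns (sum mod 2**width, prefix levels)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate
    half_sum = p[:]
    levels = 0
    distance = 1
    while distance < width:
        new_g, new_p = g[:], p[:]
        for i in range(distance, width):
            new_g[i] = g[i] | (p[i] & g[i - distance])
            new_p[i] = p[i] & p[i - distance]
        g, p = new_g, new_p
        distance *= 2
        levels += 1
    # After the prefix tree, g[i] is the carry out of bit position i.
    result = 0
    for i in range(width):
        carry_in = g[i - 1] if i > 0 else 0
        result |= (half_sum[i] ^ carry_in) << i
    return result, levels


total, stages = ripple_carry_add(5, 3, 64)
total_ks, levels = kogge_stone_add(5, 3, 64)
print(total, stages, total_ks, levels)  # 8 64 8 6
```

For a 64 bit adder the carry passes through 64 stages in the ripple design but only 6 prefix levels in the parallel-prefix design, at the cost of more gates and wiring.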


Logic Design (2)

What can be done?
– For datapath designs use the existing pre-designed libraries (e.g. DesignWare) to get the optimized implementation.
– ASIC logic design can come to parity with custom logic design.

In the book
– Chapter 4 considers logic design with an example of a 64 bit adder and looks at the relative performance impact of good logic design.


Cell Design and Wire Sizing (1)

In an ideal circuit, each gate is optimally crafted from transistors and each transistor and wire is individually sized to meet the drive requirements.

What’s the problem?
– One element of the performance degradation of ASIC designs is the poverty of standard cell libraries, including a limited number of discretely-sized cells.
– Many ASICs still do not use good standard cell libraries with varied drive strengths.
– A richer library also reduces circuit area.
– Custom designs can achieve a factor of 1.4x speed improvement over a poor ASIC design.


Cell Design and Wire Sizing (2)

What can be done?
– ASIC designs should use rich standard cell libraries with dual gate polarities and several drive strengths for each gate.
– Several tools are available for automating the creation of cells that are optimized for a design.
– Tools for optimizing the widths of individual wires are currently not commercially available.
– By using continuous cell sizing and wire sizing, custom designs can achieve speeds about 1.1x faster than the best ASIC designs.

In the book
– Chapter 4 gives a tutorial overview of issues in libraries.
– Chapter 9 looks at a commercial prototype tool for providing continuous cell sizing for an ASIC flow.
– Chapter 10 considers automatically finding macro-cells in logic.
– Chapter 12 introduces a cell-sizing tool for custom designs.
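
The cost of discretely-sized cells can be sketched with the textbook logical-effort delay model for an inverter chain. This model is brought in here for illustration and is not taken from the slides; the drive-strength libraries, the load and the fixed three-stage chain are all assumptions.

```python
from itertools import product

PARASITIC = 1.0  # assumed parasitic delay per inverter, in units of tau


def chain_delay(sizes, load):
    """Logical-effort delay of an inverter chain driving `load`."""
    caps = list(sizes) + [load]
    return sum(caps[i + 1] / caps[i] + PARASITIC for i in range(len(sizes)))


def continuous_sizing(stages, load):
    """Ideal sizing: every stage sees the same effort load**(1/stages)."""
    effort = load ** (1.0 / stages)
    return [effort ** i for i in range(stages)]


def best_discrete_sizing(stages, load, library):
    """Best chain when only library drive strengths are allowed.

    The first stage is fixed at size 1 (input capacitance constraint).
    """
    candidates = ([1.0] + list(c) for c in product(library, repeat=stages - 1))
    return min(candidates, key=lambda sizes: chain_delay(sizes, load))


LOAD = 100.0
POOR_LIBRARY = [1, 2, 4, 8]                          # hypothetical
RICH_LIBRARY = [1, 2, 3, 4, 6, 8, 12, 16, 24, 32]    # hypothetical

ideal = chain_delay(continuous_sizing(3, LOAD), LOAD)
poor = chain_delay(best_discrete_sizing(3, LOAD, POOR_LIBRARY), LOAD)
rich = chain_delay(best_discrete_sizing(3, LOAD, RICH_LIBRARY), LOAD)
print(round(ideal, 2), round(rich, 2), round(poor, 2))
```

Under these assumptions the rich library comes within a few percent of continuous sizing, while the poor library is noticeably slower because no available cell is large enough to drive the load efficiently.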


Layout: Floorplanning and Placement to Manage Wires (1)

Wire delays associated with global wires between physical modules are dominant in sub-micron technologies.
– The primary factor in the wire delay is wire length.
– Routing congestion and the position of cells in the layout affect the wire length.
– Noise and cross-coupling capacitance between wires must also be considered.

What’s the problem?
– Traditionally the load driven by a gate is estimated using “wire-load models” that estimate the capacitive load as a function of block size and fanout.
– This is a poor and inaccurate method when wires dominate the delay.
– Both over- and underestimating the wire loads lead to poor timing.
– A carefully partitioned design can increase the speed by a factor of 1.4x compared to an ASIC design that has large blocks of gates and uses inaccurate wire load models in synthesis.
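
Why a fanout-based wire-load model misjudges long wires can be sketched by comparing it with a length-aware Elmore delay estimate. Every electrical value and the lookup table below are illustrative assumptions, not data from the book or from any real library.

```python
# All values are assumed for illustration only.
R_DRIVER_OHM = 1000.0      # output resistance of the driving gate
R_WIRE_OHM_PER_UM = 0.1    # wire resistance per micrometre
C_WIRE_FF_PER_UM = 0.2     # wire capacitance per micrometre
C_PIN_FF = 5.0             # input capacitance of one receiving gate

# Hypothetical wire-load model: estimated wire capacitance (fF) by fanout.
WIRE_LOAD_MODEL_FF = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}


def wire_load_model_delay_ps(fanout):
    """Delay estimate that knows only the fanout, not the wire length."""
    c_wire = WIRE_LOAD_MODEL_FF[fanout]
    return R_DRIVER_OHM * (c_wire + fanout * C_PIN_FF) / 1000.0  # ohm*fF = fs


def elmore_delay_ps(length_um, fanout=1):
    """Elmore delay of driver + distributed RC wire + lumped pin load."""
    r_wire = R_WIRE_OHM_PER_UM * length_um
    c_wire = C_WIRE_FF_PER_UM * length_um
    c_load = fanout * C_PIN_FF
    delay_fs = R_DRIVER_OHM * (c_wire + c_load) + r_wire * (c_wire / 2.0 + c_load)
    return delay_fs / 1000.0


print(wire_load_model_delay_ps(1))   # same estimate for every fanout-1 net
print(elmore_delay_ps(100))          # short local wire
print(elmore_delay_ps(2000))         # long global wire
```

With these numbers the wire-load model overestimates the short net slightly and underestimates the long net by more than an order of magnitude, which is the kind of error that partitioning and floorplanning are meant to contain.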


Layout: Floorplanning and Placement to Manage Wires (2)

What can be done?
– Accurate wire-load models, good design partitioning, careful floorplanning and resynthesis should improve the speed.
– Custom ICs are typically manually floorplanned.
– Physical synthesis (e.g. Physical Compiler) is one tool to improve the placement.
– ASIC designs should be able to achieve parity with custom designs with respect to floorplanning and detailed placement.

In the book
– Chapter 4 discusses the impact of wire-load models, partitioning and layout tools.
– Chapter 6 describes floorplanning techniques for ASICs.
– Chapter 11 gives an example where manual layout improves the performance.


Process Variation and Improvement (1)

Traditionally, the semiconductor process is represented as being fully determined by a given technology generation, with identical results across implementations, fabrication plants and time.
– The semiconductor process is also described through a set of worst-case numbers that abstract the complexity of the actual manufacturing.
– ASICs fabricated on a typical process can be 60-70% faster than the worst case speeds quoted by ASIC library estimates.

In the same nominal process technology, the speed varies significantly for the same ASIC design synthesized to different libraries and different plant processes.

For custom ICs, additional improvements to the process or the design are possible.

The semiconductor process cannot be perfectly controlled, which leads to statistical variation of many process variables.
– Variation occurs batch-to-batch, wafer-to-wafer, die-to-die and intra-die.
– This variation decreases as the process matures.


Process Variation and Improvement (2)

What’s the problem?
– The ASIC design and manufacturing industry is organized around a “handoff” point. A golden toolset and libraries guarantee the product speed to the customer.
– This traditional approach to worst-case timing analysis leaves significant performance potential of semiconductor processes unharvested.
– Sticking to worst-case process numbers and lacking the ability to exploit process improvements leads to a 1.6x to 2.2x speed difference between ASICs and custom ICs.

What can be done?
– As a result of relying on pre-characterized cell libraries, ASICs are typically easy to migrate between technology generations. Thus synthesizable ASICs can easily be switched to use the best fabrication plants available for ASIC production. Custom ICs are more technology dependent.


Process Variation and Improvement (3)

What else can be done?
– New ASIC libraries should be available from the ASIC vendor when process improvements are made.
– At-speed testing, which would allow faster-than-worst-case ASICs to be harvested, is not usually available from the ASIC vendors (for commercial reasons).
– Due to process variation ASICs will lose a factor of 1.2x performance to custom designs.

In the book
– Chapter 5 gives a tutorial overview of process related issues.
– Chapter 13 describes a very practical way to manage the impact of process variation.
– Chapter 14 gives a more speculative approach to exploiting processing.


Summary and Conclusions

There is a significant (3x to 8x) performance difference between ASICs and custom ICs.
– The factors of floorplanning and circuit design, while significant, are relatively overstated in their importance.
– The two factors of equal or greater significance are pipelining and process variation.
– The use of dynamic-logic families is a third significant influence.

ASIC designers must become familiar with microarchitecture, physical design, clocking schemes, and sources of semiconductor process variation.
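
The residual factors quoted on the earlier slides for a best-practice ASIC can be combined into one number. The slides do not state how the factors combine; the sketch below assumes they are independent and multiply, and it treats the two areas where the slides say ASICs can reach parity as a factor of 1.0.

```python
import math

# Residual custom-over-ASIC speed factors as quoted on the slides above.
RESIDUAL_FACTORS = {
    "microarchitecture": 1.3,
    "timing overhead": 1.1,
    "logic style": 1.2,
    "logic design": 1.0,               # slides: ASIC can come to parity
    "cell design and wire sizing": 1.1,
    "layout": 1.0,                     # slides: ASIC can achieve parity
    "process variation": 1.2,
}


def combined_gap(factors):
    """Product of the per-area factors (assumes the areas are independent)."""
    return math.prod(factors.values())


print(round(combined_gap(RESIDUAL_FACTORS), 2))
```

Under that assumption the remaining gap for a carefully designed ASIC is a little over 2x, compared with the 3x to 8x seen in typical designs.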