Benefits of OLA Integration
into
Nano-Technology SoC Design Environments
2002 First Annual OLA Developer’s Conference
February 11-12, 2002
San Jose, California
Timothy J. Ehrler
Senior Principal Methodology Engineer
SoC Methodology Development, Design Technology Group
Philips Semiconductors
8372 S. River Parkway, Tempe AZ 85284
Abstract
As technologies progress to the sub-100nm level,
increased chip densities are allowing greater
functionality to be combined onto a single die.
Increasingly complex designs are evolving from what
had previously been sets of ASIC chips into a highly
integrated system on a chip (SoC). This added
complexity is reflected not only in that of the design
itself, but also in the demands placed upon the EDA
tools and methodologies necessary to implement such
designs.
Critical to the SoC design cycle is the convergence to
sufficiently accurate timing and power. Most EDA
methodologies rely on tool-specific, proprietary
characterization data views, or that of a “de-facto”
standard format. Calculation algorithms differ, as do
critical signal integrity (SI) analysis capabilities, with
designers encountering inconsistent, divergent
results, often among different tools from the same
vendor. The necessary exchange of large volumes of
timing information among tools, with the associated
storage and export/import time costs, further impacts
design cycle times. Multiple passes through design
processes magnify these impacts.
The integration of Open Library Architecture (OLA)
libraries within nano-technology design
environments can positively impact SoC design cycle
times. Consistent calculation of desired information
across a standard application programming interface
(API) ensures analysis convergence among tools,
eliminates data exchange processing and storage
requirements, and significantly reduces iterations
through design processes steps.
1. Technology Advancements
Semiconductor technology has been advancing at
least as rapidly as the rate predicted by Moore’s Law.
As transistor sizes have decreased, so too have
associated cell sizes, bringing increased device
operating frequencies and decreased cell delays. Cell
timing is becoming more susceptible to IR drop, and
more dependent upon output loading and input slew
rates, than in previous technologies. At the same time,
timing has become increasingly affected by
interconnect-related issues such as cross-coupling,
wire inductance, and signal noise.
As technology progresses to, and even exceeds, the
sub-100nm, or “nano-technology”, level, the
capability exists to implement a complete functional
system on a single chip. Whereas previous
technologies had necessitated the implementation of
a total system “solution” to be distributed across a
number of advanced ASIC chips, current
technologies now allow, and indeed encourage, the
complete implementation within a single “system-on-
chip” (SoC).
2. Design Flow Complexity
In order to realize the implementation of such
expansive designs, however, a new paradigm has
emerged which focuses on integrating previously
developed and validated complex blocks of logic
and/or intellectual property (IP), cores, and
memories. The high levels of integration associated
with this paradigm are dramatically increasing the
interconnect-to-cell delay ratio, requiring more
accurate timing calculation methodologies that
address the emerging deep sub-micron (DSM)
interconnect issues.
3. Technology & Design Information
In order to address these technology and design
issues, many more tools are being injected into
traditional design flows, most of which analyze,
generate, or depend upon, concise timing and/or
power information to arrive at optimal design
solutions. Worse still, much of this information is
exchanged among tools by formatting and exporting
to mass storage from one tool, followed by importing
from storage, parsing, and interpreting that data
within another tool.
Although the format or content of traditional
representations of the characterized information
required by a particular tool may be well defined, the
interpretation of that data, calculation algorithms
involved, and accuracy of such calculations may
differ significantly among tools. The resulting
inconsistent, and oftentimes correspondingly
inaccurate, timing information substantially
lengthens design cycles for flows that rely on
consistent and accurate timing to accomplish
design objectives [6].
4. Traditional Design Flow
In order to illustrate the major issues facing timing
closure driven design flows, we’ll first review a
typical design flow using traditional library formats.
This flow, restricted to only a relevant subset for this
discussion, is illustrated in Figure 1. The simplistic
assumption herein is that the user’s design flow may
encompass a variety of tools from multiple tool
vendors, including the foundry-provided delay
calculator required for sign-off. This also implies,
perhaps in the extreme, that each tool, or type
thereof, requires its own library, the format of which
may be industry standard, “de-facto” standard, or
proprietary, and may not be common to other tools
within the flow.
Of particular note within the timing sections of the
design flow, shown within the shaded areas, is the
inclusion of a foundry or semiconductor vendor
supplied delay calculation tool. This tool generates a
timing back-annotation SDF file using (perhaps)
proprietary timing calculation algorithms specific to
the supported technology. In addition to providing
delay and constraint timing, it may also provide a
slewrate, or ramp times, report as well. Scripts or
other tools may process this report, or it may be
directly imported by the static timing analysis and/or
synthesis and optimization tools, using such
information as constraints for further analysis. This
becomes much more critical to timing closure within
later physical design phases since design
performance becomes increasingly impacted by slight
changes to the design itself, where slewrates may
become more consequential than delay times.
Although functional simulation and formal
verification process steps are included within the
illustrated flow, they are not relevant to the initial
timing closure discussions, but their presence within
the flow will be touched on when discussing an OLA
based design flow.
4.1 Pre-Route Timing Closure
Preliminary timing closure is usually performed after
the initial RTL-to-gates synthesis process in order to
arrive at a sufficiently practical implementation of the
design solution within given performance
specifications. This phase may also require closure
for gross power consumption, which may or may not
be arrived at using additional analysis tools.
Interconnect timing is estimated using the technology
library's wireload tables, which can be detrimental to
the closure cycle since such models are statistical by
nature and cannot reflect the varying interconnect
characteristics among IP, cores, and random logic.

[Figure 1. Traditional Design Flow — synthesis/optimization with scan insertion, floor planning with wireload extraction and custom wireloads, place & route with clock tree and pad ring, and parasitics extraction (SPEF), all around a central design database; each stage runs a delay calculation step (technology wireloads, custom wireloads, then parasitics) producing SDF and slew reports for static timing analysis, while functional simulation and formal verification each import their own library views.]
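The statistical nature of wireload estimation can be sketched as a simple fanout-based table lookup. The table entries, unit parasitics, and extrapolation rule below are illustrative assumptions, not data from any real technology library:

```python
# Hedged sketch: fanout-based wireload estimation (illustrative values only).
# A wireload table maps net fanout to a statistically derived wire length,
# from which assumed per-unit capacitance and resistance give parasitics.

WIRELOAD_TABLE = {  # fanout -> estimated wire length (um); assumed values
    1: 10.0, 2: 25.0, 3: 45.0, 4: 70.0, 5: 100.0,
}
CAP_PER_UM = 0.0002   # pF per um, assumed
RES_PER_UM = 0.03     # ohm per um, assumed

def estimate_parasitics(fanout: int) -> tuple:
    """Return (capacitance in pF, resistance in ohm) for a net of given fanout."""
    # Fanouts beyond the table are linearly extrapolated from the largest
    # entry -- one reason statistical models misestimate large SoC nets.
    length = WIRELOAD_TABLE.get(fanout, WIRELOAD_TABLE[5] * fanout / 5)
    return length * CAP_PER_UM, length * RES_PER_UM

cap, res = estimate_parasitics(3)
```

Note that every net of fanout 3 receives the same estimate here, regardless of whether it connects random logic or spans an IP block, which is precisely the inaccuracy attributed above to wireload-based pre-route timing.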
Although the iterations through this process of timing
calculation, static timing analysis, and logic
optimization may not be as numerous as when
performed in later physical design phases, an
especially high performance design may require a
significant number of iterations when implemented
with low-speed/low-power, i.e. low-performance,
technology libraries. The greater the disparity
between the design performance objectives and the
performance of the implementation technology, the
more iterations must occur in order to achieve
initial closure. As shown, however, the cost of each
iteration cycle is the generation and back-annotation
of SDF and slewrate information files, along with the
associated processing, I/O, and storage resource
costs.
If there are any discrepancies between the timing
view from which the SDF file has been generated and
that of the consuming tool, considerable efforts are
required to modify the SDF to conform to those
views demanded by the latter. Given the significant
size and content of this timing information file for
SoC designs, conversion tool limits may well be
exceeded by the complexity of the task.
4.2 Floor Plan Timing Closure
Secondary timing closure may be performed after
initial floor planning but prior to final placement and
routing of the design. At this point in the flow,
custom wireload models may be derived from the
floor plan in order to make a more meaningful
estimate of interconnect timing. Iterations through
this phase can assist in reaching gross placement
timing, but can be deceptive: the derived custom
wireloads, although targeted at this particular
implementation, are still statistical, and cannot
accurately account for the varying types of
interconnect among the blocks and gates.
In addition to the costly overhead of SDF processing,
there are also the costs, though not nearly as severe,
of processing the custom wireloads. Design changes
resulting from the timing analysis warrant
corresponding changes to the design database. This,
in turn, requires the extraction and generation of a
netlist file for those tools not having direct access to
the database, with the associated time and storage
costs.
4.3 Post-Route Timing Closure
The most critical phase of design implementation is
the final timing closure after placement, routing,
clock tree synthesis, and I/O pad ring processing
steps have been completed. At this point in the design
cycle, the design has been completely implemented at
the physical level, and all information required to
achieve power and timing closure is available to the
respective tools.
Of particular relevance to this discussion is the
timing closure iteration cycle. Included within this
process is the major overhead of parasitics extraction,
with the associated I/O, storage, and processing costs,
all of which can be tremendous. At this stage,
extracting the parasitics and generating the SPEF file
can take tens of hours of processing time and multiple
Gbytes of storage space. Conversely, importing that
information can take even longer since the contents
must be parsed and processed in a manner dictated by
the consuming tool, and may require a
correspondingly large memory requirement to do so.
In addition, SDF generation can take many hours and
consume many hundreds of Mbytes of storage, with the
same impact of importing and processing that
information by its consuming tool.
5. Timing Closure Impediments
Because of the methodologies employed within
traditional design flows, and the deficiencies
attributable to the representation, organization,
exchange, and processing of characterization and
design information, the efforts involved in achieving
timing closure with large SoC designs can be
immense, requiring significant resource commitments
in terms of compute facilities, mass storage,
personnel, and design time. The major issues
contributing to ineffective timing closure include
timing calculation methods, interconnect analysis,
view consistency, and information exchange. The
limitations and restrictions caused by these issues
result in additional iterations within stages of the
design cycle, oscillating around design performance
targets as the designer attempts to converge on
sufficiently accurate timing.
5.1 Timing Calculation Methods
Each tool within a design flow will usually contain its
own timing engine, based upon the supporting library
views containing pertinent characterization data. The
algorithms involved are sufficiently different that the
timing obtained from one tool may be inconsistent
with that of another, and may be calculated with
varying levels of accuracy among
them. Methods and calculations regarding the
derating and/or scaling of this timing will differ, as
will the capability to perform instance-specific versus
global PVT point processing to account for IR drop
and thermal effects.
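As a minimal illustration of how tool-specific derating alone produces divergent results, suppose two tools scale the same characterized nominal delay with slightly different voltage and temperature coefficients; the coefficients and models below are invented for illustration, not taken from any real tool:

```python
# Hedged sketch: two tools derate the same characterized delay differently.
# All coefficients are hypothetical; real tools document their own models.

NOMINAL_DELAY_NS = 1.00  # delay characterized at nominal PVT, assumed

def derate_tool_a(delay, v_drop=0.05, temp_c=85):
    # Tool A: one voltage sensitivity, one temperature coefficient.
    return delay * (1 + 1.2 * v_drop) * (1 + 0.001 * (temp_c - 25))

def derate_tool_b(delay, v_drop=0.05, temp_c=85):
    # Tool B: steeper voltage sensitivity, different temperature model.
    return delay * (1 + 1.5 * v_drop) * (1 + 0.0008 * (temp_c - 25))

a = derate_tool_a(NOMINAL_DELAY_NS)
b = derate_tool_b(NOMINAL_DELAY_NS)
```

Even with identical characterization data, the two results differ; across thousands of instance-specific PVT points such divergence compounds into the oscillation around timing targets described above.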
5.2 Interconnect Analysis
In addition to differing timing calculation methods,
each tool may have its own interconnect analysis
algorithms as well. Different methods of network
reduction may be employed, loads may be calculated
as lumped or effective, and network driving
waveforms and subsequent propagation throughout
may or may not be implemented or supported, and
most probably differ among tools.
Signal integrity issues, such as cross-coupling effects
and noise-propagation, may or may not be
implemented, or may be implemented sufficiently
differently as to appear conflicting among tools.
5.3 Timing View Consistency
Tools and library views are inherently coupled,
resulting in inconsistent, and oftentimes conflicting,
timing representation among the many library views
consumed within the design flow, depending on the
capabilities and purposes of the tools involved.
Timing may be conditional within one library,
unconditional in another, and omitted entirely in
another. Complementary constraints may be
described differently, such as a SETUPHOLD
window in one and separate SETUP and HOLD
timing in another. Interpretation and support for
timing constructs may differ, such as REMOVAL
being treated as HOLD or ignored altogether.
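The SETUPHOLD discrepancy can be made concrete with a small normalization sketch; the dictionary record layout here is hypothetical, not any actual library format:

```python
# Hedged sketch: normalizing a combined SETUPHOLD window into the separate
# SETUP and HOLD records another library view expects (hypothetical layout).

def normalize(constraint: dict) -> list:
    """Split a SETUPHOLD record into SETUP + HOLD; pass others through.

    A flow performing the reverse merge must also reconcile constructs
    such as REMOVAL, which some tools treat as HOLD and others drop.
    """
    if constraint["kind"] == "SETUPHOLD":
        # Carry over everything except the combined window fields.
        base = {k: v for k, v in constraint.items()
                if k not in ("kind", "setup", "hold")}
        return [dict(base, kind="SETUP", value=constraint["setup"]),
                dict(base, kind="HOLD", value=constraint["hold"])]
    return [constraint]

combined = {"kind": "SETUPHOLD", "pin": "d", "clock": "ck",
            "setup": 0.25, "hold": 0.10}
separate = normalize(combined)
```

Every such translation step is another place where two tools can end up with subtly different views of the same constraint.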
5.4 Timing Information Exchange
With SDF files being used as the most common
method of exchanging timing information among
tools, insufficient, inconsistent, and inaccurate
information is presented to the consuming tools. One
of the most consequential deficiencies of the format
is the absence of available slewrate information,
which becomes more critical to analysis and design
tools for DSM SoC designs. Lacking this
information, a tool may derive inaccurate ramp times,
default to an incorrect value, or simply assume a 0.0
value, all of which will severely affect tools that rely
on skew information for critical paths or structures,
such as clock trees.
There can be significant differences in the timing
view defined within the library from which the SDF
is generated and that of the consuming tool, resulting
in unsuccessful back-annotation or, even worse,
default or erroneous timing. An SDF generation tool
may merge interconnect timing with path timing
rather than being separately specified, preventing
consuming tools from properly performing their
function. Calculated negative timing may or may not
be generated in the SDF, and consuming tools may or
may not accept it. Support of multiple versions of the
SDF format, 2.1 and 3.0, may require both flavors be
generated to satisfy the consuming tool requirements,
which can convey inconsistent information to the
various tools.
Some tools may support specific constructs, such as
REMOVAL timing, but others may translate them to
another, such as HOLD constraints, while still others
may ignore them completely. Although an SDF can
represent a triplet of timing as well as a single timing
point, SDF generators may well only support one,
while consuming tools may only support the other. If
a triplet representation is required, but the generation
is at a single point, multiple generations must be
performed to obtain the corresponding points and the
results merged into a single SDF file, with the
associated I/O, processing, and storage overhead
costs it imposes.
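The multi-run merge described above can be sketched as follows; the per-run dictionaries stand in for parsed single-point SDF files (SDF parsing itself, and the path names used, are illustrative assumptions):

```python
# Hedged sketch: merging three single-point delay calculation runs (min,
# typ, max corners) into the min:typ:max triplets a consuming tool expects.
# Real flows parse full SDF files; dicts of path -> delay stand in here.

def merge_triplets(min_run: dict, typ_run: dict, max_run: dict) -> dict:
    """Combine per-corner delays into 'min:typ:max' triplet strings."""
    merged = {}
    for path in min_run:
        merged[path] = (f"{min_run[path]:.3f}:"
                        f"{typ_run[path]:.3f}:"
                        f"{max_run[path]:.3f}")
    return merged

triplets = merge_triplets(
    {"u1/a->u1/z": 0.082},   # min-corner run
    {"u1/a->u1/z": 0.105},   # typ-corner run
    {"u1/a->u1/z": 0.141})   # max-corner run
```

Each of the three generation runs incurs the full formatting, I/O, and storage cost described above; with an OLA library the early/late values are simply requested through the API instead.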
Aside from the above issues, significant overhead
costs of exchanging timing information among tools
in this manner are involved. The generation of the
information requires formatting, I/O processing, and
storage resources, while the consumer requires I/O,
parsing, and processing resources. With large SoC
designs, this can well take tens of hours and hundreds
of Mbytes of storage with each tool.
6. Open Library Architecture (OLA)
The creation of the Delay Calculation Language
(DCL) based Delay Calculation System (DCS) by
IBM introduced the concept of embedding timing
calculation algorithms within a technology library.
The application would “converse” with the library
through a standard set of application programming
interfaces (API) to request particular timing
information rather than accessing and interpreting
raw timing information from a library and then
calculating the desired result. Later enhancements to
include power calculation capabilities resulted in the
IEEE 1481-1999 standard for Delay and Power
Calculation System (DPCS) [1].
Subsequent extensions to the system to include
graph-based functional descriptions, vector based
timing and power arithmetic models, and cell and pin
properties and attributes from Accellera’s Advanced
Library Format (ALF) standard [2][3] further
expanded its capabilities. This resulting SI2 Open
Library Architecture (OLA) standard was further
improved upon to include more concise APIs for
interconnect parasitics, with later additional APIs
developed to address signal integrity issues such as
cross-coupling, noise propagation, parasitic analysis,
and physical characteristics for floor planning,
placement, routing, etc. [4][5].
6.1 OLA Concept
The purpose of OLA is to provide a single method by
which information required by an application is
consistently and accurately calculated. It replaces the
traditional method of parsing and interpreting
characterization information from varying view
formats and calculating the desired results using
application-specific algorithms with a compiled
library from which the desired information can be
programmatically requested, calculated, then returned
as shown in Figure 2.
The concept of OLA, and of DPCS in general, is that
an application dynamically links the OLA library at
runtime, and “converses” with the library through a
standard-defined set of application programming
interfaces (APIs) to obtain such information as is needed by that
application. The application initiates the request for
information, and the library responds to the requests,
returning the requested information. It does so by
using the information provided through the API,
using internally cached information, using library
characterization information, and/or requesting
additional information from the application, then
calculating the requested information and returning it
to the application. At any time during this
“conversation”, additional information may be
requested until all required information has been
collected, the results calculated, and then returned to
the requestor.
A very simplistic example of this interaction can be
shown for an application, such as a static timing
analysis tool, requiring timing within a flip-flop cell
‘dff’ from the rising clock ‘ck’ to falling output ‘q’.
APP: request timing from OLA
OLA: get passed timing path
OLA: request PVT from APP
APP: return PVT
OLA: get passed ‘ck’ slew
OLA: request ‘q’ load from APP
APP: return ‘q’ load
OLA: calculate timing
(early/late delay/slew)
OLA: return timing
APP: use requested timing information
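In code, this request/response conversation amounts to the application exposing callbacks that the library invokes while it computes. The sketch below mirrors the transcript above; the class names, method names, and the toy delay model are hypothetical, not the actual IEEE 1481 C-level API:

```python
# Hedged sketch of the OLA "conversation" as callbacks. All names and the
# linear delay model are invented for illustration; the real DPCS interface
# is the C-level API defined by IEEE 1481.

class OlaLibrary:
    def request_timing(self, path, clock_slew, app):
        # Library pulls what it needs from the application via callbacks...
        pvt = app.get_pvt()               # OLA: request PVT from APP
        load = app.get_load(path[-1])     # OLA: request 'q' load from APP
        # ...then calculates delay and slew together (toy model).
        delay = (0.10 + 0.5 * clock_slew + 2.0 * load) * pvt["derate"]
        slew = 0.04 + 1.5 * load
        return {"delay": delay, "slew": slew}   # OLA: return timing

class TimingAnalyzer:          # the application, e.g. a static timing tool
    def get_pvt(self):
        return {"derate": 1.1}            # APP: return PVT
    def get_load(self, pin):
        return 0.02                       # APP: return 'q' load (pF)

lib = OlaLibrary()
result = lib.request_timing(("dff/ck", "dff/q"),
                            clock_slew=0.08, app=TimingAnalyzer())
```

Because delay and slew come back together from one embedded engine, the separate slew report and SDF exchange of the traditional flow become unnecessary.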
6.2 OLA Benefits
By embedding the algorithmic calculations within the
library itself, consistent results are always obtained
for use by the requesting application. Slewrate
information is calculated in conjunction with delay
timing, providing those tools additional information
not otherwise directly available through SDF timing
information exchange. Since network reduction,
parasitic analysis, cross-talk, and noise propagation
methodologies are embedded as well, interconnect
timing calculations between cells are as consistent as
those within cells. Providing this consistent and
additional information, as well as eliminating
annotation failures due to timing view inconsistencies
and conflicting/ambiguous annotation information
interpretation, significantly reduces timing closure
iteration cycles.
The generation of multiple SDF files at different PVT
points, and overhead costs of merging them into an
acceptable form for consuming tools, is eliminated
since instance-specific timing calculations are
supported. Incremental timing is easily performed on
demand, again eliminating the requirement for SDF
generation and annotation to account for incremental
design changes.
Because timing information and algorithms are
compiled into the library instead of being made
available in a readable format, intellectual property
content can be hidden from the user. This protects the
vendor’s IP, allows for the implementation of internal
timing within the IP, and also prevents local
“hacking” of library information by users.
In addition to providing a consistent calculation
methodology, functional expressions, such as those
specified for conditional timing and functional
behavior, are available in a graph-based form. This
removes from each application the requirement to
parse and interpret expressions, again eliminating
inconsistent interpretation of library information
among tools. It also provides consistent functional
information such that synthesis, formal verification,
and simulation tools can use the library as well,
eliminating even more views from the design flow.

[Figure 2. OLA Concept — multiple tools dynamically link a single OLA library through the Delay and Power Calculation System (DPCS) interface.]

[Figure 3. Timing Example — a flip-flop cell 'dff' with data input 'd', clock 'ck', and output 'q'.]
6.3 Design Flow Usage
The most productive usage of OLA libraries within
the design flow involves those stages relating to
timing closure. By replacing the separate typical
static timing analysis sub-flows involving the
foundry-supplied delay calculator (Figure 4) with one
interfacing with the OLA library (Figure 5), a more
concise, consistent, and accurate timing analysis can
be performed. This eliminates the need for a stand-
alone delay calculator, since the timing algorithms it
contained are now included within the library itself,
and allows slew as well as timing information to be
provided to the analysis tool.
Notably missing from this sub-flow is the SDF file,
the usage of which for timing back-annotation is no
longer required. The reduction in the number of
required library view formats to that of OLA only,
eliminating perhaps inconsistent and inaccurate
timing views from the analysis sub-flow, promotes
faster timing convergence as well.
The combination of compatible timing views,
consistent timing calculations, and the elimination of
incomplete timing information exchange through the
intermediate SDF file, greatly reduces the number of
iterations required to converge upon a timing
solution.
7. OLA Based Design Flow
An equivalent design flow which integrates OLA
libraries therein, replacing the stand-alone foundry-
supplied delay calculator and SDF back-annotation
file, is shown in Figure 6. The relative simplicity of
this flow with respect to the previous typical one is
immediately apparent by the simplified timing
closure stages, as well as the notable reduction in the
number of required library views for the various tools
included within the flow.
7.1 Timing Closure Improvements
The consistent and accurate timing calculation
algorithms embedded within the OLA library allow
faster convergence to a reliable timing solution. This
capability is available for pre-route, floor plan, and
post-route stages of timing closure, all using
consistent timing calculation methods and algorithms
embedded therein.
Iterations within the timing closure stages of the
design flow are significantly reduced primarily due to
this combination of accurate and consistent timing
calculation. The single timing engine within the
library itself provides consistent information to the
application, complete with slew times, for both early
and late timing. Tool-specific algorithms are avoided,
as are the commonplace incompatibilities among
differing timing views usually present within the
associated libraries. Back-annotation of [in]complete
timing information using SDF files is also avoided,
since such information need not be exchanged among
tools, but rather is calculated and provided as needed
by the application.

[Figure 4. Typical Timing Analysis — a delay calculation tool and static timing analysis, each with its own library view, exchanging SDF and slew report files.]

[Figure 5. OLA Timing Analysis — static timing analysis interfacing directly with the OLA library through the DPCS API.]

[Figure 6. OLA Based Design Flow — synthesis/optimization with scan insertion, floor planning with wireload extraction, place & route with clock tree and pad ring, and parasitics extraction (SPEF), around a central design database; every static timing analysis stage, along with functional simulation and formal verification, is served by a single OLA library through the Delay and Power Calculation System (OLA).]
Instance-specific PVT-related timing is provided for
consideration of IR drop and thermal effects, as is the
capability to provide incremental timing as opposed
to requiring generation and exchange of complete
block or design timing information.
Interconnect timing calculations, with the associated
parasitics network reduction and waveform
propagation algorithms, are part of the library timing
engine, and provide consistent results to all
applications. Signal integrity issues such as cross-talk
can be implemented therein, as can be the inclusion
of inductance for RLC rather than RC based timing.
7.2 Extended Integration
In addition to the elimination of the delay calculation
tool and SDF file, note the further elimination of
many of the tool-specific library views. Since OLA
libraries provide information and associated
algorithms in a standard accessible method, and
provide for more than just timing and power analysis
tools, OLA-compliant tools other than those intended
strictly for static timing analysis can be integrated
into the design flow as well, further reducing the
need for the various formats of tool-specific views
previously required. Such tools include synthesis,
scan insertion, optimization, functional simulation,
formal verification, and many others.
An extremely aggressive integration of OLA-
compliant tools and libraries, utilized wherever
possible within a complete industry design flow [7],
can dramatically reduce the number of required
library views, as shown in Figure 7, yielding
corresponding improvements within the design flow.
8. Conclusion
The capability of providing consistent and accurate
timing information at all levels of the design process,
from pre-route through post-route, can dramatically
reduce, if not eliminate, iterations within timing
closure stages, converging on a design solution which
meets performance objectives much faster and more
easily than with traditional approaches.
Interconnect analysis, with due consideration of
signal integrity issues, can be calculated in a
consistently accurate manner, allowing faster timing
convergence once the physical implementation of a
design is realized. In addition, instance-specific PVT-
based timing provides for increased accuracy where
IR drop and thermal effects may manifest
themselves, and the capability to provide incremental
timing on demand eliminates the need for further
iteration cycles.
Above all, the elimination of SDF file based timing
information exchange requirements among tools,
with the incurred compatibility, resource, and time
costs, greatly improves design development
productivity.
In conclusion, the integration of OLA libraries within
a design flow, in conjunction with appropriate OLA-
compliant tools, can significantly improve design
efforts by reducing timing closure time through the
use of more accurate and consistent timing
calculation methods, which directly contributes to
reduced design cycle time.
References
[1] Design Automation Standards Committee of the
IEEE Computer Society, “IEEE Standard for
Integrated Circuit (IC) Delay and Power
Calculation System”, IEEE 1481-1999, 26 June
1999.
[2] Accellera, “Advanced Library Format (ALF) for
ASIC Technology, Cells, and Blocks”, revision
2.0, 14 December 2000.
[3] IEEE P1603, “A standard for an Advanced
Library Format (ALF) describing Integrated
Circuit (IC) technology, cells, and blocks”,
revision draft 2, 12 November 2001.
[4] Silicon Integration Initiative, “Specification for
the Open Library Architecture (OLA)”, revision
1.7.04, 3 January 2002.
[5] J. Abraham, S. Churiwala, “Flexible Model for
Delay and Power”, Silicon Integration Initiative,
1998.
[6] T. Tessier, C. Buhlman, “Timing Closure of a
870Kgate + 3 Mbit Ram, 0.2u-12mm Die in a
1312 Pin Package IC”, SNUG 2001.
[7] T. Ehrler, “Multiple Design Flows: Reducing
Support Requirements with OLA”, Custom
Integrated Circuits Conference 2001, ALF/OLA
Panel Discussion, 6-9 May 2001.
Design Process                | Tools | Standard/Proprietary Formats | Total Formats | OLA Replaceable/Deletable | Total Formats | Format Reduction
RTL Development/Analysis      |   5   | 3/0                          | 3             | 2/0                       | 2             | 33%
Design Synthesis              |   7   | 4/6                          | 10            | 4/1                       | 6             | 40%
Logic/Timing Verification     |  17   | 5/11                         | 16            | 6/5                       | 6             | 63%
Partitioning & Floor Planning |  11   | 3/9                          | 12            | 5/0                       | 8             | 33%
Layout & Chip Finishing       |  21   | 4/15                         | 19            | 6/2                       | 12            | 37%

Figure 7. Library View Requirement Reduction