Benefits of OLA Integration
into
Nano-Technology SoC Design Environments
2002 First Annual OLA Developer’s Conference
February 11-12, 2002
San Jose, California
Timothy J. Ehrler
Senior Principal Methodology Engineer
SoC Methodology Development, Design Technology Group
Philips Semiconductors
8372 S. River Parkway, Tempe AZ 85284
Abstract
As technologies progress to the sub-100nm level,
increased chip densities are allowing greater
functionality to be combined onto a single die.
Increasingly complex designs are evolving from what
had previously been sets of ASIC chips into a highly
integrated system on a chip (SoC). This added
complexity is reflected not only in that of the design
itself, but also in the demands placed upon the EDA
tools and methodologies necessary to implement such
designs.
Critical to the SoC design cycle is the convergence to
sufficiently accurate timing and power. Most EDA
methodologies rely on tool-specific, proprietary
characterization data views, or that of a “de-facto”
standard format. Calculation algorithms differ, as do
critical signal integrity (SI) analysis capabilities, with
designers encountering inconsistent, divergent
results, often among different tools from the same
vendor. The necessary exchange of large volumes of
timing information among tools, with the associated
storage and export/import time costs, further impacts
design cycle times. Multiple passes through design
processes magnify these impacts.
The integration of Open Library Architecture (OLA)
libraries within nano-technology design
environments can positively impact SoC design cycle
times. Consistent calculation of desired information
across a standard application programming interface
(API) ensures analysis convergence among tools,
eliminates data exchange processing and storage
requirements, and significantly reduces iterations
through design processes steps.
1. Technology Advancements
Semiconductor technology has been advancing at
least as rapidly as the rate predicted by Moore’s Law.
As transistor sizes have decreased, so too have
associated cell sizes, bringing increased device
operating frequencies and decreased cell delays. Cell
timing is becoming more susceptible to IR drop, and
more dependent upon output loading and input slew
rates, than in previous technologies. At the same time,
timing has become increasingly affected by
interconnect-related issues such as cross-coupling,
wire inductance, and signal noise.
As technology progresses to, and even exceeds, the
sub-100nm, or “nano-technology”, level, the
capability exists to implement a complete functional
system on a single chip. Whereas previous
technologies had necessitated the implementation of
a total system “solution” to be distributed across a
number of advanced ASIC chips, current
technologies now allow, and indeed encourage, the
complete implementation within a single “system-on-
chip” (SoC).
2. Design Flow Complexity
In order to realize the implementation of such
expansive designs, however, a new paradigm has
emerged which focuses on integrating previously
developed and validated complex blocks of logic
and/or intellectual property (IP), cores, and
memories. The high levels of integration associated
with this paradigm are dramatically increasing the
interconnect-to-cell delay ratio, requiring more
accurate timing calculation methodologies that
address the emerging deep sub-micron (DSM)
interconnect issues.
3. Technology & Design Information
In order to address these technology and design
issues, many more tools are being injected into
traditional design flows, most of which analyze,
generate, or depend upon, concise timing and/or
power information to arrive at optimal design
solutions. Worse still, much of this information is
exchanged among tools by formatting and exporting
to mass storage from one tool, followed by importing
from storage, parsing, and interpreting that data
within another tool.
Although the format or content of traditional
representations of the characterized information
required by a particular tool may be well defined, the
interpretation of that data, calculation algorithms
involved, and accuracy of such calculations may
differ significantly among tools. The resulting
inconsistent, and oftentimes correspondingly
inaccurate, timing information substantially
lengthens design cycles for flows that rely on
consistent and accurate timing to accomplish
design objectives [6].
4. Traditional Design Flow
In order to illustrate the major issues facing timing
closure driven design flows, we’ll first review a
typical design flow using traditional library formats.
This flow, restricted to only a relevant subset for this
discussion, is illustrated in Figure 1. The simplistic
assumption herein is that the user’s design flow may
encompass a variety of tools from multiple tool
vendors, including the foundry-provided delay
calculator required for sign-off. This also implies,
perhaps in the extreme, that each tool, or type
thereof, requires its own library, the format of which
may be industry standard, “de-facto” standard, or
proprietary, and may not be common to other tools
within the flow.
Of particular note within the timing sections of the
design flow, shown within the shaded areas, is the
inclusion of a foundry or semiconductor vendor
supplied delay calculation tool. This tool generates a
timing back-annotation SDF file using (perhaps)
proprietary timing calculation algorithms specific to
the supported technology. In addition to providing
delay and constraint timing, it may also provide a
slewrate, or ramp times, report as well. Scripts or
other tools may process this report, or it may be
directly imported by the static timing analysis and/or
synthesis and optimization tools, using such
information as constraints for further analysis. This
becomes much more critical to timing closure within
later physical design phases since design
performance becomes increasingly impacted by slight
changes to the design itself, where slewrates may
become more consequential than delay times.
Although functional simulation and formal
verification process steps are included within the
illustrated flow, they are not relevant to the initial
timing closure discussions, but their presence within
the flow will be touched on when discussing an OLA
based design flow.
4.1 Pre-Route Timing Closure
Preliminary timing closure is usually performed after
the initial RTL-to-gates synthesis process in order to
arrive at a sufficiently practical implementation of the
design solution within given performance
specifications. This phase may also require closure
for gross power consumption, which may or may not
be arrived at using additional analysis tools.
Interconnect timing is estimated using the technology
library's wireload tables, which can be detrimental to
the closure cycle since such models are statistical by
nature and cannot reflect the varying interconnect
characteristics among IP, cores, and random logic.

[Figure 1. Traditional Design Flow — synthesis/optimization with scan insertion, floor planning with wireload extraction and custom wireloads, place & route with clock tree and pad ring, and parasitics extraction (SPEF), all around a central design database; each stage runs a delay calculation step (technology wireloads, custom wireloads, then parasitics) producing SDF and slew reports for static timing analysis, while functional simulation and formal verification each import their own library views.]
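The statistical nature of wireload estimation can be sketched as a simple fanout-based table lookup. The table entries, unit parasitics, and extrapolation rule below are illustrative assumptions, not data from any real technology library:

```python
# Hedged sketch: fanout-based wireload estimation (illustrative values only).
# A wireload table maps net fanout to a statistically derived wire length,
# from which assumed per-unit capacitance and resistance give parasitics.

WIRELOAD_TABLE = {  # fanout -> estimated wire length (um); assumed values
    1: 10.0, 2: 25.0, 3: 45.0, 4: 70.0, 5: 100.0,
}
CAP_PER_UM = 0.0002   # pF per um, assumed
RES_PER_UM = 0.03     # ohm per um, assumed

def estimate_parasitics(fanout: int) -> tuple:
    """Return (capacitance in pF, resistance in ohm) for a net of given fanout."""
    # Fanouts beyond the table are linearly extrapolated from the largest
    # entry -- one reason statistical models misestimate large SoC nets.
    length = WIRELOAD_TABLE.get(fanout, WIRELOAD_TABLE[5] * fanout / 5)
    return length * CAP_PER_UM, length * RES_PER_UM

cap, res = estimate_parasitics(3)
```

Note that every net of fanout 3 receives the same estimate here, regardless of whether it connects random logic or spans an IP block, which is precisely the inaccuracy attributed above to wireload-based pre-route timing.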
Although the iterations through this process of timing
calculation, static timing analysis, and logic
optimization may not be as numerous as when
performed in later physical design phases, an
especially high performance design may require a
significant number of iterations when implemented
with low-speed/low-power, i.e. low-performance,
technology libraries. The greater the disparity
between the design performance objectives and the
performance of the implementation technology, the
more iterations must occur in order to achieve
initial closure. As shown, however, the cost of each
iteration cycle is the generation and back-annotation
of SDF and slewrate information files, along with the
associated processing, I/O, and storage resource
costs.
If there are any discrepancies between the timing
view from which the SDF file has been generated and
that of the consuming tool, considerable efforts are
required to modify the SDF to conform to those
views demanded by the latter. Given the significant
size and content of this timing information file for
SoC designs, conversion tool limits may well be
exceeded by the complexity of the task.
4.2 Floor Plan Timing Closure
Secondary timing closure may be performed after
initial floor planning but prior to final placement and
routing of the design. At this point in the flow,
custom wireload models may be derived from the
floor plan in order to make a more meaningful
estimate of interconnect timing. Iterations through
this phase can assist in reaching gross placement
timing, but can be deceptive: the derived custom
wireloads, although targeted at this particular
implementation, are still statistical, and cannot
accurately account for the varying types of
interconnect among the blocks and gates.
In addition to the costly overhead of SDF processing,
there are also the costs, though not nearly as severe,
of processing the custom wireloads. Design changes
resulting from the timing analysis warrant
corresponding changes to the design database. This,
in turn, requires the extraction and generation of a
netlist file for those tools not having direct access to
the database, with the associated time and storage
costs.
4.3 Post-Route Timing Closure
The most critical phase of design implementation is
the final timing closure after placement, routing,
clock tree synthesis, and I/O pad ring processing
steps have been completed. At this point in the design
cycle, the design has been completely implemented at
the physical level, and all information required to
achieve power and timing closure is available to the
respective tools.
Of particular relevance to this discussion is the
timing closure iteration cycle. Included within this
process is the major overhead of parasitics extraction,
with the associated I/O, storage, and processing costs,
all of which can be tremendous. At this stage,
extracting the parasitics and generating the SPEF file
can take tens of hours of processing time and multiple
Gbytes of storage space. Conversely, importing that
information can take even longer since the contents
must be parsed and processed in a manner dictated by
the consuming tool, and may require a
correspondingly large memory requirement to do so.
In addition, SDF generation can take many hours and
consume many hundreds of Mbytes of storage, with the
same impact of importing and processing that
information by its consuming tool.
5. Timing Closure Impediments
Because of the methodologies employed within
traditional design flows, and the deficiencies
attributable to the representation, organization,
exchange, and processing of characterization and
design information, the efforts involved in achieving
timing closure with large SoC designs can be
immense, requiring significant resource commitments
in terms of compute facilities, mass storage,
personnel, and design time. The major issues
contributing to ineffective timing closure include
timing calculation methods, interconnect analysis,
view consistency, and information exchange. The
limitations and restrictions caused by these issues
result in additional iterations within stages of the
design cycle, oscillating around design performance
targets as the designer attempts to converge on
sufficiently accurate timing.
5.1 Timing Calculation Methods
Each tool within a design flow will usually contain its
own timing engine, based upon the supporting library
views containing pertinent characterization data. The
algorithms involved are sufficiently different that the
timing obtained from one tool may be inconsistent
with that of another, and may be calculated with
varying levels of accuracy among
them. Methods and calculations regarding the
derating and/or scaling of this timing will differ, as
will the capability to perform instance-specific versus
global PVT point processing to account for IR drop
and thermal effects.
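As a minimal illustration of how tool-specific derating alone produces divergent results, suppose two tools scale the same characterized nominal delay with slightly different voltage and temperature coefficients; the coefficients and models below are invented for illustration, not taken from any real tool:

```python
# Hedged sketch: two tools derate the same characterized delay differently.
# All coefficients are hypothetical; real tools document their own models.

NOMINAL_DELAY_NS = 1.00  # delay characterized at nominal PVT, assumed

def derate_tool_a(delay, v_drop=0.05, temp_c=85):
    # Tool A: one voltage sensitivity, one temperature coefficient.
    return delay * (1 + 1.2 * v_drop) * (1 + 0.001 * (temp_c - 25))

def derate_tool_b(delay, v_drop=0.05, temp_c=85):
    # Tool B: steeper voltage sensitivity, different temperature model.
    return delay * (1 + 1.5 * v_drop) * (1 + 0.0008 * (temp_c - 25))

a = derate_tool_a(NOMINAL_DELAY_NS)
b = derate_tool_b(NOMINAL_DELAY_NS)
```

Even with identical characterization data, the two results differ; across thousands of instance-specific PVT points such divergence compounds into the oscillation around timing targets described above.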
5.2 Interconnect Analysis
In addition to differing timing calculation methods,
each tool may have its own interconnect analysis
algorithms as well. Different methods of network
reduction may be employed, loads may be calculated
as lumped or effective, and network driving
waveforms and subsequent propagation throughout
may or may not be implemented or supported, and
most probably differ among tools.
Signal integrity issues, such as cross-coupling effects
and noise-propagation, may or may not be
implemented, or may be implemented sufficiently
differently as to appear conflicting among tools.
5.3 Timing View Consistency
Tools and library views are inherently coupled,
resulting in inconsistent, and oftentimes conflicting,
timing representation among the many library views
consumed within the design flow, depending on the
capabilities and purposes of the tools involved.
Timing may be conditional within one library,
unconditional in another, and omitted entirely in
another. Complementary constraints may be
described differently, such as a SETUPHOLD
window in one and separate SETUP and HOLD
timing in another. Interpretation and support for
timing constructs may differ, such as REMOVAL
being treated as HOLD or ignored altogether.
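The SETUPHOLD discrepancy can be made concrete with a small normalization sketch; the dictionary record layout here is hypothetical, not any actual library format:

```python
# Hedged sketch: normalizing a combined SETUPHOLD window into the separate
# SETUP and HOLD records another library view expects (hypothetical layout).

def normalize(constraint: dict) -> list:
    """Split a SETUPHOLD record into SETUP + HOLD; pass others through.

    A flow performing the reverse merge must also reconcile constructs
    such as REMOVAL, which some tools treat as HOLD and others drop.
    """
    if constraint["kind"] == "SETUPHOLD":
        # Carry over everything except the combined window fields.
        base = {k: v for k, v in constraint.items()
                if k not in ("kind", "setup", "hold")}
        return [dict(base, kind="SETUP", value=constraint["setup"]),
                dict(base, kind="HOLD", value=constraint["hold"])]
    return [constraint]

combined = {"kind": "SETUPHOLD", "pin": "d", "clock": "ck",
            "setup": 0.25, "hold": 0.10}
separate = normalize(combined)
```

Every such translation step is another place where two tools can end up with subtly different views of the same constraint.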
5.4 Timing Information Exchange
With SDF files being used as the most common
method of exchanging timing information among
tools, insufficient, inconsistent, and inaccurate
information is presented to the consuming tools. One
of the most consequential deficiencies of the format
is the absence of available slewrate information,
which becomes more critical to analysis and design
tools for DSM SoC designs. Lacking this
information, a tool may derive inaccurate ramp times,
default to an incorrect value, or simply assume a 0.0
value, all of which will severely affect tools that rely
on skew information for critical paths or structures,
such as clock trees.
There can be significant differences in the timing
view defined within the library from which the SDF
is generated and that of the consuming tool, resulting
in unsuccessful back-annotation or, even worse,
default or erroneous timing. An SDF generation tool
may merge interconnect timing with path timing
rather than being separately specified, preventing
consuming tools from properly performing their
function. Calculated negative timing may or may not
be generated in the SDF, and consuming tools may or
may not accept it. Support of multiple versions of the
SDF format, 2.1 and 3.0, may require both flavors be
generated to satisfy the consuming tool requirements,
which can convey inconsistent information to the
various tools.
Some tools may support specific constructs, such as
REMOVAL timing, but others may translate them to
another, such as HOLD constraints, while still others
may ignore them completely. Although an SDF can
represent a triplet of timing as well as a single timing
point, SDF generators may well only support one,
while consuming tools may only support the other. If
a triplet representation is required, but the generation
is at a single point, multiple generations must be
performed to obtain the corresponding points and the
results merged into a single SDF file, with the
associated I/O, processing, and storage overhead
costs it imposes.
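The multi-run merge described above can be sketched as follows; the per-run dictionaries stand in for parsed single-point SDF files (SDF parsing itself, and the path names used, are illustrative assumptions):

```python
# Hedged sketch: merging three single-point delay calculation runs (min,
# typ, max corners) into the min:typ:max triplets a consuming tool expects.
# Real flows parse full SDF files; dicts of path -> delay stand in here.

def merge_triplets(min_run: dict, typ_run: dict, max_run: dict) -> dict:
    """Combine per-corner delays into 'min:typ:max' triplet strings."""
    merged = {}
    for path in min_run:
        merged[path] = (f"{min_run[path]:.3f}:"
                        f"{typ_run[path]:.3f}:"
                        f"{max_run[path]:.3f}")
    return merged

triplets = merge_triplets(
    {"u1/a->u1/z": 0.082},   # min-corner run
    {"u1/a->u1/z": 0.105},   # typ-corner run
    {"u1/a->u1/z": 0.141})   # max-corner run
```

Each of the three generation runs incurs the full formatting, I/O, and storage cost described above; with an OLA library the early/late values are simply requested through the API instead.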
Aside from the above issues, significant overhead
costs of exchanging timing information among tools
in this manner are involved. The generation of the
information requires formatting, I/O processing, and
storage resources, while the consumer requires I/O,
parsing, and processing resources. With large SoC
designs, this can well take tens of hours and hundreds
of Mbytes of storage with each tool.
6. Open Library Architecture (OLA)
The creation of the Delay Calculation Language
(DCL) based Delay Calculation System (DCS) by
IBM introduced the concept of embedding timing
calculation algorithms within a technology library.
The application would “converse” with the library
through a standard set of application programming
interfaces (API) to request particular timing
information rather than accessing and interpreting
raw timing information from a library and then
calculating the desired result. Later enhancements to
include power calculation capabilities resulted in the
IEEE 1481-1999 standard for Delay and Power
Calculation System (DPCS) [1].
Subsequent extensions to the system to include
graph-based functional descriptions, vector based
timing and power arithmetic models, and cell and pin
properties and attributes from Accellera’s Advanced
Library Format (ALF) standard [2][3] further
expanded its capabilities. This resulting SI2 Open
Library Architecture (OLA) standard was further
improved upon to include more concise APIs for
interconnect parasitics, with later additional APIs
developed to address signal integrity issues such as
cross-coupling, noise propagation, parasitic analysis,
and physical characteristics for floor planning,
placement, routing, etc. [4][5].
6.1 OLA Concept
The purpose of OLA is to provide a single method by
which information required by an application is
consistently and accurately calculated. It replaces the
traditional method of parsing and interpreting
characterization information from varying view
formats and calculating the desired results using
application-specific algorithms with a compiled
library from which the desired information can be
programmatically requested, calculated, then returned
as shown in Figure 2.
The concept of OLA, and of DPCS in general, is that
an application dynamically links the OLA library at
runtime, and “converses” with the library through a
standard-defined set of application programming
interfaces (APIs) to obtain such information as is needed by that
application. The application initiates the request for
information, and the library responds to the requests,
returning the requested information. It does so by
using the information provided through the API,
using internally cached information, using library
characterization information, and/or requesting
additional information from the application, then
calculating the requested information and returning it
to the application. At any time during this
“conversation”, additional information may be
requested until all required information has been
collected, the results calculated, and then returned to
the requestor.
A very simplistic example of this interaction can be
shown for an application, such as a static timing
analysis tool, requiring timing within a flip-flop cell
‘dff’ from the rising clock ‘ck’ to falling output ‘q’.
APP: request timing from OLA
OLA: get passed timing path
OLA: request PVT from APP
APP: return PVT
OLA: get passed ‘ck’ slew
OLA: request ‘q’ load from APP
APP: return ‘q’ load
OLA: calculate timing
(early/late delay/slew)
OLA: return timing
APP: use requested timing information
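In code, this request/response conversation amounts to the application exposing callbacks that the library invokes while it computes. The sketch below mirrors the transcript above; the class names, method names, and the toy delay model are hypothetical, not the actual IEEE 1481 C-level API:

```python
# Hedged sketch of the OLA "conversation" as callbacks. All names and the
# linear delay model are invented for illustration; the real DPCS interface
# is the C-level API defined by IEEE 1481.

class OlaLibrary:
    def request_timing(self, path, clock_slew, app):
        # Library pulls what it needs from the application via callbacks...
        pvt = app.get_pvt()               # OLA: request PVT from APP
        load = app.get_load(path[-1])     # OLA: request 'q' load from APP
        # ...then calculates delay and slew together (toy model).
        delay = (0.10 + 0.5 * clock_slew + 2.0 * load) * pvt["derate"]
        slew = 0.04 + 1.5 * load
        return {"delay": delay, "slew": slew}   # OLA: return timing

class TimingAnalyzer:          # the application, e.g. a static timing tool
    def get_pvt(self):
        return {"derate": 1.1}            # APP: return PVT
    def get_load(self, pin):
        return 0.02                       # APP: return 'q' load (pF)

lib = OlaLibrary()
result = lib.request_timing(("dff/ck", "dff/q"),
                            clock_slew=0.08, app=TimingAnalyzer())
```

Because delay and slew come back together from one embedded engine, the separate slew report and SDF exchange of the traditional flow become unnecessary.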
6.2 OLA Benefits
By embedding the algorithmic calculations within the
library itself, consistent results are always obtained
for use by the requesting application. Slewrate
information is calculated in conjunction with delay
timing, providing those tools additional information
not otherwise directly available through SDF timing
information exchange. Since network reduction,
parasitic analysis, cross-talk, and noise propagation
methodologies are embedded as well, interconnect
timing calculations between cells are as consistent as
those within cells. Providing this consistent and
additional information, as well as eliminating
annotation failures due to timing view inconsistencies
and conflicting/ambiguous annotation information
interpretation, significantly reduces timing closure
iteration cycles.
The generation of multiple SDF files at different PVT
points, and overhead costs of merging them into an
acceptable form for consuming tools, is eliminated
since instance-specific timing calculations are
supported. Incremental timing is easily performed on
demand, again eliminating the requirement for SDF
generation and annotation to account for incremental
design changes.
Because timing information and algorithms are
compiled into the library instead of being made
available in a readable format, intellectual property
content can be hidden from the user. This protects the
vendor’s IP, allows for the implementation of internal
timing within the IP, and also prevents local
“hacking” of library information by users.
In addition to providing a consistent calculation
methodology, functional expressions, such as those
specified for conditional timing and functional
behavior, are available in a graph-based form. This
removes from each application the requirement to
parse and interpret expressions, again eliminating
inconsistent interpretation of library information
among tools. It also provides consistent functional
information such that synthesis, formal verification,
and simulation tools can use the library as well,
eliminating even more views from the design flow.

[Figure 2. OLA Concept — multiple tools dynamically link a single OLA library through the Delay and Power Calculation System (DPCS) interface.]

[Figure 3. Timing Example — a flip-flop cell 'dff' with data input 'd', clock 'ck', and output 'q'.]
6.3 Design Flow Usage
The most productive usage of OLA libraries within
the design flow involves those stages relating to
timing closure. By replacing the separate typical
static timing analysis sub-flows involving the
foundry-supplied delay calculator (Figure 4) with one
interfacing with the OLA library (Figure 5), a more
concise, consistent, and accurate timing analysis can
be performed. This eliminates the need for a stand-
alone delay calculator, since the timing algorithms it
contained are now included within the library itself,
and allows slew as well as timing information to be
provided to the analysis tool.
Notably missing from this sub-flow is the SDF file,
the usage of which for timing back-annotation is no
longer required. The reduction in the number of
required library view formats to that of OLA only,
eliminating perhaps inconsistent and inaccurate
timing views from the analysis sub-flow, promotes
faster timing convergence as well.
The combination of compatible timing views,
consistent timing calculations, and the elimination of
incomplete timing information exchange through the
intermediate SDF file, greatly reduces the number of
iterations required to converge upon a timing
solution.
7. OLA Based Design Flow
An equivalent design flow which integrates OLA
libraries therein, replacing the stand-alone foundry-
supplied delay calculator and SDF back-annotation
file, is shown in Figure 6. The relative simplicity of
this flow with respect to the previous typical one is
immediately apparent by the simplified timing
closure stages, as well as the notable reduction in the
number of required library views for the various tools
included within the flow.
7.1 Timing Closure Improvements
The consistent and accurate timing calculation
algorithms embedded within the OLA library allow
faster convergence to a reliable timing solution. This
capability is available for pre-route, floor plan, and
post-route stages of timing closure, all using
consistent timing calculation methods and algorithms
embedded therein.
Iterations within the timing closure stages of the
design flow are significantly reduced primarily due to
this combination of accurate and consistent timing
calculation. The single timing engine within the
library itself provides consistent information to the
application, complete with slew times, for both early
and late timing. Tool-specific algorithms are avoided,
as are the commonplace incompatibilities among
differing timing views usually present within the
associated libraries. Back-annotation of [in]complete
timing information using SDF files is also avoided,
since such information need not be exchanged among
tools, but rather is calculated and provided as needed
by the application.

[Figure 4. Typical Timing Analysis — a delay calculation tool and static timing analysis, each with its own library view, exchanging SDF and slew report files.]

[Figure 5. OLA Timing Analysis — static timing analysis interfacing directly with the OLA library through the DPCS API.]

[Figure 6. OLA Based Design Flow — synthesis/optimization with scan insertion, floor planning with wireload extraction, place & route with clock tree and pad ring, and parasitics extraction (SPEF), around a central design database; every static timing analysis stage, along with functional simulation and formal verification, is served by a single OLA library through the Delay and Power Calculation System (OLA).]
Instance-specific PVT-related timing is provided for
consideration of IR drop and thermal effects, as is the
capability to provide incremental timing as opposed
to requiring generation and exchange of complete
block or design timing information.
Interconnect timing calculations, with the associated
parasitics network reduction and waveform
propagation algorithms, are part of the library timing
engine, and provide consistent results to all
applications. Signal integrity issues such as cross-talk
can be implemented therein, as can be the inclusion
of inductance for RLC rather than RC based timing.
7.2 Extended Integration
In addition to the elimination of the delay calculation
tool and SDF file, note the further elimination of
many of the tool-specific library views. Since OLA
libraries provide information and associated
algorithms in a standard accessible method, and
provide for more than just timing and power analysis
tools, OLA-compliant tools other than those intended
strictly for static timing analysis can be integrated
into the design flow as well, further reducing the
need for the various formats of tool-specific views
previously required. Such tools include synthesis,
scan insertion, optimization, functional simulation,
formal verification, and many others.
An extremely aggressive integration of OLA-
compliant tools and libraries, utilized wherever
possible within a complete industry design flow [7],
can dramatically reduce the number of required
library views, as shown in Figure 7, yielding
corresponding improvements within the design flow.
8. Conclusion
The capability of providing consistent and accurate
timing information at all levels of the design process,
from pre-route through post-route, can dramatically
reduce, if not eliminate, iterations within timing
closure stages, converging on a design solution which
meets performance objectives much faster and more
easily than with traditional approaches.
Interconnect analysis, with due consideration of
signal integrity issues, can be calculated in a
consistently accurate manner, allowing faster timing
convergence once the physical implementation of a
design is realized. In addition, instance-specific PVT-
based timing provides for increased accuracy where
IR drop and thermal effects may manifest
themselves, and the capability to provide incremental
timing on demand eliminates the need for further
iteration cycles.
Above all, the elimination of SDF file based timing
information exchange requirements among tools,
with the incurred compatibility, resource, and time
costs, greatly improves design development
productivity.
In conclusion, the integration of OLA libraries within
a design flow, in conjunction with appropriate OLA-
compliant tools, can significantly improve design
efforts by reducing timing closure time through the
use of more accurate and consistent timing
calculation methods, which directly contributes to
reduced design cycle time.
References
[1] Design Automation Standards Committee of the
IEEE Computer Society, “IEEE Standard for
Integrated Circuit (IC) Delay and Power
Calculation System”, IEEE 1481-1999, 26 June
1999.
[2] Accellera, “Advanced Library Format (ALF) for
ASIC Technology, Cells, and Blocks”, revision
2.0, 14 December 2000.
[3] IEEE P1603, “A standard for an Advanced
Library Format (ALF) describing Integrated
Circuit (IC) technology, cells, and blocks”,
revision draft 2, 12 November 2001.
[4] Silicon Integration Initiative, “Specification for
the Open Library Architecture (OLA)”, revision
1.7.04, 3 January 2002.
[5] J. Abraham, S. Churiwala, “Flexible Model for
Delay and Power”, Silicon Integration Initiative,
1998.
[6] T. Tessier, C. Buhlman, “Timing Closure of a
870Kgate + 3 Mbit Ram, 0.2u-12mm Die in a
1312 Pin Package IC”, SNUG 2001.
[7] T. Ehrler, “Multiple Design Flows: Reducing
Support Requirements with OLA”, Custom
Integrated Circuits Conference 2001, ALF/OLA
Panel Discussion, 6-9 May 2001.
Design Process                | Tools | Standard/Proprietary Formats | Total Formats | OLA Replaceable/Deletable | Total Formats | Format Reduction
RTL Development/Analysis      |   5   | 3/0                          | 3             | 2/0                       | 2             | 33%
Design Synthesis              |   7   | 4/6                          | 10            | 4/1                       | 6             | 40%
Logic/Timing Verification     |  17   | 5/11                         | 16            | 6/5                       | 6             | 63%
Partitioning & Floor Planning |  11   | 3/9                          | 12            | 5/0                       | 8             | 33%
Layout & Chip Finishing       |  21   | 4/15                         | 19            | 6/2                       | 12            | 37%

Figure 7. Library View Requirement Reduction