Continuing Challenges in Static Timing Analysis
Tom Spyrou
TAU 2013
3/2013
Goal of this talk
- Take a higher-level view than the latest trends
- Remind ourselves of the trade-offs we have made as an industry to have a workable solution for STA
  - Signoff
  - Embedded in design synthesis and optimization
  - There is plenty of discussion on new effects; let's discuss core STA
- Explain the basis of industrial algorithms to the academic community
- Challenge ourselves to look at the issues again
- Technology trends
  - Design
  - Compute
Why Static Timing Analysis
- Dynamic simulation is impossible for even a small chip
  - Assume combinational logic only
  - 100 inputs implies 2^100 vectors are needed to verify timing, which is about 10^30 vectors
  - If a simulator could process 10^6 vectors per second, this works out to a simulation time of about 10^19 days, or more than 10^16 years
  - Talk about a verification bottleneck!
- Now add in state elements and the problem of making sure the critical path is actually in the vector set
- STA can analyze such a design in 1 minute
  - There are some issues, but they can be mitigated
- STA's quality of result is not dependent on the quality of the vector set
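The exhaustive-simulation estimate can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch; the input count and simulator throughput are the figures quoted on the slide, and the variable names are illustrative.

```python
# Back-of-envelope check of the exhaustive-simulation estimate.
# Assumptions taken from the slide: 100 inputs, 10^6 vectors per second.
NUM_INPUTS = 100
VECTORS_PER_SECOND = 10**6

vectors = 2**NUM_INPUTS                 # one vector per input combination
seconds = vectors / VECTORS_PER_SECOND
days = seconds / 86_400                 # 24 * 60 * 60 seconds per day
years = days / 365.25

print(f"vectors: {vectors:.2e}")
print(f"days:    {days:.2e}")
print(f"years:   {years:.2e}")
```

The vector count lands near 10^30 and the simulation time on the order of 10^19 days, which is tens of quadrillions of years.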
What is the trade-off / core issues?
These have been unchanged for a long time
- A different kind of setup
  - Result is dependent on the quality of constraints and exceptions
  - If all storage elements are clocked and I/Os are constrained, the analysis is generally safe
- Less accurate delay analysis
  - The exact path is not really known, as it is with event-driven simulation
  - When STA was first introduced this was less of an issue; path-based analysis (PBA) is now essential
- Introduction of false paths due to topological rather than functional analysis
  - Users have to manually specify these
- Multiple circuit modes take extra effort
  - Not just more vectors
- Loops and level-sensitive latches add complexity
Analysis
- Every circuit looks the same to STA, since it ignores the functions of the logic
- Topological analysis
  - Simplifies the problem, at the cost of possibly reporting false paths
What do recent trends mean?
- Design
  - Hyper-optimization means accuracy is critical
    - When a chip is designed in a bleeding-edge technology it will be pushed on all dimensions of power, performance and area
    - Simulation-based delay calculation
    - Path-based analysis
  - Design size means memory use is the #1 problem
    - The largest chips are approaching 1 TB of RAM needed for flat runs
    - Hierarchical / parallel solutions must prioritize memory use on compute nodes
    - Runtime also needs to be faster, but the first step is to run on machines with reasonable cost
    - A recent design uses 750+ GB of RAM for single mode/corner STA
- Compute
  - CPU is cheap, data movement is expensive
    - Whenever you hear that something is an expensive calculation, don't avoid it
  - Parallel computing must not only improve performance but also accuracy and features
    - Don't just make the same problem go faster or just divide the data
If you ask a designer what doesn't work well
- Hierarchical timing in the final verification loop
- SI calculations are very conservative
- SDCs are large and hard to verify
- Worst-case timing is done, and process variation is modeled very pessimistically
- Block-based analysis loses too much accuracy
- True delay reporting (looking at the combinational logic to prove a path true) is slow and can't run during optimization
- Libraries limit flexibility of analysis
STA Industry and Academia
- STA technology has been innovated inside industry much more than in academia
- The key approaches are not documented
- There is no open source reference to build from
- Industry protects the core concepts as trade secrets
- Academia rarely publishes on STA beyond single-clock designs or delay calculation
- We need a book on the core search algorithm
Example: Veritime from the 90s
- STA engine that required vectors for the clock
- Dynamic simulation of the clock
  - Period, multicycle paths, and clock-to-clock false paths automatically determined
- STA for the data portion
- Absorbed by Cadence and forgotten, since at the time SDCs were a lot easier to hand inspect
Requirements of an STA Engine
I would like to begin by documenting the basics that everyone in industry knows. There are no company-specific trade secrets here.
- Must run in linear memory and runtime with circuit size, number of clocks, number of exceptions, and number of storage elements
  - Touch each vertex only once, maybe twice to simplify pre-processing, not once per clock or exception
- Must support SDC timing constraints
  - Clocks, clock tree assumptions, multi-cycle paths, false paths, path delays, cases and modes
- Must be nearly SPICE-accurate in delays and support path-based analysis
- Must be incremental enough
  - Netlist changes / full retrace on one extreme
  - Query-based incremental with limited tracing on the other
The Basic Search
- The Graph
  - Startpoints are inputs to the circuit and clock inputs to storage elements
  - Endpoints are outputs of the circuit and data inputs of storage elements
- Propagate the Clocks
  - For each clock input, BFS to all clock data pins
  - Offset startpoint arrival times and endpoint required times with information from the clock propagation and cycle accounting
- Propagate the Data
  - Use a BFS from startpoints to endpoints
  - Use multiple timing totals at every pin to take into account multiple clocks and exceptions
  - Can optionally store back pointers to record K critical paths, but this time/memory is wasted on optimization programs and should be left to a reporting phase
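The data propagation step can be sketched as a levelized breadth-first pass over the timing graph. This is a minimal illustration, not a documented industrial implementation: it assumes an acyclic graph, a single timing total per pin, integer delays in picoseconds, and startpoint arrivals that were already offset by clock propagation. The function and pin names are hypothetical.

```python
from collections import defaultdict, deque


def propagate_arrivals(edges, start_arrivals):
    """Latest arrival time at every pin of an acyclic timing graph.

    edges: iterable of (from_pin, to_pin, delay) timing arcs.
    start_arrivals: {startpoint_pin: arrival}, assumed already offset
    by clock propagation.
    """
    fanout = defaultdict(list)
    pending = defaultdict(int)            # unprocessed fan-in arcs per pin
    pins = set(start_arrivals)
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        pending[dst] += 1
        pins.update((src, dst))

    arrival = {pin: start_arrivals.get(pin, float("-inf")) for pin in pins}
    ready = deque(pin for pin in pins if pending[pin] == 0)
    while ready:                          # each pin is dequeued exactly once
        pin = ready.popleft()
        for dst, delay in fanout[pin]:
            arrival[dst] = max(arrival[dst], arrival[pin] + delay)
            pending[dst] -= 1
            if pending[dst] == 0:         # all fan-in seen: arrival is final
                ready.append(dst)
    return arrival


# Hypothetical two-flop example.
edges = [
    ("ff1/CK", "ff1/Q", 100), ("ff1/Q", "u1/A", 20), ("in", "u1/B", 10),
    ("u1/A", "u1/Z", 50), ("u1/B", "u1/Z", 40), ("u1/Z", "ff2/D", 30),
]
arrival = propagate_arrivals(edges, {"ff1/CK": 0, "in": 50})
slack = 250 - arrival["ff2/D"]            # assumed required time of 250 ps
```

Because a pin is processed only after all of its fan-in arcs have been seen, every vertex and arc is visited once, which is the linear behavior the requirements slide asks for.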
Multiple Timing Totals with Partial Path
- A simplistic implementation gives each clock and each exception its own total
  - Simultaneously or via separate traces
- Memory and/or runtime increase quickly
  - Occurrence pins are the most common netlist object
  - There can be thousands of exceptions
- At timing endpoints, like totals can be combined and evaluated
- At timing endpoints, point-to-point exceptions can be evaluated
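As a rough illustration of per-clock timing totals, the sketch below keys each total by its launch clock, merges like totals by keeping the latest arrival, and evaluates them at an endpoint. The `required` table is an assumption standing in for the result of clock propagation and cycle accounting; all names and numbers are hypothetical.

```python
def merge_totals(pin_totals, fanin_totals, arc_delay):
    """Fold one fan-in arc into a pin's totals, keeping the latest
    arrival per launch clock (like totals are combined)."""
    for clock, arrival in fanin_totals.items():
        candidate = arrival + arc_delay
        if candidate > pin_totals.get(clock, float("-inf")):
            pin_totals[clock] = candidate
    return pin_totals


def endpoint_slacks(pin_totals, capture_clock, required):
    """Evaluate each total at an endpoint.

    required maps (launch_clock, capture_clock) to a required time.
    A missing pair means the clocks are not timed against each other,
    which corresponds to an N/A entry.
    """
    slacks = {}
    for launch_clock, arrival in pin_totals.items():
        limit = required.get((launch_clock, capture_clock))
        if limit is not None:
            slacks[launch_clock] = limit - arrival
    return slacks


# Hypothetical endpoint fed by two arcs, delays in picoseconds.
totals_at_d = {}
merge_totals(totals_at_d, {"CLK1": 400}, 300)
merge_totals(totals_at_d, {"CLK1": 350, "CLK2": 600}, 300)
slacks = endpoint_slacks(totals_at_d, "CLK1", {("CLK1", "CLK1"): 1000})
```

Here the pin stores two totals rather than one per path, and the CLK2 total is simply not evaluated at a CLK1 endpoint because no required time exists for that clock pair.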
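Multiple Timing Totals
[Figure: launching and capturing registers (D/Q) clocked by CLK1 and CLK2 around a block of combinational logic. Pins are annotated with one timing total per clock (CLK1, CLK2), each shown as 0 or N/A.]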
Multiple Timing Totals with path completion data
- A BFS has no information about paths
- However, timing exceptions are specified in terms of from, through, and to paths with a boolean expression of pins
  - Mcp -from a -through {b c} -to d
  - From a, through b or c, and then to d
- Each total can carry a small state machine recording which exception points it has seen
- At timing endpoints, like totals with like exception point data can be combined or, if false, not combined
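The per-total state machine can be sketched as a counter of how many exception stages a partial path has matched so far. This is an assumed simplification (one exception, stages matched strictly in order, integer delays); the class and function names are hypothetical and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathException:
    name: str
    stages: tuple  # ordered pin sets: from, each through group, to


def advance(state, pin, exc):
    """state counts the exception stages already matched on this partial path."""
    if state < len(exc.stages) and pin in exc.stages[state]:
        return state + 1
    return state


def push(totals, dst_pin, delay, exc):
    """Carry every total across one arc, advancing its exception state."""
    out = {}
    for state, arrival in totals.items():
        new_state = advance(state, dst_pin, exc)
        out[new_state] = max(out.get(new_state, float("-inf")), arrival + delay)
    return out


def merge(a, b):
    """Combine like totals (same state); unlike totals stay separate."""
    merged = dict(a)
    for state, arrival in b.items():
        merged[state] = max(merged.get(state, float("-inf")), arrival)
    return merged


# Mcp -from a -through {b c} -to d, with paths a->b->d and a->x->d.
mcp = PathException("mcp", (frozenset("a"), frozenset("bc"), frozenset("d")))
at_a = {advance(0, "a", mcp): 0}
at_b = push(at_a, "b", 30, mcp)
at_x = push(at_a, "x", 50, mcp)
at_d = merge(push(at_b, "d", 20, mcp), push(at_x, "d", 20, mcp))
complete = {s: t for s, t in at_d.items() if s == len(mcp.stages)}
```

At pin d two totals survive: the one that saw all three stages qualifies for the multicycle exception, while the one that went through x is timed with the default relationship. The BFS never enumerates paths; it only carries the small state with each total.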
Through exceptions
[Figure: the CLK1/CLK2 register and combinational logic example with through points B and C and a pin X added. Each total carries per-clock (CLK1, CLK2) and per-through-point (B, C) entries, shown as 0, N/A, or val.]
Framework can be used for Clock Pessimism
[Figure: clock network with delay segments d1, d2 and d3. Timing totals are tagged with the clock segments they traversed: arrival 1 with (d1, d2) and arrival 2 with (d1, d2, d3).]
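One hedged reading of the segment tags (d1, d2, d3) is that each total remembers which clock segments it passed through, so that pessimism on the segments shared by the launch and capture clock paths can be credited back. The sketch below shows that idea in its common textbook form; it is an assumption about what the slide intends, and the early/late delay tables are made-up numbers.

```python
def common_prefix(launch_path, capture_path):
    """Clock segments shared by the launch and capture clock paths."""
    shared = []
    for a, b in zip(launch_path, capture_path):
        if a != b:
            break
        shared.append(a)
    return shared


def pessimism_credit(launch_path, capture_path, early, late):
    """Sum of (late - early) delay over the shared clock segments.

    A shared segment cannot be both late for launch and early for
    capture at the same time, so this difference can be given back.
    """
    return sum(late[seg] - early[seg]
               for seg in common_prefix(launch_path, capture_path))


# Hypothetical early/late segment delays in picoseconds.
early = {"d1": 90, "d2": 95, "d3": 80}
late = {"d1": 100, "d2": 105, "d3": 90}
credit = pessimism_credit(("d1", "d2"), ("d1", "d2", "d3"), early, late)
```

Only d1 and d2 are common to both tags, so only their early/late spread contributes to the credit.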
Delay Calculation, Multiple Timing Totals
- Worst-case slew merging is pessimistic but allows delay calculation to be a pre-process step
- If delay calculation is done in the BFS, critical slew merging can be done
- It is also possible for each timing total to carry its own slew to improve accuracy
- Loops can be auto-detected and dynamically broken, avoiding accidental critical path breaks
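The difference between the slew-merging options can be seen on a single merge point. The sketch below is illustrative only: `stage_delay` is a made-up linear delay model rather than a library lookup, and the arrival/slew pairs are hypothetical values in picoseconds.

```python
def stage_delay(input_slew, intrinsic=50):
    """Hypothetical downstream stage: delay grows with input slew."""
    return intrinsic + input_slew // 2


def merge_worst_case(events):
    """Latest arrival paired with the slowest slew, even when the two
    come from different fan-in paths (pessimistic)."""
    return max(a for a, _ in events), max(s for _, s in events)


def merge_critical(events):
    """Keep the slew that belongs to the latest-arriving event."""
    return max(events)  # tuples compare by arrival first


# Two (arrival, slew) events meeting at one pin.
events = [(100, 10), (60, 40)]

wc_arrival, wc_slew = merge_worst_case(events)
crit_arrival, crit_slew = merge_critical(events)

next_worst_case = wc_arrival + stage_delay(wc_slew)
next_critical = crit_arrival + stage_delay(crit_slew)
# Carrying a slew per total amounts to evaluating each event on its own.
next_per_total = max(a + stage_delay(s) for a, s in events)
```

In this example worst-case merging pairs the 100 ps arrival with the 40 ps slew, a combination that no real path produces, so its downstream arrival is later than what either event yields by itself.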
Incremental Timing
- Netlist edits, full retime
- Netlist edits, fanout cone retime
- Netlist edits, query-based retime
- The choice of how incremental to go depends on the optimization approach
  - More global cost functions require less incrementality
  - More locally greedy approaches require more
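For the fanout cone option, the set of pins whose arrival times may need to be recomputed after an edit can be found with a forward BFS from the changed pins. This is a minimal sketch with hypothetical pin names; a real engine would also have to revisit required times in the fan-in direction.

```python
from collections import defaultdict, deque


def fanout_cone(edges, changed_pins):
    """All pins reachable from the edited pins, including the pins themselves."""
    fanout = defaultdict(list)
    for src, dst in edges:
        fanout[src].append(dst)

    cone = set(changed_pins)
    queue = deque(changed_pins)
    while queue:
        pin = queue.popleft()
        for dst in fanout[pin]:
            if dst not in cone:
                cone.add(dst)
                queue.append(dst)
    return cone


# Hypothetical netlist: an edit at b invalidates b, d and e only.
edges = [("a", "b"), ("b", "d"), ("c", "d"), ("d", "e"), ("f", "g")]
dirty = fanout_cone(edges, ["b"])
```

A query-based scheme would go one step further and defer the recomputation of these pins until a timing value inside the cone is actually requested.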
STA needs innovation
- Increased sharing with academia
- Increased research on the problems that are still problems
- Redirect solutions in light of the design and compute trends
- There is a lot of interesting work to do!
Some ideas
- New constraint language that is more functional
- Try to propagate the function with the delays
  - Some combination with cycle-based simulation
  - Constraint language enhancements
- Library-less delay models
- New data model which is stage based
  - Focus on data locality
- Hierarchical timing model which is truly context independent within acceptable limitations
  - Constraint improvements to help constrain blocks more accurately