CSE241 L4 System.1 Kahng & Cichy, UCSD ©2003
CSE241A VLSI Digital Circuits
Winter 2003
Lecture 04: Static Timing Analysis
Logistics
All emails and announcements are archived on the class web page
Remember: tomorrow (Friday, January 17) noon-1:20pm (EBUI 3327) prepares you for the synthesis lecture on Tuesday (see syllabus on web for expected content)
HW questions 4-6 due today
Lab #1 report due Monday; turn-in instructions and script announced last night
This class: introduction to static timing analysis
Goals:
How does STA work?
What are use models for STA? (What kinds of things are done with STA results?)
Goal of Static Timing Verification
Determine fastest permissible clock speed (e.g. 100 MHz) by determining delay (including set-up and hold time) of longest path from register to register (e.g. 10 ns)
Largely eliminates need for gate-level simulation to verify delay of the circuit
[Figure: registers clocked by clk, separated by blocks of combinational logic]
Courtesy K. Keutzer et al. UCB
Cycle Time - Critical Path Delay
Cycle time (T) cannot be smaller than longest path delay (Tmax)
Longest (critical) path delay is a function of:
Total gate, wire delays
Logic levels
[Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; the waveform shows data settling within one cycle time of Tclock1]
$T \ge T_{max}$
Cycle Time - Setup Time
For FFs to correctly capture data, data must be stable for:
Setup time (Tsetup) before clock arrives
[Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; the waveform marks the setup time before the capturing clock edge]
$T \ge T_{max} + T_{setup}$
Cycle Time – Clock Skew
Unbalanced delay in the clock network causes clock skew
Cycle time is also a function of clock skew (Tskew)
[Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; the waveforms for Tclock1 and Tclock2 show the clock skew at Q2]
$T \ge T_{max} + T_{setup} + T_{skew}$
Cycle Time – Flip-Flop Delay (Clock to Q)
Cycle time is also a function of propagation delay of FF (Tclk-to-Q)
Tclk-to-Q: time from arrival of clock signal till change at FF output
[Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; the waveforms mark the clock-to-Q delay at Q2]
$T \ge T_{max} + T_{setup} + T_{skew} + T_{clk\text{-}to\text{-}Q}$
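As a rough numerical illustration of the cycle-time bound built up over the last four slides, the sketch below simply adds the four terms. The function name and the delay values are hypothetical; they are chosen only so that the result matches the 10 ns / 100 MHz example mentioned earlier.

```python
def min_cycle_time(t_max, t_setup, t_skew, t_clk_to_q):
    """Lower bound on the clock period: T >= Tmax + Tsetup + Tskew + Tclk-to-Q."""
    return t_max + t_setup + t_skew + t_clk_to_q

# Hypothetical delays in ns (illustrative values, not from the lecture).
t_min_period = min_cycle_time(t_max=8.0, t_setup=0.5, t_skew=0.5, t_clk_to_q=1.0)
max_freq_mhz = 1000.0 / t_min_period  # period in ns -> frequency in MHz
print(t_min_period, max_freq_mhz)
```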
Min Path Delay - Hold Time
For FFs to correctly latch data, data must be stable during:
Hold time (Thold) after clock arrives
Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)
[Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a short path of ~3 logic levels; the waveform marks the hold time after the clock edge]
$T_{min} \ge T_{hold} + T_{skew}$
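The hold constraint can be checked the same way as the cycle-time bound. The sketch below, with a hypothetical function name and made-up delays, reports the margin on the short-path condition stated on this slide; a negative margin means the shortest path is too fast.

```python
def hold_margin(t_min, t_hold, t_skew):
    """Margin on the hold constraint Tmin >= Thold + Tskew (negative = violation)."""
    return t_min - (t_hold + t_skew)

# Hypothetical delays in ns (illustrative values only).
ok = hold_margin(t_min=1.0, t_hold=0.25, t_skew=0.5)
bad = hold_margin(t_min=0.5, t_hold=0.25, t_skew=0.5)
print(ok, bad)
```

A hold violation cannot be fixed by slowing the clock, since the cycle time T does not appear in the constraint.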
Setup, Hold, Cycle Times
Example of a single phase clock
[Figure: clock waveform annotated with the cycle time, the set-up time (D stable before clock), the hold time (D stable after clock), and the window when the signal may change]
Approach – Reduce to Combinational
[Figure: original circuit (registers clocked by clk, separated by combinational logic) and the extracted block (a single combinational logic block)]
Combinational Block
[Figure: combinational block with inputs A, B, C, gates W, X, Y, Z and output f; arrival times in green, interconnect delay in red, gate delay in blue]
What’s the right mathematical object to use to represent this physical object?
Problem Formulation - 1
[Figure: the combinational block (inputs A, B, C; gates W, X, Y, Z; output f) redrawn as a graph with delay labels on vertices and edges]
Use a labeled directed graph
G = <V,E>
Vertices represent gates, primary inputs and primary outputs
Edges represent wires
Labels represent delays
HW #7: In this graph model, both nodes and edges have delays. How would you modify this model so that only edges have delay, yet the same timing analysis remains possible?
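One possible in-memory form of such a labeled directed graph is sketched below. The dictionary layout, the helper `build_adjacency`, and the tiny two-input example are all assumptions made for illustration; they are not the lecture's example circuit.

```python
from collections import defaultdict

# Hypothetical example: two primary inputs, one gate, one primary output.
node_delay = {"A": 0.0, "B": 0.0, "X": 1.0, "f": 0.0}              # vertex labels
edge_delay = {("A", "X"): 0.1, ("B", "X"): 0.2, ("X", "f"): 0.05}  # wire labels

def build_adjacency(edge_delay):
    """Return (fanin, fanout) maps derived from the edge list."""
    fanin, fanout = defaultdict(list), defaultdict(list)
    for (u, v) in edge_delay:
        fanout[u].append(v)
        fanin[v].append(u)
    return fanin, fanout

fanin, fanout = build_adjacency(edge_delay)
print(sorted(fanin["X"]), fanout["X"])
```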
Problem Formulation – Actual Arrival Time
Actual arrival time A(v) for a node v is latest time that signal can arrive at node v
$A(v) = \max_{u \in FI(v)} \left( A(u) + d_{u \to v} \right)$
where $d_{u \to v}$ is delay from $u$ to $v$; in the figure, $FI(v) = \{X, Y\}$ and $v = Z$
[Figure: nodes X and Y drive Z with delays $d_{X \to Z}$ and $d_{Y \to Z}$; A(Z) is derived from A(X) and A(Y)]
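For a single node, the arrival-time definition reduces to one `max` over the fan-in. The snippet below is a minimal sketch with a hypothetical function name and made-up values for A(X), A(Y) and the two delays into Z.

```python
def arrival_time(fanin_arrivals):
    """A(v) = max over fan-in nodes u of A(u) + d(u -> v).

    fanin_arrivals: list of (A(u), d_uv) pairs, one per fan-in node u.
    """
    return max(a_u + d_uv for a_u, d_uv in fanin_arrivals)

# Hypothetical values: A(X)=2.0 with d(X->Z)=0.5, and A(Y)=1.0 with d(Y->Z)=2.0.
a_z = arrival_time([(2.0, 0.5), (1.0, 2.0)])
print(a_z)
```

Here Y arrives earlier than X but has the slower connection, so Y determines A(Z).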
Problem Formulation - 2
[Figure: the combinational block and its labeled graph (inputs A, B, C; gates W, X, Y, Z; output f)]
Use a labeled directed graph
G = <V,E>
Enumerate all paths - choose the longest?
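To see why enumerating all paths scales badly, consider a hypothetical DAG made of n reconvergent stages, where each stage splits into two branches that rejoin. The sketch below (helper names are illustrative) counts the paths by explicit enumeration; the count doubles with every stage.

```python
def count_paths(fanout, src, dst):
    """Count distinct src->dst paths by explicit enumeration."""
    if src == dst:
        return 1
    return sum(count_paths(fanout, nxt, dst) for nxt in fanout.get(src, []))

def diamond_chain(n):
    """Hypothetical DAG: n stages, each splitting into two branches that reconverge."""
    fanout = {}
    for i in range(n):
        fanout[f"s{i}"] = [f"a{i}", f"b{i}"]
        fanout[f"a{i}"] = [f"s{i + 1}"]
        fanout[f"b{i}"] = [f"s{i + 1}"]
    return fanout

for n in (1, 5, 10):
    # The graph has 3n + 1 nodes but 2**n paths from s0 to s<n>.
    print(n, count_paths(diamond_chain(n), "s0", f"s{n}"))
```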
Problem Formulation - 3
Compute longest path in a graph G = <V,E,delay,Origin>
// delay is set of labels, Origin is the super-source of the DAG
// Final-delay(v) is the accumulated arrival time at v; Final-delay(Origin) = 0
Forward-prop(W){
  for each vertex v in W
    for each edge <v,w> from v
      Final-delay(w) = max(Final-delay(w), Final-delay(v) + delay(<v,w>) + delay(w))
      if all incoming edges of w have been traversed, add w to W
}
Longest-path(G){
  Forward-prop(Origin)
}
[Figure: the example graph with a super-source Origin connected by zero-delay edges to the primary inputs]
(Kirkpatrick 1966, IBM JRD)
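The Forward-prop pseudocode can be sketched in Python roughly as follows. This is one possible reading, not the lecture's reference implementation: it assumes that the accumulated arrival time of v (rather than only v's own gate delay) is what gets extended along each edge, and the graph, node names and delays are hypothetical rather than taken from the figure.

```python
from collections import defaultdict, deque

def longest_path(node_delay, edge_delay, origin):
    """Arrival time at every node of a DAG, propagated forward from `origin`.

    node_delay: {node: gate delay}; edge_delay: {(u, v): wire delay}.
    """
    fanout = defaultdict(list)
    pending = defaultdict(int)           # untraversed incoming edges per node
    for (u, v) in edge_delay:
        fanout[u].append(v)
        pending[v] += 1

    arrival = {n: float("-inf") for n in node_delay}
    arrival[origin] = node_delay[origin]
    worklist = deque([origin])
    while worklist:
        v = worklist.popleft()
        for w in fanout[v]:
            cand = arrival[v] + edge_delay[(v, w)] + node_delay[w]
            arrival[w] = max(arrival[w], cand)
            pending[w] -= 1
            if pending[w] == 0:          # all incoming edges traversed
                worklist.append(w)
    return arrival

# Hypothetical graph (not the lecture's figure): Origin feeds inputs A and B.
node_delay = {"Origin": 0, "A": 0, "B": 0, "X": 2, "Y": 1, "f": 0}
edge_delay = {("Origin", "A"): 0, ("Origin", "B"): 0,
              ("A", "X"): 1, ("B", "X"): 3, ("B", "Y"): 1,
              ("X", "Y"): 1, ("Y", "f"): 1}
arrival = longest_path(node_delay, edge_delay, "Origin")
print(arrival["f"])
```

Each edge is visited exactly once and each node enters the worklist once, which is where the linear O(v+e) running time comes from.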
Algorithm Execution
[Figures: eleven animation steps of Forward-prop on the example graph. Starting from Origin with arrival time 0, arrival times are propagated through the primary inputs A, B, C and the gates W, X, Y, Z to the output f; the arrival times annotated in the final step include 1.1, 2, 3, 5.8 and 5.95]
Critical Path (sub-graph)
[Figure: the example graph annotated with its final arrival times (including 1.1, 2, 3, 5.8 and 5.95), showing the critical path sub-graph]
Timing for Optimization: Extra Requirements
Longest-path algorithm computes arrival times at each node
If we have constraints, need to propagate slack to each node
A measure of how much timing margin exists at each node
Can optimize a particular branch
Can trade slack for power, area, robustness
Required Arrival Time
Required arrival time R(v) is the time before which a signal must arrive to avoid a timing violation
Required time is user defined at output:
$R(v) = T - T_{setup}$
Then recursively
$R(v) = \min_{u \in FO(v)} \left( R(u) - d_{v \to u} \right)$
where $d_{v \to u}$ is delay from $v$ to $u$; in the figure, $v = X$ and $FO(v) = \{Y, Z\}$
[Figure: node X fans out to Y and Z with delays $d_{X \to Y}$ and $d_{X \to Z}$; R(X) is derived from R(Y) and R(Z)]
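Backward propagation of required times mirrors the forward pass. The sketch below is a minimal illustration under stated assumptions: gate delays are folded into the edge delays, outputs have no fan-out, and the graph, the clock period `T` and `T_setup` are made-up values.

```python
from collections import defaultdict, deque

def required_times(edge_delay, output_required):
    """Propagate required times backwards: R(v) = min over fan-out u of R(u) - d(v -> u).

    edge_delay: {(v, u): delay from v to u}; output_required: {output node: R}.
    """
    fanin = defaultdict(list)
    pending = defaultdict(int)           # unprocessed outgoing edges per node
    for (v, u) in edge_delay:
        fanin[u].append(v)
        pending[v] += 1

    required = dict(output_required)
    worklist = deque(output_required)
    while worklist:
        u = worklist.popleft()
        for v in fanin[u]:
            cand = required[u] - edge_delay[(v, u)]
            required[v] = min(required.get(v, float("inf")), cand)
            pending[v] -= 1
            if pending[v] == 0:          # all fan-out edges processed
                worklist.append(v)
    return required

# Hypothetical example: X fans out to Y and Z, which both drive output f.
edge_delay = {("X", "Y"): 1, ("X", "Z"): 3, ("Y", "f"): 2, ("Z", "f"): 1}
T, T_setup = 10, 1                       # illustrative clock period and setup time
required = required_times(edge_delay, {"f": T - T_setup})
print(required)
```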
Required Time Propagation: Example
Assume required time at output R(f) = 5.80
Propagate required times backwards
[Figure: the example graph with arrival times and back-propagated required times; annotated values include 5.80, 5.65, 3.45, 3.45, 1.45, 0.95, 0.45 and -0.15]
Timing Slack
From arrival and required time can compute slack. For each node v:
$S(v) = R(v) - A(v)$
Slack reflects criticality of a node
Positive slack: Node is not on critical path. Timing constraints met.
Zero slack: Node is on critical path. Timing constraints are barely met.
Negative slack: There is a timing violation
Slack distribution is key for timing optimization!
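Given arrival and required times per node, slack is a per-node subtraction. The sketch below uses hypothetical node names and integer times to show the three cases listed on this slide; the helper names are illustrative.

```python
def slacks(arrival, required):
    """S(v) = R(v) - A(v) for every node that has both values."""
    return {v: required[v] - arrival[v] for v in arrival if v in required}

def classify(s):
    """Label a slack value following the three cases on the slide."""
    if s > 0:
        return "met (not critical)"
    if s == 0:
        return "critical (barely met)"
    return "violation"

# Hypothetical arrival and required times (same units).
arrival = {"X": 2, "Y": 5, "f": 9}
required = {"X": 3, "Y": 5, "f": 8}
s = slacks(arrival, required)
report = {v: classify(val) for v, val in s.items()}
print(s, report)
```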
Timing Slack Computation: Example
Compute slack at each node
$S(v) = R(v) - A(v)$
[Figure: the example graph (Origin; A, B, C; W, X, Y, Z; f) annotated with slack values: -0.15 at five locations, 0.45 at two, and 1.45 at one]
Approach of Static Timing Verification
[Figure: registers clocked by clk, separated by blocks of combinational logic]
Computing longest path delay
• Full path enumeration - potentially exponential
• Longest path algorithm on DAG (Kirkpatrick 1966, IBM JRD) (O(v+e) or O(g + p))
Currently applied to even the largest (>10M gate) circuits
Two challenges
• Asynchronous subcircuits
• False paths
Enhancements of STA
Incremental timing analysis
Nanometer-scale process effects – variation (probabilistic timing analysis)
Interference – crosstalk
Multiple inputs switching
Conservatism of delay propagation
HW #8: Suppose you change the size of one (combinational) gate in your design, thus invalidating the previous timing analysis. How much work must be done to regain a correct timing analysis?
Timing Correction
Driven by STA: “Incremental performance analysis backplane”
Fix electrical violations: resize cells, buffer nets, copy (clone) cells
Fix timing problems: local transforms (bag of tricks), path-based transforms
DAC-2002, Physical Chip Implementation
Local Synthesis Transforms
Resize cells
Buffer or clone to reduce load on critical nets
Decompose large cells
Swap connections on commutative pins or among equivalent nets
Move critical signals forward
Pad early paths
Area recovery
Transform Example
Double Inverter Removal
[Figure: a path with Delay = 4 before the transform and Delay = 2 after the back-to-back inverters are removed]
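A toy version of double inverter removal on a single path might look like the sketch below. It assumes a linear chain of gates with unit delay each (chosen so the counts match the 4 and 2 on the slide) and assumes the net between the two inverters has no other fan-out; the gate names and the function are hypothetical.

```python
def remove_double_inverters(chain):
    """Drop adjacent INV-INV pairs from a linear chain of gate types."""
    out = []
    for gate in chain:
        if gate == "INV" and out and out[-1] == "INV":
            out.pop()                    # the pair cancels logically
        else:
            out.append(gate)
    return out

# Hypothetical path; with unit gate delays, path delay equals the gate count.
before = ["NAND", "INV", "INV", "NOR"]
after = remove_double_inverters(before)
print(len(before), len(after))
```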
Resizing
[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C; a gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2 and 0.3; annotated delay is 0.035 with size A and 0.026 with size C]
Cloning
[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C; a gate with inputs a, b drives sinks d, e, f, g, h with load 0.2 each; after cloning, two copies of the gate (labeled A and B) share the sinks]
Buffering
[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C; a gate with inputs a, b drives sinks d, e, f, g, h with load 0.2 each; after buffering, buffers (labeled B) are inserted between the gate and some of the sinks]
Redesign Fan-in Tree
[Figure: inputs with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 and gates of delay 1. Original tree: Arr(e)=6. Redesigned tree: Arr(e)=5]
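The arrival times on this slide can be reproduced with a few lines of arithmetic. The two tree shapes below are assumptions: the figure's exact structure did not survive extraction, and these are simply one pair of structures consistent with Arr(e)=6 before and Arr(e)=5 after, using gates of delay 1.

```python
def gate(*input_arrivals, delay=1):
    """Arrival time at a gate output: latest input plus the gate delay."""
    return max(input_arrivals) + delay

arr = {"a": 4, "b": 3, "c": 1, "d": 0}   # input arrival times from the slide

# Assumed original structure: balanced tree, (a, b) and (c, d) feeding a final gate.
balanced = gate(gate(arr["a"], arr["b"]), gate(arr["c"], arr["d"]))

# Assumed redesign: early signals are combined first, the latest signal a enters last.
skewed = gate(arr["a"], gate(arr["b"], gate(arr["c"], arr["d"])))
print(balanced, skewed)
```

Placing the latest-arriving input closest to the output means it passes through only one gate, which is what saves the one unit of delay.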
Redesign Fan-out Tree
[Figure: fan-out tree before and after redesign, with a delay label on each stage. Before: Longest Path = 5. After: Longest Path = 4, including the slowdown of a buffer due to load]
DAC-2002, Physical Chip Implementation