© 2005 Altera Corporation© 2006 Altera Corporation
Placement and Timing for FPGAs Considering Variations
Yan Lin (1), Mike Hutton (2) and Lei He (1)
(1) EE Department, UCLA; (2) Altera Corporation, San Jose
Outline
- Preliminaries and Motivation
- Timing with Guard-banding/Speed-binning
- Stochastic Placement
- Experimental Results
- Conclusions and Discussions
Background
- Process variations
  - Increasingly significant in nanometer technologies
  - Affect timing and power in both ASICs and FPGAs
- Delay with variations
  - Variation sources: threshold voltage (Vth) and effective channel length (Leff)
  - Independent Gaussians for global/local variations
  - First-order canonical form:

$$ a_0 + \sum_{i=1}^{n} a_i \Delta X_i + a_{n+1} \Delta R_a $$

- Related work
  - FPGA device and architecture evaluation with process variations [Wong et al, ICCAD’05]
  - SSTA [Chang et al, ICCAD’03] [Visweswariah et al, DAC’04]
  - Statistical criticality analysis [Visweswariah et al, DAC’04] [Li et al, ICCAD’05] [Xiong et al, TAU’06]
  - Statistical gate sizing for ASICs [Guthaus et al, ICCAD’05] [Sinha et al, ICCAD’05]
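The first-order canonical form on this slide can be illustrated with a small sketch. The class name `Canonical`, its field names, and the numeric sensitivities are hypothetical and chosen only for illustration; the sketch assumes every ΔXi and the ΔRa term are independent standard normal variables.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Canonical:
    """First-order canonical delay: a0 + sum_i a[i]*dX_i + a_r*dR.

    Assumption: the shared sources dX_i and the independent term dR
    are independent standard normal variables.
    """
    a0: float        # nominal (mean) delay
    a: List[float]   # sensitivities to the shared sources dX_i
    a_r: float       # sensitivity to the independent term dR

    def sigma(self) -> float:
        # Standard deviation of the delay under the independence assumption.
        return sqrt(sum(s * s for s in self.a) + self.a_r ** 2)

    def __add__(self, other: "Canonical") -> "Canonical":
        # Shared sources add linearly; independent terms add in quadrature.
        return Canonical(
            self.a0 + other.a0,
            [x + y for x, y in zip(self.a, other.a)],
            sqrt(self.a_r ** 2 + other.a_r ** 2),
        )


# Hypothetical delays (ns) of two elements in series on one path.
lut = Canonical(1.0, [0.03, 0.04], 0.05)
wire = Canonical(2.0, [0.03, 0.00], 0.12)
path = lut + wire
print(path.a0, round(path.sigma(), 4))
```

Keeping the shared sources separate from the independent term is what lets a statistical timer preserve correlation between paths that see the same global variation.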
Motivation
- STA is inaccurate with variation
  - Slack ignores near-criticality
  - Near-critical paths may be statistically timing-critical
- Deterministic timing-driven placer (e.g. T-VPlace in VPR)
  - Based on STA
  - Optimizes for the static critical path
  - May not optimize timing with variation
- A stochastic placer is needed with variations
  - Same placement for one application across chips
Pre-routing Interconnect Uncertainty vs. Process Variation in Placement
- Existing timing-driven placer
  - Leverages timing slack in STA
  - Interconnect delay is only estimated before routing
  - The estimate may incur uncertainty on top of process variation
- Clearly, process variation leads to a more significant delay variance in the placement stage
- Therefore, only process variation is considered for placement
Uniqueness for Timing in FPGAs
FPGAs vs. ASICs
- Similarity
  - Susceptible to process variations
- Advantages
  - Long switching paths dampen (average out) local variation
  - Binned into speed grades to isolate global variation
  - Can be programmed repeatedly and differently during timing chip-test
- Disadvantages
  - Critical paths are unknown at test time
  - The same timing model is applied to unknown applications at unknown clock frequency and varied conditions
  - A guard-banded timing model can be arbitrarily conservative or aggressive
Timing with Guard-banding
- A guard-band is applied to each individual node to model uncertainty in STA
- A constant guard-banded delay is µ+cσ
  - µ and σ are the nominal delay and standard deviation, respectively
  - c is constant for all circuit elements
- Guard-band cost: (Tgrd/Tnorm) - 1
  - Tgrd: critical path delay in STA with guard-banding
  - Tnorm: critical path delay in STA with the nominal timing model
  - Pessimistic/optimistic for designs with longer/shorter critical paths
  - Actual timing yield is analyzed by SSTA
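The guard-band cost defined on this slide can be computed with a few lines of code. This is a minimal sketch: the function names, the path representation (a list of per-element `(mu, sigma)` pairs) and the numbers are hypothetical, and `independent_bound` assumes fully independent element variations purely to show why a per-element guard-band grows pessimistic on long paths.

```python
from math import sqrt


def sta_delay(path, c=0.0):
    """STA delay of one path when every element uses mu + c*sigma."""
    return sum(mu + c * sigma for mu, sigma in path)


def guard_band_cost(paths, c):
    """(Tgrd / Tnorm) - 1 over a set of candidate paths."""
    t_norm = max(sta_delay(p) for p in paths)      # nominal critical path delay
    t_grd = max(sta_delay(p, c) for p in paths)    # guard-banded critical path delay
    return t_grd / t_norm - 1.0


def independent_bound(path, c):
    """mu + c*sigma of the whole path, assuming independent element variations."""
    mu = sum(m for m, _ in path)
    sigma = sqrt(sum(s * s for _, s in path))
    return mu + c * sigma


# Hypothetical paths: ten short stages versus three long stages (delays in ns).
long_path = [(1.0, 0.03)] * 10
short_path = [(3.0, 0.09)] * 3
cost = guard_band_cost([long_path, short_path], c=3.0)
print(f"guard-band cost: {cost:.1%}")
print(sta_delay(long_path, 3.0), independent_bound(long_path, 3.0))
```

Per-element guard-bands add linearly along a path, while independent variations add in quadrature, so the gap between the two printed delays widens as the path gets longer.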
Timing with Speed-binning
- Test and eliminate local variation by testing multiple similar paths across the test chip
- Model the global variation Gaussians ΔXi as a single ΔGa
- Speed-binning = categorizing ΔGa
  - All chips that fall into the same bin share the same guard-banded timing model
  - e.g., µ-σg / µ+σg / µ+3σg for the fast/medium/slow bin
  - STA gives the circuit delay Tbin for each bin
Yield Analysis with Speed-binning
- Timing yield analysis for a bin
  - Circuit delay: Tµ + σTg·ΔGa + σTl·ΔRa
  - Bin k: [Glow(k), Gup(k)]
  - Cut-off delay: γ·Tbin(k)
  - Timing yield for bin k is

$$ timing\_yield(k) = \int_{G_{low}(k)}^{G_{up}(k)} pdf(\Delta G_a)\, cdf\!\left( \frac{\gamma\, T_{bin}(k) - T_{\mu} - \sigma_{Tg}\, \Delta G_a}{\sigma_{Tl}} \right) d\Delta G_a $$

- The overall timing yield is

$$ timing\_yield = \sum_{k=1}^{n-1} timing\_yield(k) $$

- Yield loss due to ignored local variation
- Yield loss due to unknown critical paths
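The per-bin timing yield on this slide is a one-dimensional integral over ΔGa, so it can be evaluated numerically. The sketch below uses a plain midpoint rule and only the standard library. The function names, the example bin edges, the choice Tbin = Tµ + σTg·Gup (bin model guard-banded to its slow edge) and the use of -6 as a stand-in for an unbounded lower edge are all assumptions made for illustration. It also assumes ΔGa and ΔRa are standard normal and σTl > 0.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bin_timing_yield(t_mu, sigma_tg, sigma_tl, g_low, g_up, t_bin,
                     gamma=1.0, steps=2000):
    """Midpoint-rule integral of pdf(g) * cdf((gamma*Tbin - Tmu - sTg*g) / sTl)."""
    h = (g_up - g_low) / steps
    total = 0.0
    for s in range(steps):
        g = g_low + (s + 0.5) * h
        margin = (gamma * t_bin - t_mu - sigma_tg * g) / sigma_tl
        total += normal_pdf(g) * normal_cdf(margin)
    return total * h


def overall_timing_yield(bins, t_mu, sigma_tg, sigma_tl, gamma=1.0):
    """Sum of per-bin yields; bins is a list of (g_low, g_up, t_bin)."""
    return sum(
        bin_timing_yield(t_mu, sigma_tg, sigma_tl, lo, up, t_bin, gamma)
        for lo, up, t_bin in bins
    )


# Hypothetical circuit (ns) and three sold bins; slower chips are discarded.
t_mu, sigma_tg, sigma_tl = 10.0, 0.3, 0.1
edges = [-6.0, -0.25, 0.52, 4.26]
bins = [(lo, up, t_mu + sigma_tg * up) for lo, up in zip(edges, edges[1:])]

sold = normal_cdf(edges[-1]) - normal_cdf(edges[0])
for gamma in (1.0, 1.02, 1.05):
    y = overall_timing_yield(bins, t_mu, sigma_tg, sigma_tl, gamma)
    print(gamma, f"yield loss = {(sold - y) * 1e4:.1f} pp10K")
```

With γ = 1 the only margin a chip has is its distance to the slow edge of its bin, so local variation ΔRa pushes a visible fraction of chips past the cut-off; relaxing γ trades clock frequency for yield.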
Timing-Driven Placement T-VPlace [Marquardt et al, FPGA 2000]
- Simulated annealing based placement
- Both wiring and timing are considered in the cost function
- Wiring cost

$$ Wiring\_Cost = \sum_{i=1}^{N_{nets}} q(i)\,[\,bb_x(i) + bb_y(i)\,] $$

- Timing cost for a connection

$$ criticality(i,j) = 1 - slack(i,j) / D_{max} $$

$$ Timing\_Cost(i,j) = d(i,j) \cdot criticality(i,j)^{\,criticality\_exponent} $$

- Timing cost for a placement solution

$$ Timing\_Cost = \sum_{\forall i,j} Timing\_Cost(i,j) $$

- Overall cost

$$ \Delta C = \lambda \cdot \frac{\Delta Timing\_Cost}{Previous\_Timing\_Cost} + (1-\lambda) \cdot \frac{\Delta Wiring\_Cost}{Previous\_Wiring\_Cost} $$

- STA is performed at each annealing temperature to update the critical path delay and slack
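The cost terms on this slide translate directly into code. The following is a minimal sketch of the cost arithmetic only, not of VPR itself: the function names, the tuple layouts and the numbers are hypothetical, and the exponent value 8 is used only because it matches the axis label on the later criticality plots.

```python
def criticality(slack, d_max):
    """Static criticality of a connection: 1 - slack / Dmax."""
    return 1.0 - slack / d_max


def timing_cost(connections, d_max, exponent):
    """Sum of d(i,j) * criticality(i,j)**exponent over (delay, slack) pairs."""
    return sum(d * criticality(s, d_max) ** exponent for d, s in connections)


def wiring_cost(nets):
    """Sum of q(i) * (bb_x(i) + bb_y(i)) over (q, bb_x, bb_y) triples."""
    return sum(q * (bb_x + bb_y) for q, bb_x, bb_y in nets)


def delta_cost(d_timing, d_wiring, prev_timing, prev_wiring, lam):
    """Normalized change in cost used to accept or reject an annealing move."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring


# Hypothetical placement: two connections as (delay, slack) and two nets.
connections = [(2.0, 0.0), (3.0, 5.0)]
nets = [(1.0, 3, 4), (2.0, 1, 1)]
print(timing_cost(connections, d_max=10.0, exponent=8))
print(wiring_cost(nets))
print(delta_cost(-1.0, 2.0, prev_timing=10.0, prev_wiring=20.0, lam=0.5))
```

Normalizing each delta by the previous cost keeps the two terms comparable regardless of their units, so λ alone sets the trade-off between timing and wiring.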
Stochastic Placement ST-VPlace
- Main differences between ST-VPlace and T-VPlace
  - Estimate the delay matrix in canonical form instead of just the nominal delay matrix
    - Used in SSTA for the statistical timing cost during placement
  - Perform SSTA instead of STA at each temperature in the simulated annealing framework
  - Use statistical criticality instead of static criticality in the cost function
    - Statistical criticality for an edge/node is the probability that this edge/node is statistically timing critical in SSTA
    - Static criticality is based on slack and the longest path delay in STA
    - Statistical criticality exponent θ

$$ STiming\_Cost(i,j) = d(i,j) \cdot SCriticality(i,j)^{\theta} $$

$$ STiming\_Cost = \sum_{\forall i,j} STiming\_Cost(i,j) $$
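To make the notion of statistical criticality concrete, the sketch below estimates it by Monte Carlo sampling on a toy circuit. This is a swapped-in technique for illustration: the slides use SSTA, not sampling. The function names, the two-path circuit and all delay values are hypothetical, and edge delays are assumed to be independent Gaussians.

```python
import random
from collections import Counter


def statistical_criticality(paths, edge_delay, samples=20000, seed=0):
    """Fraction of samples in which each edge lies on the longest path.

    paths: dict path name -> list of edge names
    edge_delay: dict edge name -> (mu, sigma), assumed independent Gaussians
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(samples):
        d = {e: rng.gauss(mu, sigma) for e, (mu, sigma) in edge_delay.items()}
        longest = max(paths.values(), key=lambda p: sum(d[e] for e in p))
        hits.update(longest)
    return {e: hits[e] / samples for e in edge_delay}


def stiming_cost(edge_delay, scrit, theta=0.3):
    """Sum of d(i,j) * SCriticality(i,j)**theta, using nominal delays."""
    return sum(mu * scrit[e] ** theta for e, (mu, _) in edge_delay.items())


# Hypothetical circuit: shared edge e1, then e2 (nominally critical) or e3.
edge_delay = {"e1": (5.0, 0.2), "e2": (5.0, 0.3), "e3": (4.7, 0.3)}
paths = {"A": ["e1", "e2"], "B": ["e1", "e3"]}

crit = statistical_criticality(paths, edge_delay)
print(crit)
print(stiming_cost(edge_delay, crit))
```

In STA, path B has positive slack and is never critical, yet under variation its edge e3 ends up on the longest path in a sizeable fraction of samples. That near-critical behavior is what the statistical cost term is meant to capture.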
Experimental Settings
- Variation and device setting
  - 10% as 3 sigma for global and local variation in Vth and Leff at the ITRS 65nm technology node
  - Min-ED device setting: Vdd = 0.9 V, Vth = 0.3 V [Wong et al, ICCAD’05]
- Architecture similar to Altera’s Stratix™
  - Island-style FPGA architecture with cluster size 10 and LUT size 4
  - 60% length-4 and 40% length-8 wires in the interconnect
  - 1.2X the routing channel width obtained by T-VPlace
- Yield loss is measured in failed parts per 10K parts (pp10K)
- Evaluated using MCNC and QUIP designs
Cost Function Tuning
- Perform ST-VPlace and SSTA to obtain the mean delay and standard deviation over all designs for each statistical criticality exponent θ
- θ=0.3 leads to the smallest mean and deviation, and hence the highest timing yield
[Figure: mean delay Tmean (ns, left axis, 21.2 to 23) and standard deviation Tsigma (ns, right axis, 3 to 3.2) vs. statistical criticality exponent (0 to 2.5); data points labeled 0.1, 0.2, 0.3, 0.4, 0.5, 1.0 and 2.0]
T-VPlace vs. ST-VPlace
- Some correlation between mean delay and deviation
- ST-VPlace achieves
  - a smaller mean delay for all designs
  - a smaller variance for most designs
  - a higher timing yield
[Figure: normalized Tmean (ST-VPlace) and normalized Tsigma (ST-VPlace), y-axis 0.80 to 1.05, for MCNC & QUIP designs 1 to 39]
Statistical Criticality vs. Static Criticality
- Statistical criticality does not increase monotonically with static criticality

[Figure (a): Statistical Criticality vs Static Criticality; x-axis (Static Criticality)^8 from 0.0 to 1.0, y-axis (Statistical Criticality)^0.3 from 0.0 to 1.0]

- Statistical criticality may vary significantly for similar static criticality

[Figure (b): Statistical Criticality vs Static Criticality; x-axis (Static Criticality)^8 from 0.32 to 0.42, y-axis (Statistical Criticality)^0.3 from 0.00 to 0.20; annotations 227X, 114X and 4X]

- ST-VPlace considers statistical criticality explicitly
  - Optimizes near-critical paths under variations
  - Leads to a higher timing yield
Impact on Path-length Distribution
- The path-length distribution in ST-VPlace is almost on top of that in T-VPlace

[Figure (a): Path length distribution; percentage (0% to 25%) vs. path delay (0.0 to 15.0 ns) for T-VPlace and ST-VPlace]

[Figure (b): Path length distribution (paths with 85%-100% of critical path delay); percentage (0% to 6%) vs. path delay (15.0 to 17.0 ns) for T-VPlace and ST-VPlace]

- ST-VPlace reduces the top 10% near-critical paths from 1.3% to 0.8%
  - Although it has a larger nominal delay
  - It has a smaller mean and variance, and hence a higher timing yield
Effect of Guard-banding

[Figure: three panels, for variation (3 sigma) of global 10% / local 10%, global 5% / local 5%, and global 20% / local 20%. Each panel plots guard-band cost (left axis, 0% to 120%) and the yield loss of T-VPlace and ST-VPlace (right axis, pp10K, log scale from 0.1 to 10000) against the guard-band factor (0 to 4)]

- ST-VPlace obtains a higher timing yield under varied variations and guard-band factors
  - Larger gain with smaller variation
[Figure: three panels, for variation (3 sigma) of global 0% / local 10%, global 0% / local 5%, and global 0% / local 20%. Each panel plots guard-band cost (left axis, 0% to 120%) and the yield loss of T-VPlace and ST-VPlace (right axis, pp10K, log scale from 0.1 to 10000) against the guard-band factor (0 to 4)]

- Similar gain with varied local variation when no global variation is considered
- Yield loss is reduced by 3.4X with 3 sigma guard-banding under 10%/10% variations
Effect of Speed-binning
- Fast/Medium/Slow = 40%/30%/29.999%
  - Discard the slowest 0.001% (0.1 pp10K) chips
- Tbin may be relaxed by γ for a higher timing yield

[Figure: yield loss (pp10K, log scale from 0.1 to 10000) vs. relaxed factor (1 to 1.11) for T-VPlace and ST-VPlace]

Data labels from the plot, yield loss in pp10K:

Relaxed factor   T-VPlace   ST-VPlace
1.00             1565       169
1.01             1107       83
1.02             700        36
1.03             376        13
1.04             155        4.5
1.05             45         1.8
1.06             13         0.93
1.07             3.7        0.52
1.08             1.3        0.32
1.09             0.55       0.22
1.10             0.28       0.17

- Yield loss is due to local variation and unknown critical paths
- ST-VPlace consistently achieves a higher timing yield
- Yield loss is reduced by 25X with γ=5%
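The bin split on this slide (40%/30%/29.999%, with the slowest 0.001% discarded) fixes where the bin boundaries fall on the ΔGa axis. The sketch below derives them with the inverse normal CDF from the standard library. The function name `bin_edges` is hypothetical, and the sketch assumes that ΔGa is standard normal and that a larger ΔGa means a slower chip.

```python
from statistics import NormalDist


def bin_edges(fractions):
    """Upper edge of each bin on the dGa axis, in units of sigma.

    fractions: share of chips per bin, ordered from fastest to slowest.
    """
    std = NormalDist()  # standard normal: mean 0, sigma 1
    edges, cumulative = [], 0.0
    for f in fractions:
        cumulative += f
        edges.append(std.inv_cdf(cumulative))
    return edges


fast_up, medium_up, slow_up = bin_edges([0.40, 0.30, 0.29999])
print(f"fast:   dGa <= {fast_up:.3f}")
print(f"medium: {fast_up:.3f} < dGa <= {medium_up:.3f}")
print(f"slow:   {medium_up:.3f} < dGa <= {slow_up:.3f}")
print(f"discarded: dGa > {slow_up:.3f}")
```

Under these assumptions the discard threshold sits above four sigma, which is why only 0.1 pp10K of chips are thrown away before any timing failure is counted.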
Conclusions and Discussions
- Conclusions
  - Quantified the effects of guard-banding and speed-binning with variations
  - Developed a novel stochastic placer
  - Evaluated with MCNC and QUIP designs, reduced yield loss by
    - 3.4X with guard-banding
    - 25X with speed-binning
- Ongoing and future work
  - Extend timing models with spatially correlated variations
  - Develop stochastic physical synthesis algorithms, e.g., clustering, routing, re-timing