PiCAP: A Parallel and Incremental Capacitance Extraction Considering Stochastic Process Variation
Fang Gong1, Hao Yu2, and Lei He1
1Electrical Engineering Dept., UCLA
2Berkeley Design Automation
Presented by Fang Gong
Outline
Background and Motivation
Algorithms
Experimental Results
Conclusion and Future Work
Process Variation and Cap Extraction
Process variation leads to capacitance variation
OPC lithography and CMP polishing
Capacitance variation affects circuit performance
Delay variation and analog mismatch
[Figure: histogram of capacitance variation; x-axis: Capacitance Variation (%), binned from -2~0% to 10~12%; y-axis: % of Segments, 0% to 70%]
From [Kang and Gupta]
Background of BEM Based Cap Extraction
Capacitance extraction procedure in FastCap
1. Discretize metal surface into panels
2. Form linear system by collocation
3. Results in dense potential coeffs
4. Solve by iterative GMRES
Fast Multipole Method (FMM) to evaluate the Matrix Vector Product (MVP)
Preconditioned GMRES iteration with guided convergence
$$P_{ij} = \frac{1}{a_i} \int_{\text{panel } i} \frac{1}{|r_i - r_j|} \, da_i$$
$$P q = v$$
Difficulties for stochastic capacitance extraction
How to consider variations in FMM?
How to consider different variations in the preconditioner?
[Figure: source panel j and observer panel i]
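The four extraction steps (discretize, collocate, form dense potential coefficients, solve) can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the FastCap implementation: a single square plate, centroid (point-charge) coupling between different panels, the closed-form self term of a uniformly charged square, and plain Gaussian elimination standing in for GMRES. The names `solve_dense` and `square_plate_capacitance` are hypothetical, and capacitance is reported in units of 4πε0.

```python
import math

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (a stand-in for GMRES)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def square_plate_capacitance(n, side=1.0):
    """Capacitance of a square plate at unit potential, in units of 4*pi*eps0."""
    a = side / n  # step 1: n x n square panels with edge length a
    centers = [((i + 0.5) * a, (j + 0.5) * a)
               for i in range(n) for j in range(n)]
    # Self term: (1/a^2) * integral of 1/r over a square panel, seen from its centre.
    self_term = 4.0 * math.log(1.0 + math.sqrt(2.0)) / a
    # Steps 2-3: collocation at panel centres gives a dense matrix P.
    P = [[self_term if i == j else 1.0 / math.dist(ci, cj)
          for j, cj in enumerate(centers)]
         for i, ci in enumerate(centers)]
    # Step 4: solve P q = v with v = 1 on every panel.
    q = solve_dense(P, [1.0] * len(centers))
    return sum(q)  # total charge at unit potential

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n * n, "panels:", square_plate_capacitance(n))
```

Every entry of P is nonzero, which is why the matrix-vector product is the cost that FMM is meant to reduce.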
Motivation of Our Work
Existing works
Stochastic integral by low-rank approximation
Zhu, Z. and White, J. “FastSies: a fast stochastic integral equation solver for modeling the rough surface effect”. In Proceedings of IEEE/ACM ICCAD 2005.
Pros: Rigorous formulation
Cons: Random integral is slow for full-chip extraction
Stochastic orthogonal polynomial (SOP) expansion
Cui, J., et al. “Variational capacitance modeling using orthogonal polynomial method”. In Proceedings of the 18th ACM Great Lakes Symposium on VLSI 2008.
Pros: An efficient non-Monte-Carlo approach
Cons: SOP expansion results in an augmented and dense linear system
Objective of our work
Fast multipole method (FMM) with nearly O(n) performance, with a further parallel improvement.
The preconditioner should be updated incrementally for different variations.
Flow of piCAP
1. Represent Pij with stochastic geometric moments
2. Use parallel FMM to evaluate MVP of Pxq
3. Obtain capacitance (mean and variance) with incrementally preconditioned GMRES
[Flowchart of piCAP. Boxes: Geometry Info $(d_0, w_0)$; Process Variation $(\xi_d, \xi_w)$; Geometric Moments $M(d_0 + d_1\xi_d, w_0 + w_1\xi_w)$; Potential Coefficient $P_{ij} \approx M(d_0 + d_1\xi_d, w_0 + w_1\xi_w)$; Evaluate the MVP (Pxq) with FMM in parallel; Build Spectrum preconditioner; Incrementally update preconditioner; Solve with GMRES; Calculate $C_{ij}$ with the charge distribution.]
Stochastic Geometric Moment
Consider two independent variation sources: panel distance (d) and panel width (w)
Multipole expansion along x-y-z coordinates: the multipole moments $M_i$ and local moments $L_i$ show an explicit dependence on geometry parameters, and are called geometric moments.
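[Figure: source cube, observer cube and source panel, with vectors $d_0$ and $r_0$]
$$r_0 = r_x \hat{x} + r_y \hat{y} + r_z \hat{z}, \qquad d_0 = d_x \hat{x} + d_y \hat{y} + d_z \hat{z}$$
$$l_0(d_0) = \frac{1}{|d_0|}, \qquad m_0(r_0) = 1$$
$$l_1(d_0) = \frac{d_k}{|d_0|^3}, \qquad m_1(r_0) = r_k$$
$$l_2(d_0) = \frac{3 d_k d_l}{|d_0|^5}, \qquad m_2(r_0) = \frac{1}{6}\left(3 r_k r_l - \delta_{kl} |r_0|^2\right)$$
$$\frac{1}{|r_0 - d_0|} = \sum_{p=0}^{\infty} \frac{(-1)^p}{p!} (r_0 \cdot \nabla)^p \frac{1}{|d_0|} = \sum_{p=0}^{\infty} M_p = \sum_{p=0}^{\infty} l_p(d_0)\, m_p(r_0)$$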
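The expansion of 1/|r0 - d0| into products l_p(d0) m_p(r0) can be checked numerically. The sketch below truncates the series at second order and assumes summation over the repeated indices k and l; the function names `geometric_moments` and `exact_kernel`, and the sample vectors, are illustrative only.

```python
import math

def geometric_moments(d, r):
    """Return [M_0, M_1, M_2], where M_p = l_p(d) * m_p(r) summed over k, l."""
    dn = math.sqrt(sum(x * x for x in d))   # |d0|
    r2 = sum(x * x for x in r)              # |r0|^2
    m0 = (1.0 / dn) * 1.0                   # l_0 * m_0
    m1 = sum((d[k] / dn**3) * r[k] for k in range(3))
    m2 = sum((3.0 * d[k] * d[l] / dn**5)
             * (3.0 * r[k] * r[l] - (r2 if k == l else 0.0)) / 6.0
             for k in range(3) for l in range(3))
    return [m0, m1, m2]

def exact_kernel(d, r):
    """Direct evaluation of 1/|r0 - d0| for comparison."""
    return 1.0 / math.dist(d, r)

if __name__ == "__main__":
    d0 = (10.0, 0.0, 0.0)    # cube-to-cube distance vector (assumed values)
    r0 = (0.1, 0.2, -0.1)    # panel offset inside its cube (assumed values)
    approx = sum(geometric_moments(d0, r0))
    exact = exact_kernel(d0, r0)
    print("relative error:", abs(approx - exact) / exact)
```

The truncation error shrinks as |r0|/|d0| decreases, which is why the expansion is applied only to well-separated cubes.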
Stochastic Potential Coefficient Expansion
Stochastic Potential Coefficient
Relate geometric parameters to random variables
Let $\xi_w$ be the random variable for panel width w, and $\xi_d$ be the random variable for panel distance d.
Geometric moments $M_p$ and $L_p$ are:
$$\hat{M}_p(\xi_w, \xi_d) = M_p(w_0 + w_1 \xi_w,\ d_0 + d_1 \xi_d)$$
$$\hat{L}_p(\xi_w, \xi_d) = L_p(w_0 + w_1 \xi_w,\ d_0 + d_1 \xi_d)$$
Now, the potential coefficient is:
$$\hat{P}(\xi_w, \xi_d) = P(w_0 + w_1 \xi_w,\ d_0 + d_1 \xi_d) \approx \hat{M}(w_0 + w_1 \xi_w,\ d_0 + d_1 \xi_d)$$
n-order stochastic orthogonal polynomial expansion of P:
$$\hat{P}(\xi) = P_0 \Phi_0(\xi) + P_1 \Phi_1(\xi) + \cdots + P_n \Phi_n(\xi) = \sum_{i=0}^{n} P_i \Phi_i(\xi)$$
Accordingly, the m-th order (m = 2n + n(n − 1)) expansion of charge is:
$$\hat{q}(\xi) = \sum_{j=0}^{m} q_j \Phi_j(\xi)$$
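As an illustration of a stochastic orthogonal polynomial expansion, the sketch below projects a scalar function of one Gaussian variable onto the Hermite polynomials Φ0 = 1, Φ1 = ξ, Φ2 = ξ² − 1 and reads the mean and variance off the coefficients. Several things here are assumptions made for the example: a single variation source, the toy kernel 1/(d0 + d1 ξ) standing in for a potential coefficient, a coarse 3-point Gauss-Hermite rule for the expectations, and the names `sop_coefficients` and `mean_and_variance`.

```python
import math

# Probabilists' Hermite polynomials up to order 2 and their norms E[Phi_k^2].
PHI = [lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0]
NORM = [1.0, 1.0, 2.0]

# 3-point Gauss-Hermite rule for a standard normal variable
# (exact for polynomial integrands up to degree 5).
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

def sop_coefficients(f):
    """Project f(xi) onto Phi_0..Phi_2: c_k = E[f * Phi_k] / E[Phi_k^2]."""
    return [sum(w * f(x) * PHI[k](x) for x, w in zip(NODES, WEIGHTS)) / NORM[k]
            for k in range(3)]

def mean_and_variance(c):
    """Mean is c_0; variance is the sum of c_k^2 * E[Phi_k^2] for k >= 1."""
    return c[0], sum(c[k] ** 2 * NORM[k] for k in (1, 2))

if __name__ == "__main__":
    d0, d1 = 7.07, 0.2 * 7.07               # nominal distance and 20% variation
    coeffs = sop_coefficients(lambda xi: 1.0 / (d0 + d1 * xi))
    print("coefficients:", coeffs)
    print("mean, variance:", mean_and_variance(coeffs))
```

Once the coefficients are known, the statistics follow without sampling, which is the appeal of the non-Monte-Carlo approach.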
Augmented System
Recap: SOP expansion leads to a large and dense system equation
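$$\hat{P}(\xi_d) = P^{(0)}(d_0) + P^{(1)}(d_1)\, \xi_d$$
$$q(\xi_d) = \sum_{j=0}^{2} q_j \Phi_j(\xi_d)$$
$$\langle \hat{P}(\xi_d)\, q(\xi_d) - b,\ \Phi_j(\xi_d) \rangle = 0$$
$$\begin{bmatrix} P^{(0)} & P^{(1)} & 0 \\ P^{(1)} & P^{(0)} & 2P^{(1)} \\ 0 & 2P^{(1)} & P^{(0)} \end{bmatrix} \begin{bmatrix} q_0 \\ q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ 0 \end{bmatrix}$$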
Parallel Fast Multipole Method--upward
Overview of the parallel fast multipole method (FMM)
Group panels in cubes, and build a hierarchical tree for cubes
We use 8-degree trees in implementation, but use 2-degree trees for illustration here.
A parallel FMM distributes cubes to different processors
Upward Pass
[Figure: 2-degree cube tree with Level 0 to Level 3; M2M marked between levels]
• Starting from the bottom level, it calculates stochastic geometric moments
• Update the parent's moments by summing the moments of its children, called the M2M operation
• M2M operations can be performed in parallel at different nodes
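The upward pass can be sketched on the 2-degree tree used for illustration. This toy version makes several assumptions: each cube carries a single scalar (order-0) moment so that a parent's moment is exactly the sum of its children's, the number of bottom-level cubes is a power of two, and a thread pool stands in for separate processors. The function name `upward_pass` is hypothetical, and Python threads only illustrate that the M2M tasks within one level are independent; they do not give real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def upward_pass(leaf_moments, workers=4):
    """Return the moments of every level, from the bottom level up to the root.

    leaf_moments: one order-0 moment per bottom-level cube
    (length assumed to be a power of two).
    """
    levels = [list(leaf_moments)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(levels[-1]) > 1:
            children = levels[-1]
            pairs = [(children[i], children[i + 1])
                     for i in range(0, len(children), 2)]
            # M2M: parents at the same level do not depend on each other,
            # so all of them can be computed concurrently.
            parents = list(pool.map(lambda pair: pair[0] + pair[1], pairs))
            levels.append(parents)
    return levels

if __name__ == "__main__":
    for depth, moments in enumerate(upward_pass([1, 2, 3, 4, 5, 6, 7, 8])):
        print("levels above bottom:", depth, "moments:", moments)
```

Higher-order moments would also need a shift of the expansion centre in each M2M step, which this sketch leaves out.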
Parallel Fast Multipole Method--Downwards
Downward Pass
[Figure: 2-degree cube tree with Level 0 to Level 3; M2M, M2L and L2L marked]
• At the top level, calculate the potential between two cubes, called the M2L operation.
• Potential is further distributed down to children from their parent in parallel, called the L2L operation.
• Calculate near-field potential directly in parallel
• Sum L2L results with near-field potential for all panels at the bottom level and return Pxq
M2M, L2L are local operations, while M2L is global operation.
How to reduce communication traffic due to global operation?
Reduction of traffic between processors
Global data dependence exists in the M2L operation at the top level
Pre-fetch moments: each cube distributes its moments to all cubes on its dependency list before the calculations. As such, it can hide communication time.
[Figure: the cube under calculation and its dependency list (Cube 0, Cube 1, …, Cube k)]
Flow of piCAP
1. Use spectrum pre-conditioner to accelerate convergence
2. Incrementally update the pre-conditioner for different variation.
Deflated Spectral Iteration
Why a spectral preconditioner is needed
GMRES needs too many iterations to achieve convergence.
A spectral preconditioner shifts the spectrum of the system matrix to improve the iteration convergence.
Deflated spectral iteration
k (k = 1: power iteration) partial eigen-pairs
$$V_K = [v_1, \ldots, v_K], \qquad D_K = \mathrm{diag}[\lambda_1, \ldots, \lambda_K]$$
Spectrum preconditioner
$$W = I + \sigma \left( V_K D_K^{-1} V_K^T \right), \qquad \sigma: \text{shifting value}$$
Why incremental preconditioning is needed
Variation can significantly change the spectral distribution.
Building each preconditioner for different variations is expensive.
Simultaneously considering all variations increases the complexity of our model.
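The effect of a spectrum preconditioner of the form W = I + σ V D⁻¹ Vᵀ can be seen on a small symmetric matrix. The sketch below assumes a single eigenpair (K = 1) obtained by plain power iteration, a 2x2 example matrix, and σ = 1; the helper names are hypothetical. For an eigenpair (λ, v) of P, the product P W maps v to (λ + σ) v and leaves the orthogonal eigenvector untouched, which is the shift the example prints.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_iteration(A, iters=200):
    """Dominant eigenpair of a symmetric matrix (the k = 1 case)."""
    v = [1.0] + [0.5] * (len(A) - 1)          # arbitrary start vector
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
    return lam, v

def spectral_preconditioner(lam, v, sigma):
    """W = I + sigma * v * (1/lam) * v^T for a single eigenpair."""
    n = len(v)
    return [[(1.0 if i == j else 0.0) + sigma * v[i] * v[j] / lam
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    P = [[2.0, 1.0], [1.0, 2.0]]              # toy system matrix
    lam, v = power_iteration(P)
    W = spectral_preconditioner(lam, v, sigma=1.0)
    shifted = matvec(P, matvec(W, v))
    print("eigenvalue before:", lam)
    print("eigenvalue after :", shifted[0] / v[0])
```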
Incremental Precondition
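For the updated system $P_1 = P_0 + \delta P$, the update for the i-th eigenvector $v_i$ (i = 1, ..., K) is:
$$\delta v_i = V_i B_i^{-1} V_i^T \, \delta P \, v_i$$
$V_i$ is the subspace composed of $[v_1, \ldots, v_j, \ldots, v_K]$ and $B_i$ is the updated spectrum
$$B_i = \mathrm{diag}[\lambda_i - \lambda_1, \ldots, \lambda_i - \lambda_j, \ldots, \lambda_i - \lambda_K], \qquad (j \neq i, \text{ and } i, j = 1, \ldots, K)$$
Updated preconditioner W' is
$$W' = I + \sigma \left( V'_K (D'_K)^{-1} (V'_K)^T \right) = W + \delta W$$
$$\delta W = \sigma \left( E_K - V_K D_K^{-1} F_K D_K^{-1} V_K^T \right)$$
$$\text{where } E_K = \delta V_K D_K^{-1} V_K^T + \left( \delta V_K D_K^{-1} V_K^T \right)^T$$
$$F_K = \delta V_K^T V_K D_K + \left( \delta V_K^T V_K D_K \right)^T$$
The inverse operation only involves the diagonal matrix $D_K$.
Consider different variations by updating the nominal preconditioner partially.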
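The eigenvector update used for incremental preconditioning can be written out componentwise as δv_i = Σ_{j≠i} v_j (v_jᵀ δP v_i) / (λ_i − λ_j), which is the standard first-order perturbation result for a symmetric matrix with distinct eigenvalues. The sketch below applies it to an assumed 2x2 example and checks the eigen-residual before and after the update; the function names and test matrices are illustrative only.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def update_eigenvector(i, vecs, lams, dP):
    """First-order update dv_i = sum_{j != i} v_j (v_j^T dP v_i) / (lam_i - lam_j)."""
    dPv = matvec(dP, vecs[i])
    dv = [0.0] * len(vecs[i])
    for j, vj in enumerate(vecs):
        if j == i:
            continue
        coeff = dot(vj, dPv) / (lams[i] - lams[j])
        dv = [a + coeff * b for a, b in zip(dv, vj)]
    return dv

def eigen_residual(A, lam, v):
    """Norm of A v - lam v; zero for an exact eigenpair."""
    return math.sqrt(sum((a - lam * b) ** 2 for a, b in zip(matvec(A, v), v)))

if __name__ == "__main__":
    vecs, lams = [[1.0, 0.0], [0.0, 1.0]], [2.0, 1.0]   # eigenpairs of P0 = diag(2, 1)
    dP = [[0.0, 0.01], [0.01, 0.0]]                     # small assumed perturbation
    P1 = [[2.0, 0.01], [0.01, 1.0]]                     # P1 = P0 + dP
    dv = update_eigenvector(0, vecs, lams, dP)
    v_new = [a + b for a, b in zip(vecs[0], dv)]
    lam_new = lams[0] + dot(vecs[0], matvec(dP, vecs[0]))
    print("residual, nominal:", eigen_residual(P1, lams[0], vecs[0]))
    print("residual, updated:", eigen_residual(P1, lam_new, v_new))
```

Only divisions by the scalar gaps λ_i − λ_j appear, so no matrix factorization is repeated when the variation changes.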
Accuracy Comparison
Setup: two panels with random variation for distance d and width w
Result: Stochastic Geometric Moments have high accuracy with average error of 1.8%, and can be up to ~1000X faster than MC
2 panels, d0 = 7.07 μm, w0 = 1 μm, d1 = 20% d0
            MC (3000)    piCAP
Cij (fF)    -0.3113      -0.3056
Time (s)    10.786037    0.008486
2 panels, d0 = 11.31 μm, w0 = 1 μm, d1 = 10% d0
            MC (3000)    piCAP
Cij (fF)    -0.3861      -0.3824
Time (s)    10.7763      0.007764
2 panels, d0 = 4.24 μm, w0 = 1 μm, d1 = 20% d0, w1 = 20% w0
            MC (3000)    piCAP
Cij (fF)    -0.2498      -0.2514
Time (s)    11.17167     0.008684
Runtime for parallel FMM
Setup
Two-layer example with 20 conductors. Other: 40, 80, 160 conductors
Evaluate Pxq (MVP) with 10% perturbation on panel distance
Result
All examples can have about 3X speedup with 4 processors
Each entry: runtime / speedup
#wire      20               40               80               160
#panels    12360            10320            11040            12480
1 proc.    0.737515/1.0     0.541515/1.0     0.605635/1.0     0.96831/1.0
2 proc.    0.440821/1.7X    0.426389/1.4X    0.352113/1.7X    0.572964/1.7X
3 proc.    0.36704/2.0X     0.274881/2.0X    0.301311/2.0X    0.489045/2.0X
4 proc.    0.273408/2.7X    0.19012/2.9X     0.204606/3.0X    0.340954/2.8X
Efficiency of spectral preconditioner
Setup: Three test structures: single plate, 2x2 bus, cubic
Result
Compare the diagonal preconditioner with the spectrum preconditioner
The spectrum preconditioner accelerates convergence of GMRES (3X).
         # panel   # variable   diagonal prec.        spectral prec.
                                # iter    Time (s)    # iter    Time (s)
plate    256       768          29        24.59       11        8.625
cubic    864       2592         32        49.59       11        19.394
bus      1272      3816         41        72.58       15        29.21
Speedup by Incremental Precondition
Setup
Test on two-layer 20 conductor example
Incremental update of the nominal preconditioner for different variation sources
Compare with the non-incremental one
discretization (w-t-l)   #panel   #variable   Total Runtime (s)
                                              Non-incremental   Incremental
3x3x7                    2040     6120        419.438           81.375
3x3x15                   3960     11880       3375.205          208.266
3x3x24                   6120     18360       -                 504.202
3x3x50                   12360    37080       -                 3637.391
Result: Up to 15X speedup over non-incremental results, and only the incremental one can finish all large examples.
Conclusion and Future Work
Introduce stochastic geometric moments
Develop a parallel FMM to evaluate the matrix-vector product with process variation
Develop a spectral pre-conditioner incrementally to consider different variations
Future Work: extend our parallel and incremental solver to other IC-variation-related stochastic analysis problems
Thanks
PiCAP: A Parallel and Incremental Capacitance Extraction Considering Stochastic Process Variation
Fang Gong, Hao Yu and Lei He