pattern based method for p/g grid analysis · pattern based method • drawback of traditional...
Post on 17-Oct-2020
4 Views
Preview:
TRANSCRIPT
Pattern Based Method For P/G Grid Analysis
Jin Shi, Yici Cai, Xianlong Hong Sheldon X.-D. Tan
EDA Lab. CS Department, Tsinghua University
EE Department, University of California Riverside
2006-3-31 2
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 3
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 4
Practical Problems #1• Linear computation complexity does not mean linear
increasing time cost due to Cache Miss
3( )O n
-1
0
1
23
4
5
6
0 1 2 3 4 5
log Problem Size
log
cycl
es/f
lop
T = N4.7
Size 2000 took 5 days
12000 would take1095 years
matrix multiply operation algorithm looks like 5( )O n
2006-3-31 5
Practical Problems #1• Summary of Cache Miss
– Linear computation complexity is not enough
– Optimize algorithms together with cache performance
2006-3-31 6
Practical Problems #2• Iterative efficiency
– Preconditioner’s performance decreases as the matrix size increases
1 2 3 4 5 6 7 8 9 10 110
50
100
150
200
250
300
350
400
2 log(size)
itera
tion
times
Performance of Different Preconditioner
2006-3-31 7
Practical Problems #3• Memory efficiency limitation• When the Design is too large …
– Hard to load it from DB– Impossible to build matrix– Vector malloc run out of memory– Too many years to get results
2006-3-31 8
Our Contribution
• Alleviate Cache Miss • Reduce the overall memory usage• Present more efficient preconditioner under a
partition framework• Constant preconditioner performance
2006-3-31 9
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 10
Summary of Existing Method• Computation Complexity <
– LU – PCG p slightly larger than 1– M-G with small coefficient– R-W with large coefficient– ADI with small coefficient
• Memory Efficiency– LU – PCG – M-G – R-W – ADI
• Trade off between speed and accuracy
2( )O n
( )pO n
2( )O n
( )O n( )O n
( )O n
( )O n
2( )O n
( )O n( )O n
( )O n
2006-3-31 11
Summary• Direct Methods vs Iterative Methods
– Time cost of LU:
– Time cost of PCG:
– Which is faster depends on the fill in ratio and performance of preconditioner
1 11 2
11
( ( )) ( ( ) ( ) )
( ) (1 ) ( ) :
t d A N t L v t U v
t L v f nnz A f fill in ratio
− −
−
+ × +
∝ + ⋅
( ( ) ) ( ( ) )( ) ( )
t d A N n t A vt A v n n z A
+ × ⋅∝
2006-3-31 12
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 13
Acceleration Tech• Model Order Reduction
– S domain Based before 2000– Electrical Equivalent Circuit Based 2002– Topological Partition Based 2004
• Among these methods, topological partition method is the most powerful one to simulate large P/G grid
2006-3-31 14
Partition Benefits• For direct method
– Decrease decomposition time
– Decrease fill in ratio – Decrease Cache Miss
• For iterative method– Decrease preconditioner
construction time– Increase preconditioner
performance– Decrease iteration time– Decrease Cache Miss
2 2( )n x n x⋅ <
2006-3-31 15
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 16
Observation #1• Too many elements share the same value
– Extract R L C with BEM Solver– All most all elements are extracted from M1 and M2– More than 80% elements in M1 and M2 have the same value
2006-3-31 17
Observation #2• P/G grid topology is
self-similar– One layer contains many
routed metal in the same direction
– Metal rails share the same width and pitch
• Possible to transform topology similarity to matrix similarity ?– Yes
• How about irregular P/G grid ?– Do Local regularization to
make elements in local area share the same value
• Continuous in topology domain– Local regularization will not
introduce obvious computation errors
2006-3-31 18
Observation #3• When matrix size is medium, PCG method is
faster than any other method– PCG can converge within 10 times iteration– LU usually has fill in factor larger than 30– R-W is inefficient for small case– M-G needs auxiliary computation structure and time to
construct them• Traditional preconditioner can be improved
– Use topology similarity property– Use element value similarity property
2006-3-31 19
Outline• Practical Computation Problems• Overview of Existing Methods• Acceleration Tech• Some Observations• Pattern Based Method• Conclusion
2006-3-31 20
Pattern Based Method• Regular P/G Grid in 3D
2006-3-31 21
Pattern Base Method• Self Similarity: Global Similarity (Global Pattern)
2006-3-31 22
Pattern Base Method• Self Similarity: Local Similarity (Local Pattern)
2006-3-31 23
Pattern Base Method• Similarity Summary
– Patterns are elements similar to each other– Patterns exist not only in global area but also in local
area• Strategy
– Partition global area to blocks, perform relaxation iteration between global blocks
– Reuse local pattern to perform local simulation
2006-3-31 24
Pattern Based Method• Local Matrix Generation
– Element fill in under NA method – Node order is important
g
g g g
g g
⎡ ⎤⎢ ⎥−⎢ ⎥⎢ ⎥⎢ ⎥−⎢ ⎥⎢ ⎥⎣ ⎦
O L L L L
L L L
L L O L L
L L L
L L L L O
2006-3-31 25
Pattern Based Method• Local Matrix Generation
– Ordering node number according to the routing direction of each layer
2006-3-31 26
Pattern Based Method• Diagonal Block
– transform topology similarity to matrix similarity– one tridiagonal matrix need to be stored
2006-3-31 27
Pattern Based Method• Via Block
– All the vias share the same value between two adjacent layers
– Fill in of via element is regular : size , pitch– No need to store sub matrixes caused by vias
2006-3-31 28
Pattern Based Method• Drawback of traditional preconditioner
– Performance of ILU or incomplete choleskeydecomposition are restricted to memory usage
– More fill in cause faster convergence speed
1748.411e-44410.961e-3932.791e-22211.51e-1
Avg Iter timesFill in ratioDropthreshold
No odering residual=1e-6
2006-3-31 29
Pattern Based Method• Preconditioner Generation: Two Metal Layer Case
• Schur-alike Decomposition to get approximate inverse
mod( , )
0ij via
ijTij
a b j i c gA CA a B b C c
C B other ca b
α β⎡ ⎤ ⎡ ⎤ = =⎧⎡ ⎤ ⎪⎢ ⎥ ⎢ ⎥= = = ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ =⎣ ⎦ ⎪⎩⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦
1
1
1
1 1 1
11
'
'
'
T T
T
T T
A C I A I A CC B C A I B I
B B C A C
A C II A C AC B C A II B
−
−
−
− − −
−−
⎡ ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤= ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦= −
⎡ ⎤ ⎡ ⎤−⎡ ⎤ ⎡ ⎤= ⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥−⎣ ⎦ ⎣ ⎦⎣ ⎦ ⎣ ⎦
2006-3-31 30
Pattern Based Method• Preconditioner Generation
– Perform Precondition in CG Algorithm1 1 1
11
1 1 11 11
1 12 1 1 2 21
1 12
2 1 1 22 112
3
'
1:
2:''
3:
T T
T T T
A C II A C AP r r r
C B C A II B
Ir r rstep r r p A r
C A Ir C A r r C p r
pr Astep r r
B rr B
step r
− − −
−−
−− −
−
−−
⎡ ⎤ ⎡ ⎤−⎡ ⎤ ⎡ ⎤⋅ = = ⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥−⎣ ⎦ ⎣ ⎦⎣ ⎦ ⎣ ⎦
⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤= = = = =⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎥− − + − +⎣ ⎦⎣ ⎦ ⎣ ⎦ ⎣ ⎦⎡ ⎤ ⎡ ⎤ ⎡ ⎤
= = =⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎣ ⎦⎣ ⎦⎣ ⎦
1 1 212 2
2 22
r A CrI A Cr
rI
−− ⎡ ⎤⎡ ⎤ −−= = ⎢ ⎥⎢ ⎥⎣ ⎦ ⎣ ⎦
2006-3-31 31
Pattern Based Method• Preconditioner Generation
– Inverse of diagonal block
– Other matrix vector dot operation1 2
2 1T T
A C A r C rr
C B B r C r
aA r a r
a
⎡ ⎤+⎡ ⎤= ⎢ ⎥⎢ ⎥ +⎣ ⎦ ⎣ ⎦
⎡ ⎤⎢ ⎥= ⎢ ⎥⎢ ⎥⎣ ⎦
1
1 1
1
aA r a r
a
−
− −
−
⎡ ⎤⎢ ⎥= ⎢ ⎥⎢ ⎥⎣ ⎦
2006-3-31 32
Pattern Based Method• Benefit of Using Pattern Structure
– Memory cost is reduced from to less than
– Reuse of sub matrixes and vectors can reduce Cache Miss dramatically
– Better preconditioner can be constructed to accelerate convergence speed
0.5( )O n( )O n
2006-3-31 33
Pattern Based Method• Algorithm Flow
2006-3-31 34
Pattern Based Method• Experimental Performance: Speed
2006-3-31 35
Pattern Based Method• Experimental Performance: Memory
2006-3-31 36
Conclusion• New pattern based method is presented• Combine advantages of direct method and
iterative method• Decrease the Cache Miss dramatically • Reduce the memory efficiency dramatically• New preconditioner is faster than traditional
preconditioner• Fast linear PCG can be achieved even the grid
size is huge
2006-3-31 37
Thank you !
top related