Post on 22-Dec-2015
TBS: Fast Analysis of Structured Power Grid by Triangularization Based Structure Preserving Model Order Reduction
Hao Yu, Yiyu Shi and Lei He
Electrical Engineering Dept., UCLA
Partially supported by NSF and a UC-MICRO fund sponsored by Analog Devices, Intel and Mindspeed
http://eda.ee.ucla.edu
2. New Challenges in Integrity Verification
- Integrity verification checks for transient V/T violations in linear power/signal/thermal networks
  - Large-scale: millions of nodes and ports
  - Often structured: e.g., locally regular and globally irregular P/G networks [Singh-Sapatnekar:TCAD’05]
- A fast yet accurate linear simulator is necessary to perform large-scale transient verification
  - Linear-network macromodeling is one effective approach
- How can structure information be used to build accurate and efficient macromodels?
3. Existing Structured Macromodeling
- Hierarchical node-elimination (HNE) [Zhao-Panda-Sapatnekar-Blaauw:DAC’00]
  - Builds the macromodel by internal node elimination with source mapping
  - Analyzes the macromodel in a hierarchical (two-level) fashion
  - Requires sparsification by linear programming (LP) because of the dense fill-in
- SPRIM [Freund:ICCAD’04] and BSMOR [Yu-He-Tan:BMAS’05]
  - Leverage the block structure in the state matrix
  - Build the macromodel by structure-preserved moment matching
- HiPRIME [Cao-Lee-Chen:DAC’02], a hierarchical extension of PRIMA [Odabasioglu-Celik-Pileggi:TCAD’98]
  - Builds the macromodel by hierarchical orthonormalization
  - Loses the hierarchy because of the final flat projection
- We propose a new structure-preserved moment matching method, with 20x less waveform error and 50x speedup
4. Outline
Review macromodeling by moment matching
Our Approach: TBS method
Experimental Results
Conclusions
5. Macromodeling by Moment Matching (I)
- Electric systems can be described in MNA (modified nodal analysis) form
- The solution (x) of the MNA equations is contained in a block Krylov subspace
- Grimme’s Projection Theorem
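For reference, a standard way of writing these three statements is sketched below, using the state matrices (G, C, B, L) named on the next slide. The expansion point s_0, the symbols A and R, and the hatted reduced matrices are assumptions of this sketch rather than notation taken from the slides, and the moment count is the one usually quoted for a one-sided congruence projection.

```latex
\begin{aligned}
\text{MNA:}\quad & (G + sC)\,x(s) = B\,u(s), \qquad y(s) = L^{T}x(s) \\
\text{Block Krylov subspace:}\quad & \mathcal{K}_q(A, R) = \operatorname{span}\{R,\ AR,\ \dots,\ A^{q-1}R\}, \\
& A = -(G + s_0 C)^{-1} C, \qquad R = (G + s_0 C)^{-1} B \\
\text{Projection:}\quad & \mathcal{K}_q(A, R) \subseteq \operatorname{span}(V)
  \ \Longrightarrow\ \hat{H}(s) = \hat{L}^{T}(\hat{G} + s\hat{C})^{-1}\hat{B}
  \text{ matches the first } q \text{ block moments of } H(s) \text{ at } s_0, \\
& \hat{G} = V^{T} G V, \quad \hat{C} = V^{T} C V, \quad \hat{B} = V^{T} B, \quad \hat{L} = V^{T} L
\end{aligned}
```

The expansion follows from (G + sC) = (G + s_0 C)(I - (s - s_0)A), so x(s) is a power series in (s - s_0) whose coefficients are the vectors A^k R.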
6. Macromodeling by Moment Matching (II)
a) To remove linear dependency in the low-dimensional projection matrix V, block-Arnoldi orthonormalization is applied
b) To preserve passivity, a congruence transformation is used to project the state matrices (G, C, B, L) respectively
c) To handle a large number of inputs, as in a P/G network, SIMO (single-input-multi-output) reduction can be assumed
  - Replace the input port matrix B by a common input vector J
  - All poles are matched w.r.t. one superposed input
  - The number of matched moments/poles (q) is independent of the number of inputs (p)
- V is flat and destroys the structure of the state matrices [Feldmann-Liu:ICCAD’04]
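Steps a) to c) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the toy RC ladder, the expansion point s0 = 0, the choice of J as the sum of the columns of B, and all function names are assumptions made for illustration, and no deflation is attempted.

```python
import numpy as np

def block_arnoldi(G, C, B, q):
    """Orthonormal basis of span{R, A R, ..., A^(q-1) R} with
    A = -G^{-1} C and R = G^{-1} B (assumed expansion point s0 = 0)."""
    Q, _ = np.linalg.qr(np.linalg.solve(G, B))
    blocks = [Q]
    for _ in range(q - 1):
        W = -np.linalg.solve(G, C @ blocks[-1])
        for _ in range(2):                  # Gram-Schmidt, applied twice for stability
            for Vk in blocks:
                W = W - Vk @ (Vk.T @ W)
        Q, _ = np.linalg.qr(W)
        blocks.append(Q)
    return np.hstack(blocks)

def congruence_reduce(G, C, B, L, V):
    """Project every state matrix with the same V (congruence transformation)."""
    return V.T @ G @ V, V.T @ C @ V, V.T @ B, V.T @ L

def moments(G, C, B, L, count):
    """First `count` moments L^T A^k R of a single-input, single-output model."""
    out, X = [], np.linalg.solve(G, B)
    for _ in range(count):
        out.append((L.T @ X).item())
        X = -np.linalg.solve(G, C @ X)
    return np.array(out)

# Toy RC ladder (hypothetical data): n nodes, p current-source ports, one observed node.
n, p, q = 20, 5, 4
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # symmetric positive definite
C = np.diag(np.linspace(1.0, 2.0, n))
B = np.zeros((n, p))
B[np.arange(p) * 4, np.arange(p)] = 1.0
L = np.zeros((n, 1))
L[-1, 0] = 1.0

J = B @ np.ones((p, 1))           # SIMO assumption: one superposed input replaces B
V = block_arnoldi(G, C, J, q)     # n x q, so the reduced order does not grow with p
Gr, Cr, Jr, Lr = congruence_reduce(G, C, J, L, V)

full = moments(G, C, J, L, q)
reduced = moments(Gr, Cr, Jr, Lr, q)
print("largest relative moment mismatch:", np.max(np.abs(full - reduced) / np.abs(full)))
```

Because the same V multiplies G and C from both sides, the reduced G stays symmetric positive definite here, which is the property the congruence transformation is used for. The resulting V is a dense n x q matrix with no block pattern, which is the structure loss the last bullet refers to.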
7. Structure-preserved Moment Matching
- SPRIM [Freund:ICCAD’04] leverages the 2 x 2 block structure in MNA
  - Splits V into a 2 x 2 block-diagonal form
  - Preserves the structure of reciprocity (symmetry between input and output), and hence achieves higher accuracy than PRIMA
- BSMOR [Yu-He-Tan:BMAS’05] partitions the state matrices into more blocks
  - Splits V into an m x m block-diagonal form
  - Preserves the block structure and sparsity, and hence achieves better efficiency than SPRIM
- Limitations of SPRIM and BSMOR
  - Moment/pole matching is not localized
  - Reduction does not preserve the structure of latency
  - Model does not leverage redundancy
  - Inefficient and inaccurate for P/G grid macromodeling
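The idea of splitting a flat V into block-diagonal form can be sketched as follows. This is a generic illustration under stated assumptions, not code from SPRIM or BSMOR: the random matrix stands in for a flat Krylov basis, the row partition `sizes` is hypothetical, and an SVD is used here to orthonormalize each piece.

```python
import numpy as np

def split_block_diagonal(V, sizes, tol=1e-10):
    """Cut the rows of a flat V according to `sizes`, orthonormalize each piece,
    and place the pieces on the block diagonal of a larger matrix."""
    pieces, row = [], 0
    for size in sizes:
        U, s, _ = np.linalg.svd(V[row:row + size, :], full_matrices=False)
        pieces.append(U[:, s > tol * s[0]])       # drop numerically dependent directions
        row += size
    V_s = np.zeros((V.shape[0], sum(P.shape[1] for P in pieces)))
    row = col = 0
    for P in pieces:
        V_s[row:row + P.shape[0], col:col + P.shape[1]] = P
        row += P.shape[0]
        col += P.shape[1]
    return V_s

rng = np.random.default_rng(0)
n, q, sizes = 12, 3, [4, 4, 4]
V, _ = np.linalg.qr(rng.standard_normal((n, q)))    # stand-in for a flat n x q basis
V_s = split_block_diagonal(V, sizes)

# The flat subspace is contained in the structured one, so whatever V matches
# is still matched after projecting with V_s.
residual = V - V_s @ (V_s.T @ V)
print(V_s.shape, np.linalg.norm(residual))
```

With m = 3 row blocks the structured matrix has up to m*q = 9 columns instead of q = 3, and a projected matrix V_s^T G V_s keeps the same block pattern as G because block (i, j) only involves the i-th and j-th pieces.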
8. Outline
Review macromodeling by moment matching
Our Approach: TBS method (Triangular Block Structured moment matching)
Experimental Results
Conclusions
9. From Layout to Structured Model
- Build a structured state matrix by partitioning the layout
  - Stamp basic blocks diagonally
  - Stamp interconnection blocks off-diagonally
[Figure: a layout with nodes 1 to 8 partitioned into two basic blocks (figure labels w1, w2, g1, g2, gx). The flat 8 x 8 conductance matrix is rearranged into a structured matrix with the g1 and g2 blocks on the diagonal and the -gx coupling entries off the diagonal, where g3 = 2g1 + gx and g4 = 2g2 + gx.]
- A number of interconnected basic blocks can be used to represent both homogeneous and heterogeneous circuits
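A small sketch of the stamping rule, under an assumed topology: two 2 x 2 resistive meshes (conductances g1 and g2) joined by two gx links, chosen so that the diagonal entries g3 = 2g1 + gx and g4 = 2g2 + gx from the slide appear. The function names, the link list, and the numeric values are hypothetical.

```python
import numpy as np

def mesh_block(g):
    """Conductance stamp of one basic block: a 2 x 2 mesh with four edges of conductance g."""
    Y = np.zeros((4, 4))
    for a, b in [(0, 1), (0, 2), (1, 3), (2, 3)]:
        Y[a, a] += g
        Y[b, b] += g
        Y[a, b] -= g
        Y[b, a] -= g
    return Y

def stamp_structured(blocks, links):
    """Stamp basic blocks on the diagonal and interconnections off the diagonal.
    `links` holds tuples (block_i, node_i, block_j, node_j, conductance)."""
    offsets = np.concatenate(([0], np.cumsum([b.shape[0] for b in blocks])))
    n = int(offsets[-1])
    G = np.zeros((n, n))
    for k, blk in enumerate(blocks):
        G[offsets[k]:offsets[k + 1], offsets[k]:offsets[k + 1]] += blk
    for bi, ni, bj, nj, g in links:
        a, b = offsets[bi] + ni, offsets[bj] + nj
        G[a, a] += g          # the link also loads the diagonal of both blocks
        G[b, b] += g
        G[a, b] -= g          # coupling entries land in the off-diagonal blocks
        G[b, a] -= g
    return G

g1, g2, gx = 1.0, 2.0, 0.5
# Assumed links: node 2 to node 5 and node 4 to node 7 (1-based numbering).
G = stamp_structured([mesh_block(g1), mesh_block(g2)],
                     [(0, 1, 1, 0, gx), (0, 3, 1, 2, gx)])
print(G)
```

In this sketch the only nonzeros outside the two diagonal 4 x 4 blocks are the -gx entries, which is the pattern the structured matrix on the slide is meant to expose.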
10. Properties of Interconnected Basic Blocks
- Structure of latency: the spatial distribution of time constants
  - Each basic block has a time constant
- Redundancy: different basic blocks can share the same or a similar time constant
- Because of this redundancy, the basic-block representation is not compact
11. TBS Flow
- Dominant-pole based clustering removes redundancy
[Flow diagram: Basic Blocks -> Dominant-pole Clustering -> Compact Blocks -> Triangularization -> Triangular Blocks -> Block Diagonal Projection -> Reduced Blocks -> Two-level Relaxation Analysis -> Block Integrity]
12. Clustering Procedure
- Compress basic blocks into compact blocks
  1. Calculate the q-dominant pole-set (mode) for each basic block
  2. Cluster basic blocks if the mode-distance is small enough
[Equations: definitions of the q-dominant pole-set and the mode-distance]
- The cluster number is determined by the nature of the network structure
  - There is no need to cluster a homogeneous circuit, but TBS still applies
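The two clustering steps can be sketched as below. The slide's own definition of the mode-distance is not available, so the relative Euclidean distance used here is a hypothetical stand-in, as are the greedy grouping rule, the tolerance, and the toy RC-ladder blocks.

```python
import numpy as np

def dominant_poles(G, C, q):
    """Step 1: the q slowest poles of the RC block C x' + G x = 0,
    i.e. the eigenvalues of -C^{-1} G with the smallest magnitude."""
    poles = -np.linalg.eigvals(np.linalg.solve(C, G)).real
    return poles[np.argsort(np.abs(poles))][:q]

def mode_distance(mode_a, mode_b):
    """Assumed metric: relative Euclidean distance between two pole-sets."""
    return np.linalg.norm(mode_a - mode_b) / np.linalg.norm(mode_a)

def cluster_blocks(modes, tol):
    """Step 2: greedily put a block into the first cluster whose representative
    mode is closer than `tol`, otherwise open a new cluster."""
    clusters = []                       # list of (representative mode, member indices)
    for idx, mode in enumerate(modes):
        for rep, members in clusters:
            if mode_distance(rep, mode) < tol:
                members.append(idx)
                break
        else:
            clusters.append((mode, [idx]))
    return [members for _, members in clusters]

def ladder(n, g, c):
    """Toy basic block: grounded RC ladder with conductance g and capacitance c."""
    G = g * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    return G, c * np.eye(n)

# Blocks 0 and 1 share the ratio g/c, blocks 2 and 3 are ten times slower.
blocks = [ladder(6, 1.0, 1.0), ladder(6, 2.0, 2.0),
          ladder(6, 1.0, 10.0), ladder(6, 1.02, 10.0)]
modes = [dominant_poles(G, C, q=3) for G, C in blocks]
clusters = cluster_blocks(modes, tol=0.05)
print(clusters)
```

In this toy case four basic blocks collapse into two compact groups, one per distinct time constant, which is the kind of redundancy the clustering step is meant to remove.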
13. Advantages of Clustering
- Redundant poles are removed
  - Hence redundant columns in the projection matrix are also removed, i.e., the effective rank of the projection matrix is improved
- Structure of latency is leveraged
  - Each compact block can be solved with a different time-step
- A complete modal decomposition is achieved
  - Each compact block has a unique pole-set or mode, and the resulting system is block-wise stiff
- System poles are determined by both diagonal and off-diagonal blocks, which is not efficient
14. TBS Flow
- Triangularization can localize system poles to diagonal blocks, which is the key contribution of this work
[TBS flow diagram, as on slide 11]
15. Triangularization Procedure
  1. Stack a replica-block diagonally
  2. Move the original lower-triangular parts to the new upper-triangular parts
- This procedure is implemented by a block matrix data structure without increasing memory usage
16. Advantages of Triangularization
- System poles are determined only by the compact blocks on the diagonal
  - Compact blocks are almost decoupled from each other
- A triangular system has a factorization cost that comes only from the diagonal blocks
  - There is no need to factorize the entire matrix
- Block duplication results in an equivalent solution
  - Simpler than the existing permutation-based triangularization procedure [Kim Davis: KLU]
- Because the added block is a replica, the overall cost of factorization is the same as the original
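Two of these properties are general facts about block upper-triangular matrices and can be checked with a short sketch. It does not reproduce the replica-based triangularization itself; the matrix below is a hypothetical block upper-triangular example with prescribed diagonal-block eigenvalues, and the function name is an assumption.

```python
import numpy as np

def block_back_substitution(A, b, sizes):
    """Solve A x = b for a block upper-triangular A.
    Only the diagonal blocks are ever passed to a dense solver."""
    off = np.concatenate(([0], np.cumsum(sizes)))
    x = np.zeros_like(b, dtype=float)
    for i in reversed(range(len(sizes))):
        si = slice(off[i], off[i + 1])
        rhs = b[si] - A[si, off[i + 1]:] @ x[off[i + 1]:]   # subtract already-solved blocks
        x[si] = np.linalg.solve(A[si, si], rhs)
    return x

rng = np.random.default_rng(1)
sizes = [3, 2, 4]
spectra = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]  # eigenvalues per diagonal block
off = np.concatenate(([0], np.cumsum(sizes)))
n = int(off[-1])

A = rng.standard_normal((n, n))
for i, lam in enumerate(spectra):
    si = slice(off[i], off[i + 1])
    A[si, :off[i]] = 0.0                                    # nothing below the block diagonal
    Q, _ = np.linalg.qr(rng.standard_normal((sizes[i], sizes[i])))
    A[si, si] = Q @ np.diag(lam) @ Q.T                      # symmetric block, known spectrum

b = rng.standard_normal(n)
x = block_back_substitution(A, b, sizes)
eigs = np.sort(np.linalg.eigvals(A).real)

print("solve error:", np.max(np.abs(x - np.linalg.solve(A, b))))
print("eigenvalues of A:", eigs)
```

The eigenvalues of the full matrix are exactly the union of the diagonal-block eigenvalues, regardless of the off-diagonal coupling, which is the algebraic reason a triangular form localizes the poles to the diagonal blocks.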
17. TBS Flow
- Block diagonal projection can reduce the system size and the cost of the factorization
[TBS flow diagram, as on slide 11]
18. Block Diagonal Projection Procedure
  1. Split the flat projection matrix V into a structured (block-diagonal) one, with its rank increased by a factor of the cluster number
  2. Reduce the state matrices block by block respectively
- The reduced system preserves the upper-triangular structure
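Reducing block by block can be sketched as follows. The example only demonstrates the structural claim, that a block-diagonal projection keeps a block upper-triangular matrix block upper-triangular; the random matrix, the random local bases V_i, and the function name are hypothetical and do not come from the slides.

```python
import numpy as np

def reduce_blockwise(A, V_blocks, sizes):
    """Return the reduced matrix whose (i, j) block is V_i^T A_ij V_j,
    without ever forming the global block-diagonal V."""
    off = np.concatenate(([0], np.cumsum(sizes)))
    r_off = np.concatenate(([0], np.cumsum([V.shape[1] for V in V_blocks])))
    Ar = np.zeros((int(r_off[-1]), int(r_off[-1])))
    for i in range(len(sizes)):
        for j in range(len(sizes)):
            Aij = A[off[i]:off[i + 1], off[j]:off[j + 1]]
            Ar[r_off[i]:r_off[i + 1], r_off[j]:r_off[j + 1]] = V_blocks[i].T @ Aij @ V_blocks[j]
    return Ar

rng = np.random.default_rng(2)
sizes, q = [5, 4, 6], 2
off = np.concatenate(([0], np.cumsum(sizes)))
n = int(off[-1])

G = rng.standard_normal((n, n))
for i in range(len(sizes)):
    G[off[i]:off[i + 1], :off[i]] = 0.0                    # block upper-triangular input

# One local orthonormal basis per block (random stand-ins for local Krylov bases).
V_blocks = [np.linalg.qr(rng.standard_normal((s, q)))[0] for s in sizes]
Gr = reduce_blockwise(G, V_blocks, sizes)

# Cross-check against the explicit block-diagonal projection matrix.
V = np.zeros((n, q * len(sizes)))
for i, Vi in enumerate(V_blocks):
    V[off[i]:off[i + 1], i * q:(i + 1) * q] = Vi
print(np.allclose(Gr, V.T @ G @ V))
print(np.round(Gr, 2))
```

Each reduced block depends only on the corresponding original block and two local bases, so zero blocks below the diagonal stay zero after reduction.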
19. Advantages of Block Diagonal Projection
- System moments and poles are matched locally
  - Each compact block is reduced locally to match q poles
  - In total, mq poles are matched for m unique compact blocks (poles from the replica are duplicate poles)
- The reduced model preserves the block-triangular structure and the structure of latency
  - Each reduced block can be factorized independently
  - Each reduced block can have a different time constant
- More matched poles improve accuracy
  - Using a low-order reduction for each compact block locally can achieve high-order accuracy for the overall system
- The reduced model can be solved efficiently by block backward substitution or by a two-level analysis with relaxation
20. TBS Flow
- Two-level relaxation can further reduce simulation cost
[TBS flow diagram, as on slide 11]
21. Two-level Relaxation Solver
- Two-level representation and analysis
- The time-domain iteration of a triangular system always converges [White: Book’87]
- Each reduced diagonal block can be factorized independently, and solved with a different time step during backward-Euler (BE) integration
  - In contrast, the previous pole-residue solution eigen-decomposes the entire reduced matrix (dense, with no structure), so the structure of latency cannot be exploited
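An algebraic analogue of the convergence statement can be sketched for a single backward-Euler step. This is not the authors' two-level solver and it uses one common time step for all blocks; the block Jacobi relaxation, the random block upper-triangular G, the identity C, and all names are assumptions made for illustration.

```python
import numpy as np

def relaxed_be_step(G, C, x_prev, b, h, sizes, sweeps):
    """One backward-Euler step of C x' + G x = b,
    (G + C/h) x_new = (C/h) x_prev + b, solved by block Jacobi relaxation.
    Only the diagonal blocks of G + C/h are passed to a dense solver."""
    A = G + C / h
    rhs = C @ x_prev / h + b
    off = np.concatenate(([0], np.cumsum(sizes)))
    x = x_prev.copy()
    for _ in range(sweeps):
        x_new = np.empty_like(x)
        for i in range(len(sizes)):
            si = slice(off[i], off[i + 1])
            coupling = A[si, :] @ x - A[si, si] @ x[si]    # contribution of the other blocks
            x_new[si] = np.linalg.solve(A[si, si], rhs[si] - coupling)
        x = x_new
    return x

rng = np.random.default_rng(3)
sizes = [2, 3, 2]
off = np.concatenate(([0], np.cumsum(sizes)))
n = int(off[-1])

G = rng.standard_normal((n, n))
for i, s in enumerate(sizes):
    si = slice(off[i], off[i + 1])
    G[si, :off[i]] = 0.0                                   # block upper-triangular
    M = rng.standard_normal((s, s))
    G[si, si] = M @ M.T + np.eye(s)                        # well-conditioned diagonal block
C, b, h, x0 = np.eye(n), np.ones(n), 0.1, np.zeros(n)

x_relaxed = relaxed_be_step(G, C, x0, b, h, sizes, sweeps=len(sizes))
x_direct = np.linalg.solve(G + C / h, C @ x0 / h + b)
print("difference after m sweeps:", np.max(np.abs(x_relaxed - x_direct)))
```

For a block upper-triangular matrix the relaxation is exact after at most m sweeps, because the last block never depends on the others, the block before it depends only on the last one, and so on up the diagonal.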
22. Outline
Review macromodeling by moment matching
Our Approach: TBS method
Experimental Results
Conclusions
23. Experiment Settings
- Large-scale homogeneous and heterogeneous P/G grids (RC meshes) with millions of nodes
  - In the heterogeneous case, each block has a different wire pitch/width and block size, and hence a different time constant
- The reduction algorithm assumes SIMO reduction for a large number of inputs, but also supports general MIMO reduction
- TBS is compared to BSMOR [Yu-He-Tan:BMAS’05], HiPRIME [Cao-Lee-Chen:DAC’02], and HNE [Zhao-Panda-Sapatnekar-Blaauw:DAC’00]
24. Triangular Block Structure Preservation
[Figure: nonzero (nz) pattern of the conductance matrices: (a) original system, (b) triangular system, (c) reduced system by TBS]
25. m x q Pole Matching
- (m0=32, m=4, q=8): TBS matches 32 poles exactly; BSMOR matches 8 poles exactly and 24 poles approximately; HiPRIME (a partitioned PRIMA) matches only 8 poles
- Waveforms in the time domain: accuracy improves with more matched poles
26. Study Waveform-error Scalability
Waveform error of each method:
ckt   Nodes (N)  Ports (p)  Order (q)  HNE      HiPRIME  BSMOR    TBS
ckt1  1K         48         8          5.54e-6  9.09e-6  4.87e-6  5.03e-7
ckt2  10K        320        40         1.21e-5  2.31e-5  7.93e-6  1.84e-6
ckt3  100K       480        60         1.31e-2  6.82e-4  1.91e-4  3.02e-5
ckt4  1M         800        100        6.01e-2  9.67e-3  4.23e-3  1.27e-4
ckt5  7.68M      4800       200        0.11     9.93e-2  5.10e-2  3.01e-3
ckt6  7.68M      6.14M      300        NA       NA       NA       5.04e-3
- HiPRIME, BSMOR and TBS use the same order (moments) to generate the macromodel
- The macromodel obtained by HNE has a similar size and sparsity to that of TBS
1. TBS reduces waveform error by 38X compared to HNE, because the truncation used in HNE leads to large error
2. TBS reduces waveform error by 33X compared to HiPRIME, because more poles are matched
3. TBS reduces waveform error by 17X compared to BSMOR, because more poles are exactly matched
27. Study Runtime Scalability
Runtime includes macromodel-building (build) and simulation (sim) time:
ckt   HNE build      HNE sim         HiPRIME build  HiPRIME sim    BSMOR build  BSMOR sim       TBS build  TBS sim
ckt1  0.44s          0.08s           0.15s          1.02s          0.12s        0.08s           0.09s      0.08s
ckt2  2.19s          1.24s           0.54s          1min:42s       0.63s        1.18s           0.11s      1.02s
ckt3  1min:17s       1min:51s        5.76s          2hr:48min:20s  1min:2s      1min:38s        1.62s      1min:32s
ckt4  34min:58s      21min:32s       47.3s          ~1day          4min:54s     11min:42s       20.7s      11min:23s
ckt5  4hr:43min:18s  1day:5hr:11min  2min:42s       ~5day          1hr:45min    1day:1hr:36min  2min:8s    1day:18min
ckt6  NA             NA              NA             NA             NA           NA              6min:16s   1day:1hr:29min
- All methods generate macromodels with similar accuracy
1. TBS (and HiPRIME) is 133X faster to build than HNE, as no LP-truncation is needed to preserve sparsity
2. TBS (and HiPRIME) is 54X faster to build than BSMOR, as the orthonormalization is performed locally
3. TBS (and BSMOR/HNE) is 109X faster to simulate than HiPRIME, as their macromodels have hierarchy
28. Conclusions
- TBS enables localized moment matching, and matches more poles than PRIMA
- TBS is stable, and is passive for MIMO reduction
- TBS is applicable to both homogeneous and heterogeneous designs
- TBS achieves over 20x less waveform error and 50x speedup compared to HNE, HiPRIME, and BSMOR (an improved version of SPRIM)
- The TBS approach has been extended to
  - Handle inductance and its inverse element [Yu-Shi-He:ICCAD’06]
  - Optimize simultaneous power and thermal integrity in 3D integration [Yu-Ho-He:ICCAD’06]
- More details can be found in the DAC Ph.D. Forum 2006