Transient Analysis
CK Cheng
UC San Diego
Jan. 25, 2007
Outline
• Research Directions
• Simulation test case results
• Overview of Simulation
• Commercial Package
• Alternating direction implicit (ADI) Method
• General Operator Splitting Method
• Distributed Computing
• Conclusions and Future Works
Research Directions
• Simulation: SPICE, STA
• Network on Chip: topology and wire styles
• Power and Clock Networks
• Data Path Components: adders, shifters, multipliers, division
• Packaging: passive distortion compensation
6x6 Bump Simulation Results
• The Circuit:
– 184K Capacitors, 17K Current Sources, 120K Inductors and 246K Resistors.
– 306K Nodes
• Accuracy:
– Waveform and measurement results match Fujitsu’s with at most 0.005% error on the reported measurements.
• Runtime / Memory Comparison:

| Source | CPU_Time | Memory | Computer Used |
|---|---|---|---|
| UCSD | 678s | 600.2M | Pentium 4 3.2G, Linux |
| Fujitsu Log File | 1845s | 771M | unknown |
6x6 Bump Simulation Results
• Measurement results and waveform

| Source | Min_pwr_l_est_10000954 | Min_18269323 | Min_33085875 |
|---|---|---|---|
| UCSD | 0.9980790 | 0.9967357 | 0.9934251 |
| Fujitsu Log File | 0.9980620 | 0.9966940 | 0.9933790 |
| Error | 0.002% | 0.004% | 0.005% |

(Red curve is UCSD result)
703KR Simulation Results
• The Circuit:
– 514K Capacitors, 76K Current Sources, 370K Inductors and 703K Resistors.
– 1.3M Nodes
• Accuracy:
– Measurement results match Fujitsu’s with at most 0.026% error on the reported measurements.
• Runtime / Memory Comparison:

| Source | CPU_Time | Memory | Computer Used |
|---|---|---|---|
| UCSD | 2575s (0.7h) | 1.7G | Pentium 4 3.2G, Linux |
| Fujitsu Log File | 864561s (240h) | 2.28G | unknown |
703KR Simulation Results
• Measurement results and waveform

| Source | Min_33096003 | Min_33096004 | Min_33097557 |
|---|---|---|---|
| UCSD | 0.9400988 | 0.9421157 | 0.9370827 |
| Fujitsu Log File | 0.9399610 | 0.9419260 | 0.9368400 |
| Error | 0.015% | 0.02% | 0.026% |

(UCSD results only. Fujitsu waveform is not available for comparison)
Further Speed-ups
• Reduce iteration count by 50% for pure linear circuits (like 6x6 bump and 703KR)
– 2x speed up
• More effective time step control
– DVDT, breakpoint, truncation error. 1.5 - 3x speed up
• Use Multigrid solver
– 1.5 - 2x speed up for medium circuits (6x6 bump)
– 2x - 10x speed up for large circuits (703KR)
• Parallel simulation
– 4 or more processors on Linux cluster
– 32 to hundreds of processors on supercomputer.
• Overall speed-up
– 6x - 60x speed up without parallel simulation
– 12x - 1000x speed up with parallel simulation
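The serial figure (6x - 60x) is consistent with multiplying the per-technique ranges. A minimal sketch of that arithmetic, assuming the individual gains are independent and compound multiplicatively, and using the large-circuit multigrid range; the dictionary labels and `combined_speedup` are illustrative names, not part of the simulator:

```python
from math import prod

# Speed-up ranges (low, high) quoted on the slide. Assumption: the gains
# are independent, so the overall gain is their product.
SERIAL_FACTORS = {
    "halved iteration count (linear circuits)": (2.0, 2.0),
    "time step control": (1.5, 3.0),
    "multigrid solver, large circuits": (2.0, 10.0),
}

def combined_speedup(factors):
    """Return (low, high) of the product of per-technique speed-up ranges."""
    low = prod(lo for lo, _ in factors.values())
    high = prod(hi for _, hi in factors.values())
    return low, high

low, high = combined_speedup(SERIAL_FACTORS)
print(f"serial speed-up: {low:g}x to {high:g}x")
```

The parallel range on the slide additionally depends on the processor count and parallel efficiency, which are not modeled here.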
Performance and capacity prediction
Cases 10x-100x larger than 703KR.

| Circuit size | Preferred Solver | CPU Time | Memory |
|---|---|---|---|
| Small - Medium (0.3M nodes) | LU Decomposition | 11 minutes | 600M |
| Medium - Large (1.3M nodes) | Multigrid | 43 minutes | 1.7G |
| Huge (10-100M nodes) | Multigrid + Parallel | 5 - 100 hours | 15G - 200G |
Overview of Simulation
Our research
• Fast speed with SPICE accuracy
• Nonlinear devices
• Efficient matrix solvers
• Effective integration methods
• Time step controls according to different integration methods
• Distributed computing
(Figure: simulation flowchart with the blocks Load Circuit, Device Evaluation, LU Decomposition, N-R Converge? (Yes / No), Next Time Point, Time Step Control, Integration Approximation, Linearization.)
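The flowchart blocks can be read as a nested loop: an outer loop over time points and an inner Newton-Raphson (N-R) loop. A minimal sketch, assuming a hypothetical one-node circuit (a current source driving a parallel resistor, capacitor and diode), backward Euler integration with a fixed step (no time step control), and a scalar division standing in for LU decomposition; all component values are illustrative:

```python
import math

def diode_rc_transient(i_in=1e-3, r=1e4, c=1e-12, i_sat=1e-14, v_t=0.02585,
                       h=1e-11, n_steps=200, tol=1e-12, max_iter=50):
    """Node voltage waveform of the one-node test circuit (illustrative only)."""
    v = 0.0
    waveform = [v]
    for _ in range(n_steps):                      # next time point
        v_prev = v
        for _ in range(max_iter):                 # Newton-Raphson loop
            # Device evaluation: diode current and small-signal conductance.
            i_d = i_sat * (math.exp(v / v_t) - 1.0)
            g_d = i_sat / v_t * math.exp(v / v_t)
            # Integration approximation (backward Euler) and linearization.
            residual = c * (v - v_prev) / h + v / r + i_d - i_in
            jacobian = c / h + 1.0 / r + g_d
            # Linear solve; a real simulator would factor a sparse matrix.
            dv = -residual / jacobian
            v += dv
            if abs(dv) < tol:                     # N-R converged?
                break
        else:
            raise RuntimeError("Newton-Raphson did not converge")
        waveform.append(v)
    return waveform

waveform = diode_rc_transient()
print(f"final node voltage: {waveform[-1]:.4f} V")
```

With these values the node charges until the diode conducts and then settles, so the last sample satisfies the DC equation of the circuit to within the N-R tolerance.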
Overview of Simulation
• Matrix Solver
– LU Decomposition
– Iterative Approach
• Integration
– Time Step Control
– ADI
• Nonlinear Devices
– Two Stage Newton Raphson
• Distributed Computing
• Commercial Implementation
Overview of Simulation
• Integration
– Time Step Control
– ADI (two-way partitioning)
– Operator Splitting (multi-way)
• Distributed Computing
– MPI
– Partitioning
• Three Ph.D. Students
Commercial Package: Fastrack Design
• Founded in January 2001
• Headquartered in San Jose
• Privately funded, cash-flow positive
• Two Business Units
– Design Services
– Technology Products
Analog Designs

| Design | # Elements | Sim. Len | HSpice | mSPICE | Speedup Factor |
|---|---|---|---|---|---|
| LVDS | 13490 | 20us | 80h | 26h | 3.1X |
| Oscillator | 222 | 1 ms | 13,706s | 2,670s | 5.1X |
| Biasing Circuit | 49197 | 200ns | 427s | 82s | 5.2X |
| PLL | 16050 | 40us | 67d | 12d | 5.6X |
| PLL (post-layout) | 300K | 40us | 290d (est) | 16d | 18.1X |
Digital Blocks

| Design Name | Devices: MOS | Devices: R | Devices: C | Runtime: mSPICE | Runtime: Traditional Spice | Speedup Factor |
|---|---|---|---|---|---|---|
| ALU | 10.1k | 12.7k | 7.5k | 6.9m | 7m | 1.0X |
| CONTROL | 69k | 83.7k | 52.5k | 1.5h | 9.5h | 6.3X |
| YN_BLK | 205K | 242.8k | 203.9k | 3.5h | > 2d | >13.7X |
| THP | 437k | 499.3k | 313.5k | 5.0h | COULD NOT RUN | ∞ |
| VCON | 936k | 753k | 561k | 15.0h | COULD NOT RUN | ∞ |
Memory Blocks

| Design | # Tr | # R | # C | # Vectors / Sim. Length | mSPICE Run Time |
|---|---|---|---|---|---|
| BRAM (pre) | 220K | 0 | 500 | 2 | 2.5 hours |
| SRAM (pre) 8Kx8 SP | 410K | 0 | 0 | 2 | 7 hours |
| eRAM (post) 256x16 | 72K | 28K | 427K | 48ns | 8 hours |
| BRAM (post) | 220K | 1320K | 870K | 2 | 18 hours |

• 100% accurate Spice simulation
mSPICE-Parallel
• Industry’s first practical parallel Spice simulation solution
– Increases capacity further
– Dramatically improves throughput
• Uses Matrix Level Partitioning
– No loss of accuracy
– Client-Server configuration
– Minimal memory requirement for client nodes
Client-Server Configuration
• Server distributes sub-matrices to clients
• Clients communicate partial solutions
• Minimal memory requirements for clients
(Figure: a sparse 0/1 matrix split into sub-matrices.)
Experimental Results

| Design | Total Elements | Sim. Length | Runtime 1-proc | Runtime 2-proc | Runtime 4-proc |
|---|---|---|---|---|---|
| ASIC | 1.2M | 8ns | 12.2h | 7.0h (1.7X) | 5.1h (2.4X) |
| 38IO SSO | 1.4M | 30ns | 3.0h | 2.0h (1.5X) | 1.4h (2.2X) |
| Signal-power | 2.1M | 1.2us | 13d | 7d18h (1.7X) | 5d12h (2.4X) |
| 4096x8 RAM (extracted) | 2.3M | 10ns | 32h | 18.5h (1.7X) | 13.4h (2.4X) |
| 120IO SSO | 3.5M | 30ns | 6.2h | 4.1h (1.5X) | 3.1h (2.0X) |
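The speed-up factors in parentheses follow directly from the runtimes. A small sketch that recomputes them and also derives the parallel efficiency (speed-up divided by processor count), a standard metric that the slide itself does not report; runtimes given in days are converted to hours, and the function name is illustrative:

```python
# Runtimes in hours from the table (13d = 312h, 7d18h = 186h, 5d12h = 132h).
RUNTIMES_H = {
    "ASIC": {1: 12.2, 2: 7.0, 4: 5.1},
    "38IO SSO": {1: 3.0, 2: 2.0, 4: 1.4},
    "Signal-power": {1: 312.0, 2: 186.0, 4: 132.0},
    "4096x8 RAM (extracted)": {1: 32.0, 2: 18.5, 4: 13.4},
    "120IO SSO": {1: 6.2, 2: 4.1, 4: 3.1},
}

def speedup_and_efficiency(runtimes):
    """Map processor count -> (speed-up vs. 1 processor, parallel efficiency)."""
    t1 = runtimes[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in runtimes.items()}

for design, runtimes in RUNTIMES_H.items():
    stats = speedup_and_efficiency(runtimes)
    print(design, {p: f"{s:.1f}X, {e:.0%}" for p, (s, e) in stats.items()})
```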
ADI: Previous Works
• 1999, Namiki and Ito
– The alternating direction implicit (ADI) method is used to simulate a 2D TE wave.
• 2001, Zheng et al.
– Extended to 3D problems
• 2001 & 2003, Lee and Chen
– ADI is applied to a power grid modeled as transmission lines
The alternation is among different geometric directions, so the simulated geometric structure is constrained.
Alternating Direction Implicit (ADI)
• ADI Integration Method
– Two-way partition of the circuit
– One partition is used for each backward integration
– Unconditionally stable (A-stable: independent of time step size)
– Time step size is chosen according to the local truncation error.
Alternating Direction Implicit (ADI)
• ADI method formulation
• Circuit partition algorithm
• Local truncation error estimation
• Stability discussion
• Experimental results
SPICE Formulation
• Equations for RLC circuits

$$
\begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}
\begin{bmatrix} \dot V(t) \\ \dot I(t) \end{bmatrix}
= -\begin{bmatrix} G & E \\ -E^T & R \end{bmatrix}
\begin{bmatrix} V(t) \\ I(t) \end{bmatrix} + U(t)
$$

where C: capacitance matrix, L: inductance matrix, R: resistance matrix, G: conductance matrix, E: incidence matrix
ADI Formulation
• Transient simulation
– Split the resistor and inductor branches into two parts
• G = G1 + G2
• E = E1 + E2
• R = R1 + R2
– Alternate backward and forward integration on each partition
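ADI Formulation (Cont.)
• Equations of ADI method

$$
\begin{bmatrix} \frac{2C}{h} + G_1 & E_1 \\ -E_1^T & \frac{2L}{h} + R_1 \end{bmatrix}
\begin{bmatrix} V(t + \frac{h}{2}) \\ I(t + \frac{h}{2}) \end{bmatrix}
= \begin{bmatrix} \frac{2C}{h} - G_2 & -E_2 \\ E_2^T & \frac{2L}{h} - R_2 \end{bmatrix}
\begin{bmatrix} V(t) \\ I(t) \end{bmatrix} + U(t + \tfrac{h}{2})
$$

$$
\begin{bmatrix} \frac{2C}{h} + G_2 & E_2 \\ -E_2^T & \frac{2L}{h} + R_2 \end{bmatrix}
\begin{bmatrix} V(t + h) \\ I(t + h) \end{bmatrix}
= \begin{bmatrix} \frac{2C}{h} - G_1 & -E_1 \\ E_1^T & \frac{2L}{h} - R_1 \end{bmatrix}
\begin{bmatrix} V(t + \frac{h}{2}) \\ I(t + \frac{h}{2}) \end{bmatrix} + U(t + \tfrac{h}{2})
$$

– the size of the left-hand-side matrix remains unchanged
– the number of non-zero elements is decreased
– direct solving methods can be efficient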
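The two half-steps can be tried on a toy problem. A minimal sketch, assuming a hypothetical 2-node RC network with unit capacitors (so M = I and there are no inductor rows), a constant source, and an illustrative split N = N1 + N2; the 2x2 Cramer solve stands in for a sparse direct solver, and none of the names below come from the original simulator:

```python
import math

def mat_axpy(a, b, s):
    """Return the 2x2 matrix a + s*b."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]

def mat_vec(a, x):
    return [a[i][0] * x[0] + a[i][1] * x[1] for i in range(2)]

def solve2(a, b):
    """Solve the 2x2 system a*x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

M = [[1.0, 0.0], [0.0, 1.0]]      # unit capacitors (assumption)
N1 = [[1.0, -1.0], [-1.0, 1.0]]   # partition 1: resistor between the nodes
N2 = [[2.0, 0.0], [0.0, 0.5]]     # partition 2: resistors to ground
U = [1.0, 0.0]                    # constant current source at node 1

def adi_step(x, h):
    two_m_h = [[2.0 * v / h for v in row] for row in M]
    # (2M/h + N1) X_half = (2M/h - N2) X_n + U
    rhs = [p + q for p, q in zip(mat_vec(mat_axpy(two_m_h, N2, -1.0), x), U)]
    x_half = solve2(mat_axpy(two_m_h, N1, 1.0), rhs)
    # (2M/h + N2) X_next = (2M/h - N1) X_half + U
    rhs = [p + q for p, q in zip(mat_vec(mat_axpy(two_m_h, N1, -1.0), x_half), U)]
    return solve2(mat_axpy(two_m_h, N2, 1.0), rhs)

def exact(t):
    """Closed-form solution for X(0) = 0 (N1 + N2 has eigenvalues 1 and 3.5)."""
    e1, e2 = math.exp(-t) / 5.0, 4.0 * math.exp(-3.5 * t) / 35.0
    return [3.0 / 7.0 - e1 - 2.0 * e2, 2.0 / 7.0 - 2.0 * e1 + e2]

def adi_error(h, t_end=1.0):
    x = [0.0, 0.0]
    for _ in range(round(t_end / h)):
        x = adi_step(x, h)
    return max(abs(p - q) for p, q in zip(x, exact(t_end)))

print("error ratio when h is halved:", adi_error(0.05) / adi_error(0.025))
```

Each half-step only factors a matrix containing one partition (N1 or N2). Halving the step should reduce the error by roughly a factor of four, which is the behavior expected of a second-order method.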
Experiments of non-zero fill-ins
• A small ASIC Design
– Spice matrix: dimension 10,286; number of non-zero elements 46,655; number of non-zero fill-ins 90,960
• A large I/O Design
– Spice matrix: dimension 615,436; number of non-zero elements 2,126,246

| | Sub-matrix1 # non-zero elements | Sub-matrix1 # non-zero fill-ins | Sub-matrix2 # non-zero elements | Sub-matrix2 # non-zero fill-ins | Total # non-zero fill-ins |
|---|---|---|---|---|---|
| Case 1 | 38,572 | 2,618 | 42,020 | 10,040 | 12,658 |
| Case 2 | 1,176,208 | 12,421,534 | 950,038 | 14,772,068 | 27,193,602 |
Local Truncation Error (LTE)
• Time step control using LTE
– In circuit transient analysis, the next time step can be estimated from the local truncation error at the present time point
– LTE is defined as the difference between the calculated solution and the exact solution

$$
LTE_n = \varepsilon_n = x_{n+1} - \hat{x}_{n+1} = f(\Delta t_n)
$$

– To ensure consistency, the local truncation error should not exceed the error tolerance, so the time step can be estimated using

$$
LTE_n = \varepsilon_n = x_{n+1} - \hat{x}_{n+1} = f(\Delta t_n) \le E_{tol}
$$
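One common way to turn the condition LTE ≤ E_tol into a step-size rule is sketched below. It assumes the LTE scales as Δt^(p+1) for a method of order p (Δt³ for the second-order methods in these slides); the safety factor, the growth cap and the function name `next_time_step` are illustrative assumptions, not values from the slides:

```python
def next_time_step(dt, lte, e_tol, order=2, safety=0.9, max_growth=2.0):
    """Return (accept, dt_next) from the LTE measured with step dt.

    Assumes LTE is proportional to dt**(order + 1), so the step that would
    just meet e_tol is dt * (e_tol / lte) ** (1 / (order + 1)).
    """
    if lte <= 0.0:
        return True, dt * max_growth
    scale = safety * (e_tol / lte) ** (1.0 / (order + 1))
    accept = lte <= e_tol
    return accept, dt * min(scale, max_growth)

# An error 8x over tolerance rejects the step and roughly halves it.
print(next_time_step(1e-9, 8e-6, 1e-6))
```

A rejected step would be recomputed with the smaller `dt_next`; an accepted step uses `dt_next` for the following time point.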
Local Truncation Error (Cont.)
• LTE of ADI method
(1) Equations

$$
\begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}
\begin{bmatrix} \dot V(t) \\ \dot I(t) \end{bmatrix}
= -\begin{bmatrix} G & E \\ -E^T & R \end{bmatrix}
\begin{bmatrix} V(t) \\ I(t) \end{bmatrix} + U(t)
$$

let

$$
X = \begin{bmatrix} V(t) \\ I(t) \end{bmatrix}, \quad
M = \begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}, \quad
N = \begin{bmatrix} G & E \\ -E^T & R \end{bmatrix}
$$

then

$$
M\dot X = -NX + U
$$

$$
\dot X = -M^{-1}NX + M^{-1}U = AX + BU
$$
Local Truncation Error (Cont.)
• LTE of ADI method
(2) Estimate exact solution
If we characterize the input as a simple ramp over the interval $(t_n, t_{n+1})$, the exact analytic solution with time step $\Delta t_n$ is:

$$
X_{n+1} = e^{A\Delta t_n}\left[X_n + A^{-1}\left(BU_n + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right)\right]
- A^{-1}\left[B(U_n + \Delta U_n) + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right]
$$

$$
= \left(I + A\Delta t_n + \tfrac{1}{2}A^2\Delta t_n^2 + \tfrac{1}{6}A^3\Delta t_n^3\right)X_n
+ \left(B\Delta t_n + \tfrac{1}{2}AB\Delta t_n^2 + \tfrac{1}{6}A^2B\Delta t_n^3\right)U_n
+ \left(\tfrac{1}{2}B\Delta t_n + \tfrac{1}{6}AB\Delta t_n^2\right)\Delta U_n + O(\Delta t_n^4)
$$
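Local Truncation Error (Cont.)
• LTE of ADI method
(3) Estimate ADI solution

$$
\left(\tfrac{2}{\Delta t_n}M + N_1\right)X_{n+1/2} = \left(\tfrac{2}{\Delta t_n}M - N_2\right)X_n + U_{n+1/2}
$$

$$
\left(\tfrac{2}{\Delta t_n}M + N_2\right)X_{n+1} = \left(\tfrac{2}{\Delta t_n}M - N_1\right)X_{n+1/2} + U_{n+1/2}
$$

With $A_1 = -M^{-1}N_1$, $A_2 = -M^{-1}N_2$ and $B = M^{-1}$:

$$
\hat X_{n+1} = \left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)X_n
$$

$$
+ \left[\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_1\right)^{-1} + \left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\right]\tfrac{\Delta t_n}{2}BU_{n+1/2}
$$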
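Local Truncation Error (Cont.)
• LTE of ADI method
(3) Estimate ADI solution

$$
\hat X_{n+1} = \left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)X_n
$$

$$
+ \left[\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_1\right)^{-1} + \left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\right]\tfrac{\Delta t_n}{2}BU_{n+1/2}
$$

$$
= \left(I + A\Delta t_n + \tfrac{1}{2}A^2\Delta t_n^2 + \tfrac{1}{4}A^3\Delta t_n^3 - \tfrac{1}{4}A_1A_2A\Delta t_n^3\right)X_n
$$

$$
+ \left(B\Delta t_n + \tfrac{1}{2}AB\Delta t_n^2 + \tfrac{1}{4}A^2B\Delta t_n^3 - \tfrac{1}{4}A_1A_2B\Delta t_n^3\right)U_n
+ \left(\tfrac{1}{2}B\Delta t_n + \tfrac{1}{4}AB\Delta t_n^2\right)\Delta U_n + O(\Delta t_n^4)
$$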
Local Truncation Error (Cont.)
• LTE of ADI method
(4) LTE estimation

$$
LTE_n = \varepsilon_n = X_{n+1} - \hat X_{n+1}
$$

$$
= \left(-\tfrac{1}{12}A^3\Delta t_n^3 + \tfrac{1}{4}A_1A_2A\Delta t_n^3\right)X_n
+ \left(-\tfrac{1}{12}A^2B\Delta t_n^3 + \tfrac{1}{4}A_1A_2B\Delta t_n^3\right)U_n
- \tfrac{1}{12}AB\Delta t_n^2\Delta U_n + O(\Delta t_n^4)
$$

$$
= -\tfrac{1}{12}\Delta t_n^3\,\dddot X_n + \tfrac{1}{4}A_1A_2\Delta t_n^3\,\dot X_n + O(\Delta t_n^4)
$$
Local Truncation Error (Cont.)
• LTE of ADI method
(5) Time step control

$$
\left(\tfrac{2}{\Delta t_n}M + N_1\right)X_{n+1/2} = \left(\tfrac{2}{\Delta t_n}M - N_2\right)X_n + U_{n+1/2}
$$

$$
\left(\tfrac{2}{\Delta t_n}M + N_2\right)X_{n+1} = \left(\tfrac{2}{\Delta t_n}M - N_1\right)X_{n+1/2} + U_{n+1/2}
$$

Rearranged:

$$
\tfrac{2}{\Delta t_n}M\left(X_{n+1/2} - X_n\right) = -N_1X_{n+1/2} - N_2X_n + U_{n+1/2}
$$

$$
\tfrac{2}{\Delta t_n}M\left(X_{n+1} - X_{n+1/2}\right) = -N_1X_{n+1/2} - N_2X_{n+1} + U_{n+1/2}
$$
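Local Truncation Error (Cont.)
• LTE of ADI method
(5) Time step control

$$
X_{n+1} = X_n + \tfrac{\Delta t_n}{2}\left(\dot X_{n+1} + \dot X_n\right) - \tfrac{1}{4}A_1A_2\Delta t_n^2\left(X_{n+1} - X_n\right)
$$

$$
\Delta t_n\dot X_n \approx \tfrac{\Delta t_n}{2}\left(\dot X_{n+1} + \dot X_n\right) - \tfrac{1}{4}A_1A_2\Delta t_n^3\dot X_n
$$

$$
\tfrac{1}{4}A_1A_2\Delta t_n^3\dot X_n \approx \tfrac{\Delta t_n}{2}\left(\dot X_{n+1} - \dot X_n\right)
$$

$$
\tfrac{1}{4}A_1A_2\Delta t_{n-1}^3\dot X_{n-1} \approx \tfrac{\Delta t_{n-1}}{2}\left(\dot X_n - \dot X_{n-1}\right)
$$

$$
LTE_n = -\tfrac{1}{12}\Delta t_n^3\,\dddot X_n + \tfrac{1}{4}A_1A_2\Delta t_n^3\,\dot X_n
\approx \left(-\frac{\dddot X_n}{12} + \frac{\dot X_n - \dot X_{n-1}}{2\Delta t_{n-1}^2}\right)\Delta t_n^3
$$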
Stability Discussion
• Stability is concerned with whether the accumulated error grows or decays as time evolves through a series of time steps.
• For one-step integration approximations, the error is accumulated by a factor of $e^{A\Delta t_n}$:

$$
X_{n+1} = e^{A\Delta t_n}\left[X_n + A^{-1}\left(BU_n + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right)\right]
- A^{-1}\left[B(U_n + \Delta U_n) + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right]
$$

• If the final steady state error vector is smaller than the initial one, then the integration method is stable.
• In the ADI integration method:

$$
e^{A\Delta t_n} \approx \left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)
$$

– It can be proved to be unconditionally stable
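The growth factor can be inspected numerically in the scalar case. A minimal sketch, assuming the scalar test equation x' = -(a1 + a2) x with a1, a2 > 0, so that A1 = -a1 and A2 = -a2; this only illustrates the claim and is not a proof, since the matrices do not commute in general. Forward Euler is included for contrast:

```python
def adi_amplification(a1, a2, h):
    """|growth factor| of one ADI step for x' = -(a1 + a2) x (scalar case)."""
    z1, z2 = 0.5 * h * a1, 0.5 * h * a2
    return abs((1.0 - z1) * (1.0 - z2) / ((1.0 + z1) * (1.0 + z2)))

def forward_euler_amplification(a1, a2, h):
    """|growth factor| of one forward Euler step for the same equation."""
    return abs(1.0 - h * (a1 + a2))

for h in (1e-3, 1e-1, 1.0, 1e3):
    print(h, adi_amplification(1.0, 10.0, h),
          forward_euler_amplification(1.0, 10.0, h))
```

For every positive step size each ratio (1 - z)/(1 + z) has magnitude below one, so the ADI factor stays below one, while the forward Euler factor exceeds one as soon as h(a1 + a2) > 2.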
Experimental Results

| | Circuit1 | Circuit2 | Circuit3 | 1k-cell |
|---|---|---|---|---|
| #Nodes | 10,000 | 40,000 | 90,000 | 10,200 |
| #Transistors | 0 | 0 | 0 | 6,500 |
| Period | 10ns | 10ns | 10ns | 10ns |
| SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1 | 181.6 |
| SPICE3 #steps | 115 | 115 | 114 | 193 |
| ADI CPU time (sec) | 28.6 | 117.8 | 275.2 | 523.3 |
| ADI #steps | 102 | 102 | 102 | 949 |
| Speedup | 2.7x | 4.1x | 11.1x | - |
Voltage drop of Circuit3 (power mesh with sinks)
Signal in 1k_cell (ASIC design)
General Operator Splitting
• General operator splitting method
– Multi-way partitions
– Each partition is considered separately in each time step of the simulation
– No geometric constraints
– Local truncation error is used to dynamically control the time step size
General Operator Splitting
• Fundamental theory
• Operator splitting formulation
• Local truncation error estimation
• Stability discussion
• Experimental results
Fundamental theory
• In circuit transient simulation, the integration approximation is actually the approximation of the exponential operator
• The exponential operators can be approximated in any order using a general scheme of fractal decomposition
• The decomposition of exponential operators corresponds to the circuit multi-way partition
New integration approximation in transient simulation
Fundamental theory
• Approximation of exponential operator
– General circuit equation and solution

$$
\dot x(t) = Ax(t) + Bu(t)
$$

$$
x(t + \Delta t) = e^{A\Delta t}x(t) + \int_t^{t+\Delta t} e^{A(t + \Delta t - \tau)}Bu(\tau)\,d\tau
$$

– If we characterize the input as a simple ramp over the interval $(t_n, t_{n+1})$, the exact analytic solution with time step $\Delta t_n$ is

$$
X_{n+1} = e^{A\Delta t_n}\left[X_n + A^{-1}\left(BU_n + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right)\right]
- A^{-1}\left[B(U_n + \Delta U_n) + A^{-1}B\frac{\Delta U_n}{\Delta t_n}\right]
$$

– Exponential operator approximation
• Forward Euler: $e^{A\Delta t} \approx I + A\Delta t$
• Backward Euler: $e^{A\Delta t} \approx (I - A\Delta t)^{-1}$
• Trapezoidal: $e^{A\Delta t} \approx \left(I - \tfrac{1}{2}A\Delta t\right)^{-1}\left(I + \tfrac{1}{2}A\Delta t\right)$
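The three approximations differ in how closely they track the exponential. A small sketch on the scalar test equation x' = a x (an assumption made here to keep the example free of matrix code) that measures how the one-step error shrinks when the step is halved; the function names are illustrative:

```python
import math

def forward_euler(z):
    return 1.0 + z

def backward_euler(z):
    return 1.0 / (1.0 - z)

def trapezoidal(z):
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)

def local_error(approx, a, dt):
    """|e^(a*dt) - approx(a*dt)| for the scalar test equation x' = a*x."""
    return abs(math.exp(a * dt) - approx(a * dt))

for name, approx in [("forward Euler", forward_euler),
                     ("backward Euler", backward_euler),
                     ("trapezoidal", trapezoidal)]:
    ratio = local_error(approx, -1.0, 0.1) / local_error(approx, -1.0, 0.05)
    print(f"{name}: one-step error shrinks by {ratio:.1f}x when dt is halved")
```

A ratio near 4 indicates a one-step error of order Δt² (first-order method), and a ratio near 8 indicates order Δt³ (second-order method).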
Fundamental theory
• Decomposition of exponential operators (Masuo Suzuki, 1991, Physics)
– Function: $F(x) = e^{x(A+B)}$
– First order: $f_1(x) = e^{xA}e^{xB}$
– Second order: $f_2(x) = e^{\frac{1}{2}xA}e^{xB}e^{\frac{1}{2}xA}$
– Third order: $f_3(x) = e^{\frac{s}{2}xA}e^{sxB}e^{\frac{1-s}{2}xA}e^{(1-2s)xB}e^{\frac{1-s}{2}xA}e^{sxB}e^{\frac{s}{2}xA}$, with $s = 1/(2 - \sqrt[3]{2})$
– (2m-1)th and (2m)th order: $f_{2m-1}(x) = f_{2m}(x) = f_{2m-3}(k_m x)\,f_{2m-3}((1 - 2k_m)x)\,f_{2m-3}(k_m x)$, with $k_m = 1/(2 - \sqrt[2m-1]{2})$
Fundamental theory
• Decomposition of exponential operators

$$
F(x) = e^{x(A+B)} = I + x(A+B) + \tfrac{1}{2}x^2(A+B)^2 + O(x^3)
$$

$$
f_2(x) = e^{\frac{1}{2}xA}e^{xB}e^{\frac{1}{2}xA}
$$

$$
= \left[I + \tfrac{1}{2}xA + \tfrac{1}{8}x^2A^2 + O(x^3)\right]\left[I + xB + \tfrac{1}{2}x^2B^2 + O(x^3)\right]\left[I + \tfrac{1}{2}xA + \tfrac{1}{8}x^2A^2 + O(x^3)\right]
$$

$$
= I + \left(\tfrac{1}{2}A + B + \tfrac{1}{2}A\right)x + \left(\tfrac{1}{8}A^2 + \tfrac{1}{2}B^2 + \tfrac{1}{8}A^2 + \tfrac{1}{2}BA + \tfrac{1}{2}AB + \tfrac{1}{4}A^2\right)x^2 + O(x^3)
$$

$$
= I + (A+B)x + \left(\tfrac{1}{2}A^2 + \tfrac{1}{2}B^2 + \tfrac{1}{2}BA + \tfrac{1}{2}AB\right)x^2 + O(x^3)
$$

$$
= I + (A+B)x + \tfrac{1}{2}(A+B)^2x^2 + O(x^3)
$$
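The expansion can be checked numerically on a pair of matrices that do not commute. A minimal sketch, assuming the nilpotent matrices A = [[0, 1], [0, 0]] and B = [[0, 0], [1, 0]] (chosen here only because their exponentials have exact closed forms); it compares the first-order product and the symmetric second-order product with the exact exponential:

```python
import math

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# A and B are nilpotent, so e^{xA} = I + xA and e^{xB} = I + xB exactly.
def exp_a(x):
    return [[1.0, x], [0.0, 1.0]]

def exp_b(x):
    return [[1.0, 0.0], [x, 1.0]]

def exp_sum(x):
    """e^{x(A+B)}: A + B = [[0, 1], [1, 0]] gives cosh and sinh entries."""
    return [[math.cosh(x), math.sinh(x)], [math.sinh(x), math.cosh(x)]]

def f1(x):
    """First-order decomposition e^{xA} e^{xB}."""
    return matmul(exp_a(x), exp_b(x))

def f2(x):
    """Second-order decomposition e^{xA/2} e^{xB} e^{xA/2}."""
    return matmul(matmul(exp_a(0.5 * x), exp_b(x)), exp_a(0.5 * x))

def max_error(approx, x):
    exact, value = exp_sum(x), approx(x)
    return max(abs(value[i][j] - exact[i][j]) for i in range(2) for j in range(2))

for name, f in (("f1", f1), ("f2", f2)):
    print(name, max_error(f, 0.1) / max_error(f, 0.05))
```

Halving x reduces the error of f1 by about 4 (error of order x²) and the error of f2 by about 8 (error of order x³), in line with the O(x³) remainder derived above.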
General Operator Splitting Formulation
• Transient simulation:
– Apply the second order approximation

$$
e^{x(A_1 + A_2)} \approx e^{\frac{1}{2}xA_1}e^{xA_2}e^{\frac{1}{2}xA_1}
$$

– In each time step, every partition is calculated separately and trapezoidal integration is used for every partition
– The size of the left-hand-side matrix may change
– The number of non-zero elements is decreased
– Can be easily extended to multi-way partitions

$$
e^{xA} = e^{x(A_1 + A_2 + \dots + A_q)} \approx e^{\frac{1}{2}xA_1}e^{\frac{1}{2}xA_2}\cdots e^{\frac{1}{2}xA_{q-1}}\,e^{xA_q}\,e^{\frac{1}{2}xA_{q-1}}\cdots e^{\frac{1}{2}xA_2}e^{\frac{1}{2}xA_1}
$$
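General Operator Splitting Formulation
• Equations ($V', I'$ and $V'', I''$ denote the intermediate solutions)

$$
e^{h(A_1 + A_2)} \approx e^{\frac{1}{2}hA_1}e^{hA_2}e^{\frac{1}{2}hA_1}
$$

$$
\begin{bmatrix} \frac{2C}{h} + \frac{G_1}{2} & \frac{E_1}{2} \\ -\frac{E_1^T}{2} & \frac{2L}{h} + \frac{R_1}{2} \end{bmatrix}
\begin{bmatrix} V'(t) \\ I'(t) \end{bmatrix}
= \begin{bmatrix} \frac{2C}{h} - \frac{G_1}{2} & -\frac{E_1}{2} \\ \frac{E_1^T}{2} & \frac{2L}{h} - \frac{R_1}{2} \end{bmatrix}
\begin{bmatrix} V(t) \\ I(t) \end{bmatrix} + \tfrac{1}{2}U(t + \tfrac{h}{2})
$$

$$
\begin{bmatrix} \frac{C}{h} + \frac{G_2}{2} & \frac{E_2}{2} \\ -\frac{E_2^T}{2} & \frac{L}{h} + \frac{R_2}{2} \end{bmatrix}
\begin{bmatrix} V''(t) \\ I''(t) \end{bmatrix}
= \begin{bmatrix} \frac{C}{h} - \frac{G_2}{2} & -\frac{E_2}{2} \\ \frac{E_2^T}{2} & \frac{L}{h} - \frac{R_2}{2} \end{bmatrix}
\begin{bmatrix} V'(t) \\ I'(t) \end{bmatrix} + \tfrac{1}{2}U(t + \tfrac{h}{2})
$$

$$
\begin{bmatrix} \frac{2C}{h} + \frac{G_1}{2} & \frac{E_1}{2} \\ -\frac{E_1^T}{2} & \frac{2L}{h} + \frac{R_1}{2} \end{bmatrix}
\begin{bmatrix} V(t + h) \\ I(t + h) \end{bmatrix}
= \begin{bmatrix} \frac{2C}{h} - \frac{G_1}{2} & -\frac{E_1}{2} \\ \frac{E_1^T}{2} & \frac{2L}{h} - \frac{R_1}{2} \end{bmatrix}
\begin{bmatrix} V''(t) \\ I''(t) \end{bmatrix} + \tfrac{1}{2}U(t + \tfrac{h}{2})
$$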
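The three sub-steps can be exercised on a toy problem. A minimal sketch, assuming a hypothetical 2-node RC network with unit capacitors (M = I, no inductor rows), a constant source, and an illustrative two-way split N = N1 + N2; each sub-step is one trapezoidal step on a single partition carrying half of the source, and the 2x2 Cramer solve stands in for a sparse direct solver:

```python
import math

def mat_axpy(a, b, s):
    """Return the 2x2 matrix a + s*b."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]

def mat_vec(a, x):
    return [a[i][0] * x[0] + a[i][1] * x[1] for i in range(2)]

def solve2(a, b):
    """Solve the 2x2 system a*x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

M = [[1.0, 0.0], [0.0, 1.0]]      # unit capacitors (assumption)
N1 = [[1.0, -1.0], [-1.0, 1.0]]   # partition 1: resistor between the nodes
N2 = [[2.0, 0.0], [0.0, 0.5]]     # partition 2: resistors to ground
U = [1.0, 0.0]                    # constant current source at node 1

def trapezoid_substep(n_part, x, h_sub):
    """(M/h_sub + n_part/2) X_new = (M/h_sub - n_part/2) X + U/2."""
    m_h = [[v / h_sub for v in row] for row in M]
    rhs = mat_vec(mat_axpy(m_h, n_part, -0.5), x)
    rhs = [r + 0.5 * u for r, u in zip(rhs, U)]
    return solve2(mat_axpy(m_h, n_part, 0.5), rhs)

def gos_step(x, h):
    x = trapezoid_substep(N1, x, 0.5 * h)    # half step on partition 1
    x = trapezoid_substep(N2, x, h)          # full step on partition 2
    return trapezoid_substep(N1, x, 0.5 * h) # half step on partition 1

def exact(t):
    """Closed-form solution for X(0) = 0 (N1 + N2 has eigenvalues 1 and 3.5)."""
    e1, e2 = math.exp(-t) / 5.0, 4.0 * math.exp(-3.5 * t) / 35.0
    return [3.0 / 7.0 - e1 - 2.0 * e2, 2.0 / 7.0 - 2.0 * e1 + e2]

def gos_error(h, t_end=1.0):
    x = [0.0, 0.0]
    for _ in range(round(t_end / h)):
        x = gos_step(x, h)
    return max(abs(p - q) for p, q in zip(x, exact(t_end)))

print("error ratio when h is halved:", gos_error(0.05) / gos_error(0.025))
```

With h_sub = h/2 the first and third sub-steps reproduce the 2C/h form of the equations above, and with h_sub = h the middle sub-step reproduces the C/h form. A ratio close to four is what a second-order scheme would show.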
Local Truncation Error (Cont.)
• LTE of general operator splitting method
Estimate solution ($X'_n$ and $X''_n$ denote the intermediate solutions)

$$
\left(\tfrac{2}{\Delta t_n}M + \tfrac{N_1}{2}\right)X'_n = \left(\tfrac{2}{\Delta t_n}M - \tfrac{N_1}{2}\right)X_n + \tfrac{1}{2}U_{n+1/2}
$$

$$
\left(\tfrac{1}{\Delta t_n}M + \tfrac{N_2}{2}\right)X''_n = \left(\tfrac{1}{\Delta t_n}M - \tfrac{N_2}{2}\right)X'_n + \tfrac{1}{2}U_{n+1/2}
$$

$$
\left(\tfrac{2}{\Delta t_n}M + \tfrac{N_1}{2}\right)X_{n+1} = \left(\tfrac{2}{\Delta t_n}M - \tfrac{N_1}{2}\right)X''_n + \tfrac{1}{2}U_{n+1/2}
$$
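Local Truncation Error (Cont.)
• LTE of general operator splitting method
Estimate solution

$$
\hat X_{n+1} = \left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)\left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)X_n
$$

$$
+ \Big[\left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)\left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\tfrac{\Delta t_n}{4}
$$

$$
+ \left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\tfrac{\Delta t_n}{2}
+ \left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\tfrac{\Delta t_n}{4}\Big]BU_{n+1/2}
$$

$$
= \left(I + A\Delta t_n + \tfrac{1}{2}A^2\Delta t_n^2 + \tfrac{1}{4}A^3\Delta t_n^3 - \left(\tfrac{1}{16}A_1^3 + \tfrac{1}{8}A_1^2A_2 + \tfrac{1}{8}A_2A_1^2 + \tfrac{1}{4}A_2A_1A_2\right)\Delta t_n^3\right)X_n
$$

$$
+ \left(B\Delta t_n + \tfrac{1}{2}AB\Delta t_n^2 + \tfrac{1}{4}A^2B\Delta t_n^3 - \left(\tfrac{3}{32}A_1^2 + \tfrac{3}{16}A_2A_1\right)B\Delta t_n^3\right)U_n
+ \left(\tfrac{1}{2}B\Delta t_n + \tfrac{1}{4}AB\Delta t_n^2\right)\Delta U_n + O(\Delta t_n^4)
$$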
Local Truncation Error (Cont.)
• LTE of general operator splitting method
LTE estimation

$$
LTE_n = \varepsilon_n = X_{n+1} - \hat X_{n+1}
$$

$$
= -\tfrac{1}{12}\Delta t_n^3\,\dddot X_n + \left(\tfrac{1}{16}A_1^3 + \tfrac{1}{8}A_1^2A_2 + \tfrac{1}{8}A_2A_1^2 + \tfrac{1}{4}A_2A_1A_2\right)\Delta t_n^3X_n
+ \left(\tfrac{3}{32}A_1^2 + \tfrac{3}{16}A_2A_1\right)B\Delta t_n^3U_n + O(\Delta t_n^4)
$$
Local Truncation Error (Cont.)
• LTE of general operator splitting method
LTE estimation ($X'_n$ and $X''_n$ denote the intermediate solutions)

$$
X'_n - X_n = \tfrac{1}{4}A_1\Delta t_n\left(X'_n + X_n\right) + \tfrac{1}{4}B\Delta t_nU_{n+1/2}
$$

$$
X''_n - X'_n = \tfrac{1}{2}A_2\Delta t_n\left(X''_n + X'_n\right) + \tfrac{1}{2}B\Delta t_nU_{n+1/2}
$$

$$
X_{n+1} - X''_n = \tfrac{1}{4}A_1\Delta t_n\left(X_{n+1} + X''_n\right) + \tfrac{1}{4}B\Delta t_nU_{n+1/2}
$$
Local Truncation Error (Cont.)
• LTE of general operator splitting method
LTE estimation

$$
X_{n+1} = X_n + \tfrac{\Delta t_n}{2}\left(\dot X_{n+1} + \dot X_n\right)
- \left(\tfrac{1}{16}A_1^3 + \tfrac{1}{8}A_1^2A_2 + \tfrac{1}{8}A_2A_1^2 + \tfrac{1}{4}A_2A_1A_2\right)\Delta t_n^3X_n
- \left(\tfrac{3}{32}A_1^2 + \tfrac{3}{16}A_2A_1\right)B\Delta t_n^3U_n + O(\Delta t_n^4)
$$

$$
LTE_n \approx \left(-\frac{\dddot X_n}{12} + \frac{\dot X_n - \dot X_{n-1}}{2\Delta t_{n-1}^2}\right)\Delta t_n^3
$$
Stability Discussion
• The trapezoidal integration method is unconditionally stable for a stable system.

$$
e^{A\Delta t_n} \approx \left(I - \tfrac{\Delta t_n}{2}A\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A\right)
$$

• In our operator splitting method, the trapezoidal method is used for all the sub-systems, so the method is still unconditionally stable

$$
e^{x(A_1 + A_2)} \approx e^{\frac{1}{2}xA_1}e^{xA_2}e^{\frac{1}{2}xA_1}
$$

$$
e^{A\Delta t_n} \approx \left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)\left(I - \tfrac{\Delta t_n}{2}A_2\right)^{-1}\left(I + \tfrac{\Delta t_n}{2}A_2\right)\left(I - \tfrac{\Delta t_n}{4}A_1\right)^{-1}\left(I + \tfrac{\Delta t_n}{4}A_1\right)
$$
Experimental Results

| | Circuit1 | Circuit2 | Circuit3 |
|---|---|---|---|
| #Nodes | 10,000 | 40,000 | 90,000 |
| #Transistors | 0 | 0 | 0 |
| Period | 10ns | 10ns | 10ns |
| SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1 |
| SPICE3 #steps | 115 | 115 | 114 |
| GOS CPU time (sec) | 164.7 | 1011.6 | 3435.9 |
| GOS #steps | 102 | 102 | 102 |
| Comparison (GOS CPU time / SPICE3 CPU time) | 2.1x | 2x | 1.1x |
Voltage drop of Circuit3 (power mesh with sinks)
Conclusions
• We investigate alternating direction implicit and general operator splitting integration methods for transistor-level circuit transient simulation.
• In both methods, the circuit is divided into several sub-circuits, so the direct matrix solver remains efficient because each matrix is simplified.
• Both methods are second-order accurate and unconditionally stable.
• Overhead:
– Circuit partition
– Each time step consists of several sub-steps, and each sub-step is an N-R iteration process
• Better for circuits with large linear networks
Distributed Computing
• Distributed Processors
– Cluster
– Supercomputer
– Multi-Core Processors (Intel Dual/Quad-Core, IBM Cell etc.)
• Standard
– MPI
– Partitioning
– Matrix Solver
• Capabilities
– Speed-up (10-100+)
– Memory Capacity (10-100+)
Future Works
• ADI method
– More experiments
• General operator splitting method
– Design and implement multi-way circuit partition algorithm
– Implement multi-way general operator splitting program
– Derive LTE for general multi-way situation
– More experiments
• Distributed Computing
– MPI Standard
– Distributed Partitioning, Matrix Solver