SLIP Workshop, April 2001 Ken Rose
A Comprehensive Look at System Level Modeling
Ken Rose, Bibiche Geuskens,
Ramon Mangaser, Christopher Mark
Center for Integrated Electronics and Electronics Manufacturing
Department of Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute
Troy, NY 12180-3590
518.276.2981
Rensselaer Interconnect Performance Estimator
RIPE
RIPE 3.0 models are described in ‘Modeling Microprocessor Performance’ by B. Geuskens and K. Rose, Kluwer, 1998.
RIPE is available for use online at http://latte.cie.rpi.edu/ripe.html
RIPE was developed with partial support from IBM and SRC.
Co-Authors:
• Bibiche Geuskens (RIPE 1.0, 2.0, 3.0) PhD. June 1997
Intel Corporation, Hillsboro, Oregon
• Ramon Mangaser (RIPE 3.1, 4.0, 4.1) PhD. Nov. 1999
• Christopher Mark (RIPE 4.2) PhD. Sep. 2000
Sun Microsystems, Chelmsford, Massachusetts
Intel Corporation, Hillsboro, Oregon
RIPE Genesis:
• H.B. Bakoglu
‘Circuits, Interconnections, and Packaging for VLSI’
Addison-Wesley, 1990.
SUSPENS model coded in RIPE 1.0
• G. A. Sai-Halasz
Proc. IEEE, 83/1, p. 20, 1995.
Basis for RIPE 2.0
RIPE 3.0 Inputs and Outputs
Block diagram: a system description, a device/technology description, and an interconnect description are the inputs to RIPE. RIPE estimates wiring density, interconnect RC delay, cycle time, power dissipation, capacitance, resistance, crosstalk, and electromigration.
Categories shown in the diagram: system/area, performance, reliability, interconnect, device, wireability, power dissipation, yield.
RIPE 3.0 Sample Benchmark
(DEC Alpha 21164)
RIPE INPUTS
System Parameters: Chip Area [cm²]: 2.99; Number of Transistors [M]: 9.3; SRAM [KBytes]: 112; Signal I/O: 294 (Logic Depth: 14, 15)
Technology Parameters: Feature Size [µm]: 0.5; Number of Wire Levels: 4; Power Supply [V]: 3.3
Interconnect Parameters: Pitch [µm]: 1.125, 1.125, 3.0, 3.0; Rint [Ω/cm]: …; Cint [pF/cm]: 2.0, 2.0, 2.0, 2.0
Data: W.J. Bowhill et al., Dig. Tech. Journal; ISSCC 1996
Cycle Time Estimation Model (Ch. 7)
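Delay per logic stage (to the 50% point):
$T_{\text{delay per stage}} = 0.693\,[R_{dr}(C_{dr} + C_L) + R_{dr} C_W l + R_W l\, C_L] + 0.377\, R_W C_W l^2$
Sai-Halasz (1995); Sakurai (1993)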
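A minimal sketch of the per-stage delay expression above, evaluated directly. The function name `stage_delay` and its parameter names are hypothetical. R_W and C_W are assumed to be per-unit-length wire quantities, and any consistent unit system works.

```python
def stage_delay(r_dr, c_dr, c_load, r_w, c_w, length):
    """50% delay of one logic stage driving a distributed RC wire.

    r_dr, c_dr : driver output resistance and capacitance
    c_load     : load capacitance at the end of the wire (C_L)
    r_w, c_w   : wire resistance and capacitance per unit length
    length     : wire length l
    """
    # Lumped terms: the driver charges its own, the wire's and the load
    # capacitance, and the wire resistance charges the load.
    lumped = r_dr * (c_dr + c_load) + r_dr * c_w * length + r_w * length * c_load
    # Distributed term for the wire itself.
    distributed = r_w * c_w * length ** 2
    return 0.693 * lumped + 0.377 * distributed
```

With a zero-length wire, only the 0.693 R_dr(C_dr + C_L) term remains. The quadratic 0.377 R_W C_W l² term is the one that dominates for long wires.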
RC Interconnect Parameters (Ch. 3)
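Interconnect Resistance (3.1): $R = \rho_{eff}\, l_{int} / (A\, w_{int}^2)$
Interconnect Capacitance (3.2): $C = 2(C_V + C_L) = 2\,\varepsilon_{eff}\,\varepsilon_0\, l_{int}\, w_{int}\,(1/T_{ILD} + A/S_{wire})$
where A = aspect ratio (wire thickness/width), T_ILD = thickness of the interlevel dielectric, S_wire = spacing between wires, and C_V and C_L are the vertical and lateral capacitance components.
Yang (1998)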
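Equations (3.1) and (3.2) translate directly into code. This sketch assumes SI units throughout. The function names and the example values in the usage note are hypothetical.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def wire_resistance(rho_eff, length, width, aspect_ratio):
    """Eq. (3.1): R = rho_eff * l / (A * w^2); the wire thickness is A * w."""
    return rho_eff * length / (aspect_ratio * width ** 2)

def wire_capacitance(eps_eff, length, width, aspect_ratio, t_ild, s_wire):
    """Eq. (3.2): C = 2 * eps_eff * eps0 * l * w * (1/T_ILD + A/S_wire).

    The 1/T_ILD term is the vertical part C_V and the A/S_wire term is the
    lateral part C_L; the factor 2 counts both sides of the wire.
    """
    return 2 * eps_eff * EPS0 * length * width * (1 / t_ild + aspect_ratio / s_wire)
```

As an example with assumed values, a copper-like wire with rho_eff = 2.2e-8 Ω·m, w = 0.5 µm, A = 2 and l = 1 cm gives `wire_resistance(2.2e-8, 0.01, 0.5e-6, 2)`, about 440 Ω.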
Transistor Count and Area Models (Ch. 4)
Processor Logic, Memory, and I/O Buffers are treated separately
                 Transistors   Area
Alpha 21164      9.3 M         299 mm²
Memory           6.7 M         101 mm²
I/O              --            17 mm²
Random Logic     2.6 M         181 mm²
Logic area is estimated from the number of transistors, the number of gates, and the average logic gate size.
Logic Wireability (Ch. 5)
$R(N_g, p)$ = average interconnect length in gate pitches
Based on Rent's rule for the number of pins, $N_p = K_p\, N_g^{\,p}$
$l_w$ = long wire length = $2\,(A_{logic})^{1/2}$
$N_w$ = number of long wires = $[f_g/(f_g+1)]\, N_{p,total}$
where $N_{p,total}$ is the total number of pins for functional blocks and $f_g$ is the average logic gate fanout.
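The three wireability relations can be sketched as small helper functions. The function names are hypothetical. The average length R(N_g, p) is not included because its formula is not given on this slide.

```python
import math

def rent_pins(k_p, n_gates, p):
    """Rent's rule: N_p = K_p * N_g**p."""
    return k_p * n_gates ** p

def long_wire_length(logic_area):
    """l_w = 2 * sqrt(A_logic), in the length unit of sqrt(logic_area)."""
    return 2 * math.sqrt(logic_area)

def num_long_wires(fanout, total_pins):
    """N_w = [f_g / (f_g + 1)] * N_p,total."""
    return fanout / (fanout + 1) * total_pins
```

For the Alpha 21164 random-logic area of 181 mm², `long_wire_length(181)` gives about 26.9 mm.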
Device Parameters (Ch. 6)
We need values for the transistor resistances and capacitances, Rdr and Cdr. These have been superseded in RIPE 4.0.
Cycle Time Estimation Model (Ch. 7)
Tcycle = (fld – 1) Tgavg + 2Tginv + time_of_flight
where fld is the logic depth
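A minimal sketch of the RIPE 3.0 cycle-time estimate. It reads T_gavg as the average gate delay and T_ginv as the inverter gate delay; that reading, the function names, and the delays in the test are assumptions for illustration.

```python
def cycle_time(logic_depth, t_gate_avg, t_gate_inv, time_of_flight):
    """T_cycle = (f_ld - 1) * T_gavg + 2 * T_ginv + time_of_flight."""
    return (logic_depth - 1) * t_gate_avg + 2 * t_gate_inv + time_of_flight

def clock_frequency_mhz(t_cycle_ns):
    """Convert a cycle time in ns to a clock frequency in MHz."""
    return 1000.0 / t_cycle_ns
```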
Power Dissipation (Ch. 8)
$P_{tot} = f_d\, C_{tot}\, V_{dd}\, V_{swing}\, f_c + I_{sc} V_{dd} + I_{leak} V_{dd}$
$P_{tot} \approx \left(\sum_i f_{d,i}\, C_{sw,i}\right) V_{dd}^2\, f_c$
where f_d is the activity factor. The switched capacitance is broken down into:
1. random logic: f_d C_sw,rl
2. clock distribution: f_d,clk C_sw,clk
3. memory: f_d C_sw,mem
4. interconnections: f_d C_sw,int
5. off-chip drivers: f_d C_sw,dr
For the Alpha 21164, f_d,clk = 0.75 and f_d = 0.15, based on published details.
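The summed switching-power form can be sketched as follows, assuming full-swing signals (V_swing = V_dd). The switched capacitances in the test are hypothetical; only the activity factors 0.75 and 0.15 come from the slide.

```python
def switching_power(components, vdd, f_clk):
    """(sum_i f_d,i * C_sw,i) * Vdd^2 * f_c.

    components: iterable of (activity_factor, switched_capacitance) pairs,
    e.g. random logic, clock distribution, memory, interconnect, drivers.
    """
    return sum(fd * c_sw for fd, c_sw in components) * vdd ** 2 * f_clk

def total_power(p_switching, i_short_circuit, i_leak, vdd):
    """Add the short-circuit and leakage terms I_sc*Vdd + I_leak*Vdd."""
    return p_switching + (i_short_circuit + i_leak) * vdd
```

Keeping the clock network as its own (0.75, C_sw,clk) entry reflects its much higher activity factor compared with the other blocks.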
RIPE 3.0 Sample Benchmark
(DEC Alpha 21164)
Parameter                    RIPE (Al/SiO2)   Actual     RIPE (Cu/SiO2)
Memory transistors           6.73 M           7.2 M      6.73 M
Memory area                  1.01 cm²         1.02 cm²   1.01 cm²
Pad ring area                0.16 cm²         0.17 cm²   0.16 cm²
Clock frequency              291 MHz          300 MHz    373 MHz
Power dissipation            52 W             50 W       66 W
Power, clock distribution    21 W             20 W       27 W
RIPE 3.0 Benchmark Results
Processor chip                      Parameter                 Actual   RIPE
Alpha 21164 (0.5 µm CMOS)           Clock frequency (MHz)     300      290
                                    Power dissipation (W)     50       52
                                    Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)             Clock frequency (MHz)     150      152
                                    Power dissipation (W)     15-20    19
                                    Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)    Clock frequency (MHz)     150      150
                                    Power dissipation (W)     18       18
                                    Number of metal levels    4        4
RIPE Simulation Modes: RIPE 3.0 to RIPE 4.0
• Performance estimator (RIPE 3.0; -n and -d modes): wiring strategy in, clock frequency out
• Wiring allocator (RIPE 4.0; -aw mode): clock frequency in; wiring strategy, power, and wireability out
Intel Wiring Distribution Model
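$d(\#\text{Nets})/dL = B\,L^{\alpha}$, with $\alpha = -1.65$
$\#\text{Nets} \approx A\,(\#\text{Transistors})$, with $A \approx 0.25$
S. Yang, MRS Symposium on Advanced Interconnects, April 1998.
Integrating the distribution between L_min and L_max gives the number of nets and the total wire-length demand:
$\#\text{Nets} = \frac{B}{\alpha+1}\left[L_{max}^{\alpha+1} - L_{min}^{\alpha+1}\right]$
$\text{Demand} = \frac{B}{\alpha+2}\left[L_{max}^{\alpha+2} - L_{min}^{\alpha+2}\right]$
We have taken $L_{max} = 2\,(\text{Logic\_Area})^{1/2}$ and solve the above equations for B and L_min.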
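One way to solve the two equations for B and L_min is sketched below. B cancels in Demand/#Nets, so the ratio (the average wire length) depends only on L_min and can be matched by bisection. This approach, the function name, and the test inputs are assumptions; RIPE's actual solver is not described here. In practice #Nets would come from A·#Transistors, and Demand from the wireability model.

```python
def solve_wiring_distribution(n_nets, demand, l_max, alpha=-1.65):
    """Return (B, L_min) such that
    #Nets  = B/(alpha+1) * (L_max**(alpha+1) - L_min**(alpha+1))
    Demand = B/(alpha+2) * (L_max**(alpha+2) - L_min**(alpha+2))
    """
    a, b = alpha + 1.0, alpha + 2.0

    def mean_length(l_min):
        # Demand / #Nets; increases monotonically with L_min.
        return (a / b) * (l_max ** b - l_min ** b) / (l_max ** a - l_min ** a)

    target = demand / n_nets
    if not 0 < target < l_max:
        raise ValueError("average wire length must lie between 0 and L_max")
    lo, hi = l_max * 1e-12, l_max * (1 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_length(mid) < target:
            lo = mid  # average too short: raise L_min
        else:
            hi = mid
    l_min = 0.5 * (lo + hi)
    big_b = n_nets * a / (l_max ** a - l_min ** a)
    return big_b, l_min
```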
Algorithm for RIPE 4.0 Cycle-Time Based Wiring Allocation
1. Set the input clock frequency and logic depth.
2. Use RIPE's critical path model to estimate the total average delay, including gate and wire delay.
3. Determine the maximum allowable long-wire delay by subtracting the total average delay from the target cycle time.
4. Allocate wires using this maximum long-wire delay as a constraint, while allowing up to a maximum number of repeaters.
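Steps 1 to 3 reduce to a delay budget, sketched below. The optional time-of-flight argument follows the RIPE 4.0 cycle-time model. The function names and the 11 × 0.05 ns split of the logic delay are hypothetical. Step 4, the allocation itself, is only represented by a feasibility check.

```python
def long_wire_budget(f_clk_hz, logic_depth, t_avg, time_of_flight=0.0):
    """Maximum allowable long-wire delay in seconds (steps 1-3)."""
    t_cycle = 1.0 / f_clk_hz          # step 1: target cycle time
    t_logic = logic_depth * t_avg     # step 2: total average stage delay
    budget = t_cycle - t_logic - time_of_flight  # step 3
    if budget <= 0:
        raise ValueError("logic delay alone exceeds the target cycle time")
    return budget

def meets_budget(wire_delay, budget):
    """Step 4 check: a candidate long wire (with repeaters) must fit the budget."""
    return wire_delay <= budget
```

For example, a 0.83 ns target cycle with 0.55 ns of total logic-stage delay leaves a 0.28 ns long-wire budget. These are the 180 nm ITRS'99 figures quoted later in this talk.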
Modifying the Cycle-Time Model for RIPE 4.0
$T_{cycle} = f_{ld}\,T_{avg} + T_{long} + \text{time\_of\_flight}$
$T_{avg} = 0.377\,(r_{int} c_{int} l_{int}^2) + 0.693\,\{R_{gout}(C_{gout} + f_g C_{gin}) + R_{gout}\,[(f_g+1)/2]\,c_{int} l_{int} + r_{int}\,[(f_g+1)/2]\,l_{int} C_{gin}\}$
$T_{long} = 0.377\,(r_{int} c_{int} l_{long}^2) + 0.693\,[R'_{gout}(C'_{gout} + C'_{gin}) + R'_{gout}\, c_{int} l_{long} + r_{int} l_{long} C'_{gin}]$
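A direct transcription of the modified cycle-time model is sketched below. The primed quantities are passed as separate arguments for the long-wire driver and receiver. The function names are hypothetical, and any consistent units can be used.

```python
def t_avg(r_int, c_int, l_int, r_gout, c_gout, c_gin, fanout):
    """Average logic-stage delay T_avg (RIPE 4.0 form)."""
    k = (fanout + 1) / 2.0
    return (0.377 * r_int * c_int * l_int ** 2
            + 0.693 * (r_gout * (c_gout + fanout * c_gin)
                       + r_gout * k * c_int * l_int
                       + r_int * k * l_int * c_gin))

def t_long(r_int, c_int, l_long, r_gout_p, c_gout_p, c_gin_p):
    """Long-wire delay T_long; *_p are the primed driver/receiver values."""
    return (0.377 * r_int * c_int * l_long ** 2
            + 0.693 * (r_gout_p * (c_gout_p + c_gin_p)
                       + r_gout_p * c_int * l_long
                       + r_int * l_long * c_gin_p))

def t_cycle(logic_depth, tavg, tlong, time_of_flight):
    """T_cycle = f_ld * T_avg + T_long + time_of_flight."""
    return logic_depth * tavg + tlong + time_of_flight
```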
RIPE 4.0 Benchmark Results
Processor chip                      Parameter                 Actual   RIPE
Alpha 21164 (0.5 µm CMOS)           Clock frequency (MHz)     300      278
                                    Power dissipation (W)     50       57
                                    Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)             Clock frequency (MHz)     100      113
                                    Power dissipation (W)     15-20    19
                                    Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)    Clock frequency (MHz)     133      134
                                    Power dissipation (W)     18       20
                                    Number of metal levels    4        4
Katmai Wiring Strategy Calculated by RIPE 4.0
Level   Pitch [×0.64 µm]   rint [Ω/cm]   cint [pF/cm]   Lmax [mm]
1       1.0                3451          2.37           0.006
2-3     1.45               891           2.61           4.4
4       2.5                365           2.40           12.3
5       4.0                158           2.34           20.5

Level   Repeaters for Lmax   Level wiring efficiency   Total wiring efficiency
1       0                    0.02                      0.02
2-3     0                    0.30                      0.18
4       2                    0.50                      0.23
5       3                    0.52                      0.25
RIPE Inclusions
• BEOL Yield
• Signal Integrity
• Electromigration
• Cache Memory Performance
• Repeater Insertion
• Interconnect Inductance
• Accurate MOSFET Models
BEOL Yield in RIPE
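• Critical Area
• Cube-law distribution of defect sizes
• Poisson distribution of faults
$Y_{total} = e^{-\lambda_{open}}\, e^{-\lambda_{short}}$, where λ_open and λ_short are the average numbers of open and short faults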
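With Poisson-distributed faults, the yield is the exponential of minus the total average fault count, so the open and short terms simply add in the exponent. A minimal sketch follows (the function name is hypothetical). It reproduces the Katmai and Katmai2 yields reported later in this talk from their total fault counts.

```python
import math

def poisson_yield(*mean_faults):
    """Y = exp(-lambda_1) * exp(-lambda_2) * ... = exp(-sum of lambdas)."""
    return math.exp(-sum(mean_faults))
```

For example, `poisson_yield(0.105)` is about 0.90 and `poisson_yield(0.464)` is about 0.63.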
Katmai (250 nm Pentium III) Transition to 180 nm Technology
Katmai Shrink (Katmai-180)
• number of transistors: 9.5M
• chip size: 1.23 → 0.62 cm²
• clock frequency: 600 → 850 MHz
• metal layers: 5 → 6
• 4 wiring domains
Katmai Shrink and Doubling (Katmai2)
• number of transistors: 19M
• chip size: 1.24 cm²
• clock frequency: 850 MHz
• metal layers: 10
• 9 wiring domains
Contributions of Different Metal Levels to Random Defect Yields for Katmai and Katmai2
Metal level     Katmai (250 nm)   Katmai2 (180 nm)
M1              3.9%              2.7%
M2              34.8%             34.3%
M3              34.8%             34.3%
M4              18.8%             13.4%
M5              7.7%              6.0%
M6              --                3.8%
M7              --                2.4%
M8              --                1.6%
M9              --                1.1%
M10             --                0.4%
Total faults    0.105             0.464
Poisson yield   90%               63%
Signal Integrity Limits
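Figure: expressions for the peak crosstalk voltage V_p induced on a victim wire relative to V_dd. They are given in terms of the per-unit-length capacitances c_int and p_int, the corresponding wire capacitances C_cint and C_pint, and the fraction of the victim wire running parallel to the attacker.
Sakurai (1993)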
Vp Comparison between SPICE, Sakurai Model, and the Modified HP Model for Deschutes (250 nm Pentium II)
Metal    Line length   SPICE   Sakurai   %       Modified HP   %
levels   (mm)          (mV)    (mV)      error   (mV)          error
M2-M3    0.01          1.3     643       Big     1.25          4
M2-M3    6             403     643       60      436           8
M4       6             300     578       93      314           5
M4       10            369     578       57      399           8
M5       12            321     532       66      335           4
M5       21            382     532       39      411           8
Cache Memory Performance
We assume that the cycle time is defined by the logic subsystem. Calculated cache access times greater than this cycle time will be flagged and reported by RIPE. RIPE will then assume that the cache requires multiple clock cycles for proper operation.
RIPE 4.1 implements the model of Wada et al. (1992) IEEE JSSC, 27, p. 1147. It can be linked to the more accurate CACTI model of Wilton and Jouppi (1996) IEEE JSSC, 31, p. 677.
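The flagging rule can be sketched as below. Rounding the access time up to a whole number of cycles is an assumption about how "multiple clock cycles" is counted, and the function name is hypothetical.

```python
import math

def cache_cycles(t_access, t_cycle):
    """Return (cycles needed, flagged) for one cache access.

    t_cycle is set by the logic subsystem; an access slower than t_cycle is
    flagged and assumed to occupy ceil(t_access / t_cycle) cycles.
    """
    cycles = max(1, math.ceil(t_access / t_cycle))
    return cycles, t_access > t_cycle
```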
Inductance in RIPE 4.2
• RIPE has good estimates of wire capacitance (per unit length) [Geuskens and Rose, 98; Mangaser (Ph.D. Thesis), 99]
• Estimate wire inductance from wire capacitance → assume a homogeneous medium and TEM-mode propagation
• Inductance analysis is performed in two steps:
– Identification of wiring levels with significant inductance effects
→ Incorporate Ismail's formulas for an inductance figure of merit (FOM) to define upper and lower bounds on the wire lengths that are susceptible to inductance effects on each wiring level
→ Use constant RC values to estimate the rise times needed in the FOM
– Optimization of inductance-susceptible levels
→ Revert to the wire pitch of the last preceding wiring level without inductance effects
→ Given the long-wire delay constraint, use Ismail's RLC-based formulas to determine the maximum wire length (per level)
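Both steps can be sketched with per-unit-length quantities in SI units. In a homogeneous medium with TEM propagation, L·C = ε_r/c0², so L follows from C. The length window uses the figure-of-merit bounds as I recall them from Ismail, Friedman and Neves (1999): t_r/(2√(LC)) < l < (2/R)√(L/C). This is an assumption, since the slide cites Ismail's formulas without stating them. The function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def inductance_from_capacitance(c_per_m, eps_r):
    """Homogeneous medium, TEM mode: L * C = eps_r / c0**2 (per unit length)."""
    return eps_r / (C0 ** 2 * c_per_m)

def inductance_window(r_per_m, l_per_m, c_per_m, t_rise):
    """Assumed FOM bounds: t_r/(2*sqrt(L*C)) < length < (2/R)*sqrt(L/C).

    Returns (lower, upper) in metres; inductance matters only if lower < upper.
    """
    lower = t_rise / (2 * math.sqrt(l_per_m * c_per_m))
    upper = (2 / r_per_m) * math.sqrt(l_per_m / c_per_m)
    return lower, upper
```

For example, c = 1 pF/cm (1e-10 F/m) in a k = 2 dielectric gives L of about 2.2 nH/cm.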
RIPE 4.2 wire level projections using Cu/low-k (k = 2): number of wire levels required
• Using ITRS'99 scaling trends (40%/60% logic area to memory area ratio assumed)
Technology node (nm)   180   130   100   70    50    35
RC, 0 repeaters        7     12    24    40    >50   >50
RC, 5 repeaters        6     11    22    35    >50   >50
RLC, 0 repeaters       5     11    17    29    >50   >50
RLC, 5 repeaters       6     10    18    25    42    >50
• Using RPI and Bohr scaling trends with ITRS'99 clock frequencies (20%/80% logic area to memory area ratio assumed)
Technology node (nm)   180   130   100   70    50    35
RC, 0 repeaters        6     7     8     10    12    18
RC, 5 repeaters        6     6     7     8     10    15
RLC, 0 repeaters       6     7     8     9     10    12
RLC, 5 repeaters       6     6     6     7     7     10
• ITRS'99 scaling trends for MOSFETs, chip size and transistor counts are overly aggressive !!
A Constant RC Input-Signal-Transition-Inherent (CRISTI) gate delay model:
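Figure: constant RC model of an inverter chain. Each stage i (i = 1, 2, 3) has a pull-up resistance Rpu_i to Vdd and a pull-down resistance Rpd_i, and drives node capacitance Cnode_i.
For inverter 2:
$t_{pdr} = 0.7\,(R_{pu2} C_{node2} + R_{pd1} C_{node1})$
$t_{pdf} = 0.7\,(R_{pd2} C_{node2} + R_{pu1} C_{node1})$
Averaging rise and fall delays (assuming equal rise and fall behaviour), the average delay t_pd,av is expressed in terms of an effective driver resistance R_dr, the node capacitance C_node, and a constant K.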
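A sketch of the two inverter-2 delay expressions, as read from the slide. The input transition is folded in through the preceding stage's RC term. For a rising output, inverter 1 first pulls node 1 down through R_pd1, and then inverter 2 pulls node 2 up through R_pu2. The function name is hypothetical.

```python
def cristi_inverter2_delays(r_pu1, r_pd1, c_node1, r_pu2, r_pd2, c_node2):
    """Return (t_pdr, t_pdf) for inverter 2 in the constant RC chain model."""
    # Rising output: previous stage discharges node 1, this stage charges node 2.
    t_pdr = 0.7 * (r_pu2 * c_node2 + r_pd1 * c_node1)
    # Falling output: previous stage charges node 1, this stage discharges node 2.
    t_pdf = 0.7 * (r_pd2 * c_node2 + r_pu1 * c_node1)
    return t_pdr, t_pdf
```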
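• Resistance
Earlier approaches estimate a constant driver resistance from:
(1) $R_{dr} = \dfrac{t_{90} - t_{50}}{C_{eff}\,\ln 5}$
(2) $R_{pu} = \dfrac{1}{2}\left[\left.\dfrac{V_{ds}}{I_{ds}}\right|_{V_{out}=0} + \left.\dfrac{V_{ds}}{I_{ds}}\right|_{V_{out}=V_{dd}/2}\right]$ for the PMOS pull-up, with R_pd obtained analogously for the NMOS pull-down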
Previous approaches to estimating constant RC values
Gate length (µm)                                              1.2    0.9    0.6
Rpu (kΩ)
  This work                                                   12.1   11.7   9.5
  Step-input gate delay/load capacitance [Weste et al.]       8.0    8.2    6.7
  Equation (1) [Menezes et al., Qian et al.]                  6.7    6.1    5.5
  1/(maximum drain conductance) [Sakurai]                     4.1    3.9    2.9
  1/(minimum drain conductance) [Sakurai]                     667    909    167
  Slope-chord technique at Vgs = Vds = Vdd [Watt et al.]      15.7   16.7   13.2
  Equations (2) [Rabaey]                                      8.9    8.7    6.7
Rpd (kΩ)
  This work                                                   7.7    13.0   11.4
  Step-input gate delay/load capacitance [Weste et al.]       6.6    9.1    8.0
  Equation (1) [Menezes et al., Qian et al.]                  4.0    5.6    5.1
  1/(maximum drain conductance) [Sakurai]                     1.8    2.5    2.2
  1/(minimum drain conductance) [Sakurai]                     233    667    317
  Slope-chord technique at Vgs = Vds = Vdd [Watt et al.]      12.8   18.1   16.1
  Equations (2) [Rabaey]                                      5.6    7.9    7.1
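Equation (1) follows from a single-pole RC response. The output reaches 50% at RC·ln 2 and 90% at RC·ln 10, so t_90 − t_50 = RC·ln 5. A minimal sketch, with a hypothetical function name:

```python
import math

def r_from_transition(t50, t90, c_eff):
    """Equation (1): R = (t90 - t50) / (C_eff * ln 5)."""
    return (t90 - t50) / (c_eff * math.log(5))
```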
Two general methods of determining constant RC values
• Method 1: Given a full set of SPICE parameters, determine R and C from SPICE simulations of inverter chains
- Use actual gates, not step or ramp inputs, to drive the inverters under investigation → better characterization of RC values
- Use a constant RC input-signal-transition-inherent gate delay model for inverters
• Method 2: Given limited MOSFET information, determine R and C from the "CV/I" metric
- Use this method to project RC values for deep sub-micron CMOS technologies
C-IRSIM
• CRISTI model for inverters was extended to multi-transistor (>2) logic gates
– 3-input NAND gates used initially
• Focus placed on transistors in series stacks → relative topological position and relative turn-on order
• These combined features determine the appropriate R and C value for each transistor in a series stack
– Ignoring these features leads to significant errors in delay estimation relative to SPICE
• Elmore delay terms included with the RC term to account for distributed RC effects in complex gates
• CRISTI incorporated into IRSIM → C-IRSIM
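For reference, a generic Elmore delay for an RC ladder such as a series transistor stack is sketched below. Each node capacitance is weighted by the total resistance between it and the rail. This is the textbook Elmore form, not C-IRSIM's actual implementation, and the function name is hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder driven from the rail.

    resistances[i] connects node i-1 to node i (node -1 is the rail);
    capacitances[i] sits at node i. Delay = sum_i C_i * (R_0 + ... + R_i).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the rail and this node
        delay += c * upstream_r
    return delay
```

For a 3-input NAND pull-down stack, the three entries would be the series transistor resistances and the internal and output node capacitances.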
C-IRSIM simulation examples
• 1056-transistor, 6-bit Dadda multiplier circuit in 0.18 µm technology
Bit pattern   Slowest settling product   AIM-Spice   C-IRSIM   IRSIM
set #         bit -> final value         (ns)        (ns)      (ns)
1             P4 -> 1                    2.65        3.75      3.47
2             P8 -> 0                    4.76        4.32      6.22
3             P8 -> 0                    6.80        7.33      8.21
4             P7 -> 0                    4.75        3.94      3.56
5             P6 -> 0                    4.49        5.57      5.70
Total multiplication time                23.45       24.91     27.16
% error relative to AIM-Spice                        +6.2      +15.8
C-IRSIM's % improvement                              60.6
Total execution time (s)                 405         1         1
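The summary rows can be recomputed from the five per-pattern times. The "% improvement" formula is an assumption: here it is read as the relative reduction in total-time error from IRSIM to C-IRSIM, which reproduces the 60.6 figure.

```python
def percent_error(value, reference):
    """Signed percentage error of value relative to reference."""
    return 100.0 * (value - reference) / reference

aim_spice = [2.65, 4.76, 6.80, 4.75, 4.49]
c_irsim = [3.75, 4.32, 7.33, 3.94, 5.57]
irsim = [3.47, 6.22, 8.21, 3.56, 5.70]

ref_total = sum(aim_spice)                      # 23.45 ns
err_c = percent_error(sum(c_irsim), ref_total)  # about +6.2 %
err_i = percent_error(sum(irsim), ref_total)    # about +15.8 %
# Assumed reading of "% improvement": relative reduction in total-time error.
improvement = 100.0 * (err_i - err_c) / err_i   # about 60.6 %
```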
Significance of good device models
• Selected cycle-time components from RIPE 4.2
Technology node (nm)                         180    130    100    70     50     35
Target cycle time (ns)                       0.83   0.62   0.50   0.40   0.33   0.28
ITRS '99
  Ctrn (fF)                                  0.270  0.131  0.097  0.058  0.044  0.028
  Reff-trn (kΩ)                              20.5   27.3   28.3   30.3   28.2   40.8
  Total logic-stage delay (ns)               0.55   0.38   0.30   0.20   0.14   0.14
  Logic-stage delay / target cycle time      0.66   0.61   0.60   0.50   0.42   0.50
  Total long-wire delay (ns)                 0.28   0.24   0.20   0.20   0.19   0.14
RPI/Bohr (ITRS '99 clock frequencies)
  Ctrn (fF)                                  0.152  0.071  0.046  0.025  0.014  0.007
  Reff-trn (kΩ)                              27.8   34.8   41.1   51.4   65.6   88.8
  Total logic-stage delay (ns)               0.50   0.33   0.27   0.21   0.18   0.15
  Logic-stage delay / target cycle time      0.60   0.53   0.54   0.53   0.54   0.54
  Total long-wire delay (ns)                 0.33   0.29   0.23   0.19   0.15   0.13
• The fraction of the cycle time consumed by total logic delay can be relatively large (up to 0.66) !! Devices cannot be neglected altogether.
• A small change in device delay can produce a big change in the total number of wiring levels.
Conclusions
• Reasonable estimates can be made of microprocessor performance on the basis of limited information.
• Models should be robust with a limited number of arbitrary fitting parameters.
• Interconnect limitations constrain design and manufacture.
RIPE 4.0 Sample Benchmark
Intel's Deschutes (Pentium II) processor
RIPE INPUTS
System Parameters: Circuit Area (cm²): 1.31; Number of Transistors (M): 7.5; SRAM cells (µm²): 10.26; SRAM (Kbytes): 32; Signal I/O: 242
Technology Parameters: Technology Generation (µm): 0.25; LGATE (µm): 0.18; Num. of wire levels: 5 (Aluminum); Core Supply (V): 1.8
Wire Parameters: Pitch (µm): 0.64, 0.93, 0.93, 1.60, 2.56; rint (Ω/cm): 3451, 891, 891, 365, 158; cint (pF/cm): 2.4, 2.6, 2.6, 2.4, 2.3
                          RIPE RESULTS   ACTUAL
Clock Frequency (MHz)     459            450
Power Dissipation (W)     18.7           18.9
Wiring strategy results from RIPE 4.1 for a 100 nm, Cu/low-k (k = 2) technology using RPI/Bohr/ITRS'99 scaling
• No inductance analysis
• Repeaters chosen to maximize chip wireability
Level #   Pitch     Repeaters    Rw         Lmax    ew       total_ew
          (xPmin)   (for Lmax)   (ohm/cm)   (µm)
-------   -------   ----------   --------   -----   ------   --------
1         1.00      0            11000      2       0.0136   0.0136
2         2.00      0            2750       2774    0.3000   0.1568
3         2.00      0            2750       2774    0.3000   0.1568
4         3.00      2            1222       5938    0.2999   0.1772
5         5.00      1            440        8641    0.2999   0.1869
6         7.00      1            224        10992   0.2999   0.1929
7         8.00      3            172        13360   0.2999   0.1977
8         10.00     3            110        15477   0.3000   0.2012
9         13.00     1            65         17246   0.3000   0.2038
10        15.00     2            49         18880   0.3000   0.2059
11        17.00     2            38         20403   0.2999   0.2077
12        20.00     1            28         21759   0.2999   0.2091
13        23.00     1            21         22984   0.3000   0.2104
14        26.00     1            16         24105   0.2999   0.2114
15        29.00     1            13         25139   0.3000   0.2124
16        32.00     1            11         26800   0.2997   0.2132
Wiring strategy results from RIPE 4.2 for a 100 nm, Cu/low-k (k = 2) technology using RPI/Bohr/ITRS'99 scaling
• Inductance analysis performed
• Repeaters again chosen to maximize chip wireability
– Compromise between maximizing chip wireability and minimizing RLC delay
Level #   Pitch     Repeaters    Rw         Lmax    ew       total_ew
          (xPmin)   (for Lmax)   (ohm/cm)   (µm)
-------   -------   ----------   --------   -----   ------   --------
1         1.00      0            11000      2       0.0136   0.0136
2         2.00      0            2750       2774    0.3000   0.1568
3         2.00      0            2750       2774    0.3000   0.1568
4         3.00      2            1222       5938    0.2999   0.1772
5         5.00      1            440        8641    0.2999   0.1869
6         7.00      1            224        10992   0.2999   0.1929
7         7.00      3            224        16863   0.3000   0.2033
8         7.00      3            224        16863   0.3000   0.2033
9         8.00      1            172        19964   0.2999   0.2072
10        10.00     2            110        25657   0.3000   0.2128
11        10.00     2            110        25657   0.3000   0.2128
12        32.00     1            11         26800   0.1372   0.2121
• Wire inductance reduces the effect of wire resistance → smaller wire pitches but longer wire lengths → reduction in the total number of wire levels