is cmos more reliable with scaling? - stanford...
TM Mak CMOS reliability with scaling 1
Is CMOS more reliable with scaling?
TM Mak, Intel Corporation
TM Mak CMOS reliability with scaling 2
[Figure: Moore's Law Continues. Transistor count (log scale, 1,000 to 1,000,000,000) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8086, 286, 386™ Processor, 486™ DX Processor, Pentium® Processor, Pentium® II Processor, Pentium® III Processor, Pentium® 4 Processor and Itanium™ Processor (McKinley). Date labels: Dec 2000, June 2001, Nov 2001, May 2002]
Heading toward 1 billion transistors in 2007
TM Mak CMOS reliability with scaling 3
VCC and VT Scaling
• VCC is decreasing more rapidly than VT
• Transistor drive current is a function of (VCC-VT)^n
[Figure: Power supply voltage and Vt (Vdd/Vt axis, 0 to 1.8) and isolated lines, MPU gate (nm axis, 0 to 200) vs. year, 1995 to 2015. Source: ITRS]
[Figure: Ioff (nA/um, log scale, 1 to 10,000) vs. Temp (C), 30 to 110, for 0.10um, 0.13um, 0.18um and 0.25um]
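The effect of VCC falling faster than VT on drive current can be illustrated with a short sketch of the (VCC-VT)^n relation. The exponent n = 1.3 and both operating points are assumptions chosen for illustration, not values from the slides.

```python
def relative_drive_current(vcc, vt, n):
    """Return (VCC - VT)**n, the drive current up to a constant factor."""
    overdrive = vcc - vt
    if overdrive <= 0:
        return 0.0  # no gate overdrive, so no drive current in this simple model
    return overdrive ** n


# Hypothetical operating points (assumed, not from the slides).
N_EXPONENT = 1.3
older = relative_drive_current(vcc=1.8, vt=0.45, n=N_EXPONENT)
newer = relative_drive_current(vcc=1.0, vt=0.35, n=N_EXPONENT)

# VCC fell by 0.8 V while VT fell by only 0.1 V, so the overdrive shrinks.
drive_ratio = newer / older
print(f"relative drive current after scaling: {drive_ratio:.2f}")
```

Because the overdrive term is raised to a power, even a modest loss of headroom between VCC and VT costs drive current disproportionately.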
TM Mak CMOS reliability with scaling 4
Fewer and fewer atoms between the gate and channel
Placing a few SiON species uniformly in billions of devices ??
65nm node, Lgate = 30nm
TM Mak CMOS reliability with scaling 5
Gate Oxide Wearout, Hot e-
• Gate oxide fails (leakage current increases) in a characteristic time that depends on electric field and temperature.
• Gate oxide reliability imposes limits on BI / test voltage stress and on Tj.
• Trapped charge also shifts Vt, causing performance degradation over time.
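One common way to express the field and temperature dependence of oxide lifetime is the empirical E-model, in which time to failure scales as exp(-gamma * E) * exp(Ea / kT). The slides do not name a model, so the choice of the E-model and every parameter value below (gamma, Ea, fields, temperatures) are assumptions for illustration only.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def oxide_acceleration_factor(e_use, e_stress, t_use_c, t_stress_c, gamma, ea_ev):
    """Ratio of time-to-failure at use conditions to that at stress conditions.

    Assumes the E-model: TTF ~ exp(-gamma * E) * exp(Ea / (k * T)).
    Fields are in MV/cm, gamma in cm/MV, temperatures in degrees C.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    field_term = math.exp(gamma * (e_stress - e_use))
    thermal_term = math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_use_k - 1.0 / t_stress_k))
    return field_term * thermal_term


# Hypothetical values (assumed): burn-in raises both field and temperature.
af = oxide_acceleration_factor(
    e_use=5.0, e_stress=7.0, t_use_c=85.0, t_stress_c=100.0, gamma=3.0, ea_ev=0.6
)
print(f"stress ages the oxide about {af:.0f}x faster than use conditions")
```

The same exponential sensitivity that makes burn-in effective is what forces limits on stress voltage and Tj: a small overshoot consumes a large share of the oxide lifetime.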
[Figure: Fmax Shift Distribution thru Box Stress: MiniMin Condition. Cumulative distribution (0.1 to 99.9) vs. %Fmax shift, for Control and Stress]
[Figure: hot-electron transistor cross-section with VG < VD, labeled GATE, S, D, DEPLETION LAYER EDGE, EOX, ISUB, ID, VD, VS and VBB]
TM Mak CMOS reliability with scaling 6
Shrinking Bathtub?
[Figure: bathtub curve, failure rate vs. time. Labels: ~1 year, 7 YR wearout target, scope of burn-in]
Infant mortality (declining failure rate), due to latent reliability defects. Goals: 500 DPM within 0-30 days and 200 FIT within 0-1 year
Cumulative fallout vs. time follows a lognormal distribution
Impact of burn-in: control infant mortality
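The infant mortality goals mix two units, so a small conversion sketch may help. FIT is failures per 10^9 device-hours and DPM is defects per million units. The conversion below assumes a constant failure rate over the interval, and the lognormal median and sigma are hypothetical values, not parameters from the slides.

```python
import math

HOURS_PER_YEAR = 8760


def fit_to_dpm(fit, hours):
    """Convert a constant FIT rate over `hours` into cumulative DPM."""
    return fit * 1e-9 * hours * 1e6


def lognormal_cdf(t, median, sigma):
    """Cumulative fraction failed by time t for a lognormal failure time."""
    if t <= 0:
        return 0.0
    z = math.log(t / median) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# 200 FIT sustained for one year, expressed as cumulative DPM.
dpm_first_year = fit_to_dpm(200, HOURS_PER_YEAR)

# Hypothetical lognormal parameters (assumed): median in hours, sigma unitless.
fallout_30_days = lognormal_cdf(30 * 24, median=1e9, sigma=4.0)
print(f"{dpm_first_year:.0f} DPM in year one, {fallout_30_days * 1e6:.0f} DPM by day 30")
```

Under the constant-rate assumption, 200 FIT over one year works out to about 1752 DPM.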
Wearout (increasing failure rate), due to oxide wearout, EM, hot-e, etc. Goal: <0.1% failing for intrinsic reliability mechanisms
Curves shown for .10um, .13um and .18um, with <7 YR early wearout marked
TM Mak CMOS reliability with scaling 7
Leakages dominate BI power
[Figure: Total Power @ T=30C/VCC=1.15V, split into Active Power and Leakage Power]
[Figure: Burn-In Leakage Power @ T=100C/VCC=1.61V, split into BI Active Power, Transistor Gate Leakage, Transistor Leakage and Decap Leakage]
Percentage labels on the charts: 20%-25% and 80%-75%
*BI Acceleration Factor: BIAFtgat=2.5-3.5, BIAFtr=4, BIAFdecap=6
Frequency Reduction by 107
These percentages vary with the product!
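As a rough sketch of how the BI acceleration factors could be applied, the snippet below scales each leakage component at use conditions by its factor to estimate burn-in leakage power. Reading the BIAF values as per-component leakage multipliers is an assumption, as are the use-condition wattages; only the factors themselves come from the slide (with 3.0 taken as the midpoint of the 2.5-3.5 range).

```python
# BI acceleration factors from the slide; treating them as leakage
# multipliers is an assumption of this sketch.
BIAF = {"transistor_gate": 3.0, "transistor": 4.0, "decap": 6.0}

# Hypothetical leakage power per component at use conditions, in watts.
use_leakage_w = {"transistor_gate": 2.0, "transistor": 10.0, "decap": 1.0}


def burn_in_leakage(use_w, factors):
    """Scale each use-condition leakage component by its burn-in factor."""
    return {name: use_w[name] * factors[name] for name in use_w}


bi_leakage_w = burn_in_leakage(use_leakage_w, BIAF)
bi_total_w = sum(bi_leakage_w.values())
print(bi_leakage_w, bi_total_w)
```

With the clock slowed during burn-in, active power shrinks while every leakage term grows, which is why leakage ends up dominating the burn-in power budget.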
TM Mak CMOS reliability with scaling 8
Burn-In Thermal Trends
Model Assumptions
BI Voltage = 1.4X Vcc
BI Temp = 90 or 100 C
Leakage = 3X (130nm)
% Low Vt = 20
Lo Vt Lkg = 10X Hi Vt Lkg
Will fit within the current BI envelope
Will fit within the NGBI envelope assuming depopulation and minor improvements
Will not fit into the NGBI envelope unless there are major improvements
[Figure: BI Power (W, 0 to 210) vs. Die Size (mils, 200 to 1000). Model curves for 180nm, 130nm and 90nm at 90 deg and 100 deg, with data points CPU -1, CPU -2 and CPU -3 @ 100]
Cost of BI increases or more infant mortality fails
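Two of the model assumptions above combine into simple arithmetic, sketched here. With 20% of devices at low Vt and each leaking 10X a high-Vt device, the snippet computes total leakage relative to an all-high-Vt design and the share contributed by the low-Vt devices. The 1.15 V Vcc is borrowed from the total-power slide; the function name is hypothetical.

```python
def leakage_mix(low_vt_fraction, low_to_high_ratio):
    """Return (total leakage relative to all-high-Vt, share from low-Vt devices)."""
    high_part = 1.0 - low_vt_fraction
    low_part = low_vt_fraction * low_to_high_ratio
    total = high_part + low_part
    return total, low_part / total


# Model assumptions from the slide: 20% low Vt, low-Vt leakage = 10X high-Vt.
total_rel, low_share = leakage_mix(0.20, 10.0)

# BI voltage = 1.4X Vcc; Vcc = 1.15 V is taken from the total-power slide.
bi_voltage = 1.4 * 1.15
print(f"{total_rel:.1f}x leakage, {low_share:.0%} from low-Vt devices, BI at {bi_voltage:.2f} V")
```

Under these assumptions a fifth of the devices accounts for most of the leakage, and the 1.4X rule reproduces the 1.61 V burn-in voltage quoted earlier.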
TM Mak CMOS reliability with scaling 9
Thermal Runaway
• Thermal runaway is a destructive positive feedback condition that can occur when inadequate thermal control is combined with a silicon process technology where leakage increases exponentially with temperature
Ref: M Miller, NGBI, 2001
Test sockets can be destroyed by thermal runaway
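The positive feedback loop can be sketched as a fixed-point iteration: junction temperature sets leakage power, and leakage power sets junction temperature. The model below (temperature = ambient + thermal resistance x power, leakage growing exponentially with a characteristic temperature) and all numeric values are assumptions for illustration, not data from the slides or the cited reference.

```python
import math


def settle_temperature(t_ambient, theta, p_active, p_leak_ref, t_ref, t_char,
                       t_limit=200.0, steps=1000):
    """Iterate the leakage/temperature loop.

    Returns the settled junction temperature in C, or None if the
    temperature passes t_limit (treated here as thermal runaway).
    """
    t = t_ambient
    for _ in range(steps):
        # Leakage power rises exponentially with junction temperature.
        p_total = p_active + p_leak_ref * math.exp((t - t_ref) / t_char)
        t_next = t_ambient + theta * p_total
        if t_next > t_limit:
            return None
        if abs(t_next - t) < 1e-6:
            return t_next
        t = t_next
    return t


# Hypothetical burn-in conditions (assumed): same die, two thermal solutions.
common = dict(t_ambient=100.0, p_active=5.0, p_leak_ref=20.0, t_ref=100.0, t_char=25.0)
good_control = settle_temperature(theta=0.2, **common)  # low thermal resistance
poor_control = settle_temperature(theta=1.0, **common)  # high thermal resistance
print(good_control, poor_control)
```

In this toy model the only difference between a stable operating point and runaway is the thermal resistance, which matches the slide's point that inadequate thermal control is the trigger.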
TM Mak CMOS reliability with scaling 10
Random dopant fluctuation
• Dopant fluctuation causes Vt variation
  – Substantial variation even for devices in close proximity
  – Affects devices that require symmetry
    • SRAM array (substantial due to # of cells), differential sense amp, current mirrors
Ion implantation for Vt control
Dopant count under the gate is reduced to thousands or hundreds of atoms
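A short sketch shows why the drop from thousands to hundreds of dopant atoms matters. It assumes the dopant count under a gate is Poisson distributed, so the relative spread is 1/sqrt(N); that statistical model is an assumption here, and the sketch covers only the count, not the resulting Vt.

```python
import math


def relative_count_sigma(mean_dopants):
    """Relative standard deviation of a Poisson-distributed dopant count."""
    return 1.0 / math.sqrt(mean_dopants)


def pair_mismatch_sigma(mean_dopants):
    """Relative sigma of the count difference between two matched devices."""
    return math.sqrt(2.0) * relative_count_sigma(mean_dopants)


for count in (10_000, 1_000, 100):
    single = relative_count_sigma(count)
    pair = pair_mismatch_sigma(count)
    print(f"{count:>6} dopants: sigma {single:.1%}, matched-pair mismatch {pair:.1%}")
```

The matched-pair figure is the relevant one for circuits that rely on symmetry, and a large SRAM array samples far out into the tail of that distribution simply because it has so many cells.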
TM Mak CMOS reliability with scaling 11
Leakage as a defect mode: data retention
• Minute leakage at a storage node can cause the node to change state when it should not
  – Fault model is a high R bridge across source to drain
• Will not fail if the cell is refreshed frequently enough
  – Usual test method is “wait”
[Figure: memory cell schematic with word lines, bit lines, Vcc and Vss, showing leakage current Ileak through devices whose gate Vg is “Off”]
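The high R bridge fault model suggests a simple retention estimate, sketched below. It assumes a dynamic storage node with no restoring device, discharging through the bridge as an RC circuit until it crosses a trip voltage; the resistance, capacitance and voltages are hypothetical.

```python
import math


def retention_time_s(r_bridge_ohm, c_node_f, v_start, v_trip):
    """Time for an RC discharge from v_start to fall to v_trip."""
    return r_bridge_ohm * c_node_f * math.log(v_start / v_trip)


def loses_state(wait_s, r_bridge_ohm, c_node_f, v_start, v_trip):
    """True if the node is left unrefreshed for longer than it can hold its value."""
    return wait_s > retention_time_s(r_bridge_ohm, c_node_f, v_start, v_trip)


# Hypothetical defect (assumed): 1 teraohm bridge on a 1 fF node.
defect = dict(r_bridge_ohm=1e12, c_node_f=1e-15, v_start=1.2, v_trip=0.6)
hold = retention_time_s(**defect)

print(f"retention time: {hold * 1e3:.2f} ms")
print("fails at 1 us refresh:", loses_state(1e-6, **defect))
print("fails after 10 ms wait:", loses_state(1e-2, **defect))
```

This is why the usual test method is a wait: the defect is invisible at normal refresh intervals and only shows up once the pause exceeds the node's retention time.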
TM Mak CMOS reliability with scaling 12
Latches and domino gates are equivalent storage nodes
• Failure mode may not be detected at speed
  – R is not small enough to cause an appreciable performance difference
  – Running at low frequency or being left alone will cause state loss
• Coupled with system and chip design techniques that save power (stop clock, frequency changes, Vcc changes), a device may have a hard time remembering its machine state
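[Figure: D-latch (Clk, D, Q) with leakage current Ileak through devices whose gate Vg is “Off”]
[Figure: domino gate (evaluate; either input is "0"; keeper on / keeper off) with leakage on the dynamic node]
Leakage increases with wearout!!
Vt mismatches exacerbate leakage effects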
TM Mak CMOS reliability with scaling 13
[Figure: Vcc (0 to 4; desktop Vcc and mobile Vcc) and normalized Cnode/Qnode (log scale, 0.001 to 1) vs. technology node (nm): 500, 350, 250, 180, 130, 90]
A shrinking process decreases charge per node
Soft error is a function of stored charge at sensitive nodes
Q = CV, i.e., Cnode and Vcc
[Figure: memory cell schematic with word lines, bit lines, Vcc and Vss, with the SERR-sensitive junctions marked]
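The Q = CV relation can be made concrete with a small sketch comparing the stored charge of an older and a newer node. Both nodes' capacitance and supply values are hypothetical, chosen only to show that charge falls with both Cnode and Vcc.

```python
ELECTRON_CHARGE_C = 1.602e-19  # elementary charge in coulombs


def node_charge_c(c_node_f, vcc):
    """Stored charge Q = C * V, in coulombs."""
    return c_node_f * vcc


def electrons_stored(c_node_f, vcc):
    """Stored charge expressed as a number of electrons."""
    return node_charge_c(c_node_f, vcc) / ELECTRON_CHARGE_C


# Hypothetical nodes (assumed): capacitance in farads, supply in volts.
older_q = node_charge_c(c_node_f=5e-15, vcc=3.3)
newer_q = node_charge_c(c_node_f=1e-15, vcc=1.2)

print(f"charge ratio newer/older: {newer_q / older_q:.3f}")
print(f"electrons on the newer node: {electrons_stored(1e-15, 1.2):.0f}")
```

Since a particle strike deposits a roughly fixed amount of charge, a node holding less charge needs a smaller disturbance to flip.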
TM Mak CMOS reliability with scaling 14
Logic circuit subjected to SEU
[Figure: typical D-latch (Clk, D, Q) with sensitive junctions marked. A feedback loop exists, so a disturbance changes the logic state permanently]
[Figure: domino gate (evaluate; either input is "0"). Nodes will discharge upon ion impact or energy injection; the severity at these nodes is input dependent]
• Logic is not immune to SER
  – All feedback nodes are susceptible
• Errors in compute elements may become SDC (silent data corruption) or at best a system crash (equally undesirable)
  – A low end system may be OK with a reboot; not acceptable for a server
  – SDC is the most vulnerable; won’t know unless the computation is repeated at another time
TM Mak CMOS reliability with scaling 15
Crosstalk (reliability?)
[Figure: Interconnect Geometry. Pitch (nm, 0 to 800) and Aspect Ratio (0.0 to 4.0) vs. year: 1997, 2001, 2006, 2009, 2012]
[Figure: Coupling vs. Substrate capacitance. Cc/Cs (0 to 8) vs. year: 1997, 2001, 2006, 2009, 2012]
[Figure: Frequency Doubles in Two Years. Frequency (MHz, log scale, 0.1 to 100,000) vs. year ('70 to '10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, Pentium® Pro and Pentium III proc, with 3 GHz, 6.5 GHz, 14 GHz and ~30 GHz points]
Noise ∝ Cc dV/dt
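The proportionality can be sketched in two simple forms: the current injected into a victim through the coupling capacitance, I = Cc * dV/dt, and the glitch on a weakly held victim as a capacitive divider set by the Cc/Cs ratio. The divider model assumes the victim node is effectively floating during the transition, and all values are hypothetical.

```python
def coupling_current_a(c_coupling_f, delta_v, rise_time_s):
    """Current injected into the victim: I = Cc * dV/dt."""
    return c_coupling_f * delta_v / rise_time_s


def victim_glitch_v(v_swing, cc_over_cs):
    """Glitch on a floating victim: V * Cc / (Cc + Cs), written via the Cc/Cs ratio."""
    return v_swing * cc_over_cs / (1.0 + cc_over_cs)


# Hypothetical aggressor (assumed): 10 fF coupling, 1.2 V swing in 50 ps.
injected = coupling_current_a(10e-15, 1.2, 50e-12)
print(f"injected current: {injected * 1e3:.2f} mA")

# The same swing couples more strongly as Cc/Cs grows.
for ratio in (1.0, 3.0, 6.0):
    print(f"Cc/Cs = {ratio}: glitch {victim_glitch_v(1.0, ratio):.2f} V per volt of swing")
```

Both trends in the figures push the same way: rising Cc/Cs increases the coupled fraction, and rising frequency shortens edges and so raises dV/dt.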
TM Mak CMOS reliability with scaling 16
Crosstalk can also be a transient event
• Multiple coupling possible in meshes of wires
• Individual analysis may give a healthy report
  – But their combined forces (e.g. buses) may create enough of a slowdown or a significant glitch
  – Uncorrelated signals may be very hard to analyze (timing windows, functional sensitizability)
• May appear as a random system failure
  – Data dependent and/or machine state dependent
When will your neighbors all gang up on you?
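The gap between individual and combined analysis can be shown with a toy charge-sharing model of one victim wire and four neighbors. The capacitances, swing and noise margin are hypothetical, and treating quiet neighbors as extra grounded capacitance is a simplifying assumption.

```python
def glitch_v(v_swing, switching_cc, quiet_cc, c_ground):
    """Victim glitch when only the `switching_cc` neighbors toggle together.

    Quiet neighbors are treated as additional capacitance to ground.
    """
    total = sum(switching_cc) + sum(quiet_cc) + c_ground
    return v_swing * sum(switching_cc) / total


# Hypothetical victim (assumed): four equal neighbors, arbitrary capacitance units.
COUPLINGS = [1.0, 1.0, 1.0, 1.0]
C_GROUND = 4.0
V_SWING = 1.0
NOISE_MARGIN = 0.3

# Each aggressor analyzed alone, with the other three held quiet.
single = [
    glitch_v(V_SWING, [cc], COUPLINGS[:i] + COUPLINGS[i + 1:], C_GROUND)
    for i, cc in enumerate(COUPLINGS)
]
# All four aggressors switching in the same direction at the same time.
combined = glitch_v(V_SWING, COUPLINGS, [], C_GROUND)

print("each alone passes:", all(g < NOISE_MARGIN for g in single))
print("all together passes:", combined < NOISE_MARGIN)
```

Every aggressor passes on its own, yet the victim fails when they align, and whether they align depends on data and machine state, which is what makes the failure look random.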
TM Mak CMOS reliability with scaling 17
Summary
• Scaling will bring less reliable electronics unless we come up with new solutions
  – Scaling brings more leakage (Igate and Ioff)
  – Both manufacturing issues (burn-in and lack thereof) and long term quality/reliability issues (field infant mortality, hot electron degradation and wearout)
  – Leakage problems are further exacerbated by Vt mismatches
  – Scaling also increases logic susceptibility to soft errors
  – Signal integrity may appear as a reliability issue
• Traditional fault tolerance solutions don’t work
  – Excessive power consumption
  – 2X+ cost
• Fault tolerance is largely an architecture solution; does it mean less testing with adoption of fault tolerance?