

Source: Is CMOS more reliable with scaling? - Stanford University, crc.stanford.edu/BAST/slides/Mak_BAST03DSMreliability2.pdf

TM Mak CMOS reliability with scaling 1

Is CMOS more reliable with scaling?

TM Mak, Intel Corporation


TM Mak CMOS reliability with scaling 2

Moore’s Law Continues

[Figure: transistor count per processor (log scale, 1,000 to 1,000,000,000) versus year (1970 to 2010). Processors plotted: 4004, 8008, 8080, 8086, 286, 386™ Processor, 486™ DX Processor, Pentium® Processor, Pentium® II Processor, Pentium® III Processor, Pentium® 4 Processor, Itanium™ Processor (McKinley). Date markers: Dec 2000, June 2001, Nov 2001, May 2002. Annotation: Heading toward 1 billion transistors in 2007.]


TM Mak CMOS reliability with scaling 3

VCC and VT Scaling

• VCC is decreasing more rapidly than VT

• Transistor drive current is a function of (VCC-VT)^n

[Figure: power supply voltage and Vt (axis Vdd/Vt, 0 to 1.8) and isolated line width for the MPU gate (axis nm, 0 to 200) versus year, 1995 to 2015. Source: ITRS.]

[Figure: Ioff (nA/µm), log scale from 1 to 10,000, versus Temp (C), 30 to 110, for the 0.10µm, 0.13µm, 0.18µm and 0.25µm processes.]
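As a rough illustration of the drive-current bullet on this slide, the sketch below evaluates (VCC-VT)^n for a few supply and threshold pairs. The exponent n = 1.3 and the voltage pairs are hypothetical values chosen for illustration; they are not read from the slide or from ITRS.

```python
def relative_drive(vcc: float, vt: float, n: float = 1.3) -> float:
    """Drive current up to a constant factor: (VCC - VT)**n.

    n = 1.3 is an assumed exponent; the slide only writes 'n'.
    """
    overdrive = vcc - vt
    return overdrive ** n if overdrive > 0 else 0.0


# Hypothetical (VCC, VT) pairs in volts: VCC drops faster than VT.
pairs = [(2.5, 0.50), (1.8, 0.45), (1.2, 0.40), (0.9, 0.35)]
baseline = relative_drive(*pairs[0])
for vcc, vt in pairs:
    ratio = relative_drive(vcc, vt) / baseline
    print(f"VCC={vcc:.2f} V  VT={vt:.2f} V  drive vs. first pair: {ratio:.2f}")
```

The ratio only tracks the voltage term; it ignores every other change that comes with a new process generation.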


TM Mak CMOS reliability with scaling 4

Fewer and fewer atoms between the gate and channel

Placing a few SiON species uniformly in billions of devices?

65nm node, Lgate = 30nm


TM Mak CMOS reliability with scaling 5

Gate Oxide Wearout, Hot e-

• Gate oxide fails (leakage current increases) in a characteristic time that depends on electric field and temperature.

• Gate oxide reliability imposes limits on BI / test voltage stress and on Tj.

• Trapped charge also shifts Vt, causing performance degradation over time.

[Figure: Fmax Shift Distribution thru Box Stress: MiniMin Condition. Cumulative distribution (0.1 to 99.9) versus % Fmax shift, for Control and Stress populations.]

[Figure: MOS transistor cross-section with VG < VD, labelling GATE, S, D, EOX, ISUB, VBB, ID, VD, VS and the depletion layer edge.]
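The field and temperature dependence of oxide wearout is often summarized as an acceleration factor between use and stress conditions. The sketch below assumes an exponential field term combined with an Arrhenius temperature term. The slide does not name a model, and `gamma`, `ea_ev` and the operating points are hypothetical placeholders.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def oxide_acceleration(e_use, e_stress, t_use_c, t_stress_c,
                       gamma=3.0, ea_ev=0.6):
    """Ratio of time-to-fail at use conditions to time-to-fail at stress.

    Assumed model (not from the slide): TTF ~ exp(-gamma * E) * exp(Ea / kT).
    e_* are oxide fields in MV/cm, gamma is in cm/MV, temperatures in Celsius.
    gamma and ea_ev are placeholder values.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    field_term = math.exp(gamma * (e_stress - e_use))
    temp_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K
                         * (1.0 / t_use_k - 1.0 / t_stress_k))
    return field_term * temp_term


af = oxide_acceleration(e_use=5.0, e_stress=7.0, t_use_c=70.0, t_stress_c=100.0)
print(f"acceleration factor: {af:.0f}")
```

Under this assumed model, raising either the field or the temperature shortens the time to fail, which is why both stress voltage and Tj need limits.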


TM Mak CMOS reliability with scaling 6

Shrinking Bathtub?

[Figure: bathtub curve, Failure Rate versus Time, with curves for .10um, .13um and .18um. Time markers: ~1 year; 7 YR wearout target; <7 YR early wearout. Inset: Cumulative Fallout vs. Time (follows a lognormal distribution).]

• Infant Mortality (declining failure rate): due to latent reliability defects. Goals: 500 DPM within 0-30 days & 200 FIT within 0-1 year

• Impact of Burn In: control infant mortality (the early part of the curve is the scope of Burn In)

• Wearout (increasing failure rate): due to oxide wearout, EM, hot-e, etc. Goal: <0.1% failing for intrinsic reliability mechanisms
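To put the infant mortality goals on a common scale, the sketch below converts a FIT rate into an expected failing fraction over a time window. FIT is failures per 10^9 device-hours. Treating the rate as constant over the window is a simplifying assumption, since the slide describes infant mortality as a declining failure rate.

```python
import math

HOURS_PER_YEAR = 8760


def fit_to_dpm(fit: float, hours: float) -> float:
    """Expected defective parts per million after `hours` at a constant FIT rate."""
    failure_rate_per_hour = fit * 1e-9  # FIT = failures per 1e9 device-hours
    fraction_failed = 1.0 - math.exp(-failure_rate_per_hour * hours)
    return fraction_failed * 1e6


print(f"200 FIT over one year: {fit_to_dpm(200, HOURS_PER_YEAR):.0f} DPM")
```

With these inputs, 200 FIT sustained for a year corresponds to roughly 1,750 failing parts per million.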


TM Mak CMOS reliability with scaling 7

Leakages dominate BI power

[Figure: power breakdown charts. Total Power @ T=30C/VCC=1.15V, split into Active Power and Leakage Power (Transistor Leakage, Transistor Gate Leakage, Decap Leakage). Burn-In Leakage Power @ T=100C/VCC=1.61V, shown with BI Active Power. Shares shown: 20%-25% and 80%-75%.]

* BI Acceleration Factor: BIAFtgat=2.5-3.5, BIAFtr=4, BIAFdecap=6

Frequency Reduction by 107

These percentages vary with the product!
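One way to read the BI acceleration factors is as multipliers on each leakage component when moving from the nominal condition to the burn-in condition. That reading is an interpretation, not something the slide states. Only the factor values come from the slide; the nominal wattages and the BI active power below are hypothetical and were picked so the result lands near the leakage share the slide shows.

```python
# BI acceleration factors from the slide (gate leakage is given as a range
# of 2.5-3.5; the midpoint is used here).
BIAF = {"gate": 3.0, "transistor": 4.0, "decap": 6.0}


def burn_in_leakage(nominal_watts: dict) -> dict:
    """Scale each nominal leakage component (W) by its BI acceleration factor."""
    return {name: watts * BIAF[name] for name, watts in nominal_watts.items()}


# Hypothetical nominal leakage breakdown in watts.
nominal = {"gate": 2.0, "transistor": 6.0, "decap": 1.0}
bi = burn_in_leakage(nominal)

bi_active = 10.0  # hypothetical BI active power in watts
bi_leak_total = sum(bi.values())
leak_share = bi_leak_total / (bi_leak_total + bi_active)
print(bi, f"leakage share of BI power: {leak_share:.0%}")
```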


TM Mak CMOS reliability with scaling 8

Burn-In Thermal Trends

Model Assumptions

BI Voltage = 1.4X Vcc

BI Temp = 90 or 100 C

Leakage = 3X (130nm)

% Low Vt = 20

Lo Vt Lkg = 10X Hi Vt Lkg

[Figure: BI Power (W), 0 to 210, versus Die Size (mils), 200 to 1000. Model curves for 180nm, 130nm and 90nm at 90 deg and 100 deg; data points CPU-1 @ 100, CPU-2 @ 100 and CPU-3 @ 100. Shaded regions:
– Will fit within the current BI envelope
– Will fit within the NGBI envelope assuming depopulation and minor improvements
– Will not fit into the NGBI envelope unless there are major improvements]

Cost of BI increases or more infant mortality fails
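The last two model assumptions imply that a minority of low-Vt devices can carry most of the leakage. The worked calculation below assumes that the 20% figure is a fraction of total device width and that leakage scales with width; neither is stated on the slide.

```python
def low_vt_leakage_share(low_vt_fraction: float, low_to_high_ratio: float) -> float:
    """Fraction of total leakage contributed by low-Vt devices.

    High-Vt leakage per unit width is normalized to 1.
    """
    low = low_vt_fraction * low_to_high_ratio
    high = (1.0 - low_vt_fraction) * 1.0
    return low / (low + high)


share = low_vt_leakage_share(low_vt_fraction=0.20, low_to_high_ratio=10.0)
print(f"low-Vt devices contribute {share:.0%} of leakage")
```

Under these assumptions the 20% of low-Vt devices account for about 71% of the leakage (2.0 out of 2.8 normalized units).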


TM Mak CMOS reliability with scaling 9

Thermal Runaway

• Thermal runaway is a destructive positive feedback condition that can occur when inadequate thermal control is combined with a silicon process technology where leakage increases exponentially with temperature

Ref: M Miller, NGBI, 2001

Test sockets can be destroyed by thermal runaway
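The feedback loop can be sketched as a fixed-point iteration: junction temperature sets leakage power, and leakage power sets junction temperature through the thermal resistance. The exponential leakage model, the 10 C doubling interval and every numeric value below are assumptions for illustration only.

```python
def settle_temperature(t_ambient, theta_c_per_w, p_active, p_leak_ref,
                       t_ref=30.0, doubling_c=10.0, t_limit=250.0, max_iter=1000):
    """Iterate Tj = Ta + theta * (P_active + P_leak(Tj)).

    Leakage is assumed to double every `doubling_c` degrees C above t_ref.
    Returns (last_temperature, ran_away).
    """
    tj = t_ambient
    for _ in range(max_iter):
        p_leak = p_leak_ref * 2.0 ** ((tj - t_ref) / doubling_c)
        tj_next = t_ambient + theta_c_per_w * (p_active + p_leak)
        if tj_next > t_limit:
            return tj_next, True      # positive feedback never settled
        if abs(tj_next - tj) < 1e-6:
            return tj_next, False     # stable operating point found
        tj = tj_next
    return tj, False                  # still below t_limit after max_iter


# Same device and ambient, two hypothetical thermal resistances (C/W).
print(settle_temperature(90.0, 0.2, 10.0, 0.05))  # tighter thermal control
print(settle_temperature(90.0, 1.0, 10.0, 0.05))  # looser thermal control
```

In this toy model the only difference between the stable and the runaway case is the thermal resistance, which mirrors the slide's point about inadequate thermal control.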


TM Mak CMOS reliability with scaling 10

Random dopant fluctuation

• Dopant fluctuation causes Vt variation

– Substantial variation even between devices in close proximity

– Affects devices that require symmetry: SRAM arrays (substantial due to # of cells), differential sense amps, current mirrors

Ion implantation for Vt control

Dopant under the gate is reduced to thousands or hundreds of dopant atoms
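If the number of dopant atoms under a gate is treated as Poisson distributed (a common simplification, not something the slide states), the relative spread in dopant count grows as the count shrinks:

```python
import math


def relative_dopant_sigma(mean_dopants: float) -> float:
    """Relative standard deviation of a Poisson count: sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_dopants)


for n in (10_000, 1_000, 100):
    print(f"{n:>6} dopants: sigma/mean = {relative_dopant_sigma(n):.1%}")
```

Going from thousands to hundreds of atoms roughly triples the relative spread, from about 3% to 10% under this assumption.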


TM Mak CMOS reliability with scaling 11

Leakage as a defect mode: data retention

• Minute leakage at a storage node can cause the node to change state when it should not

– Fault model is a high R bridge across source to drain

• Will not fail if the cell is refreshed frequently enough

– Usual test method is “wait”

[Figure: memory cell schematic (word line, bit lines, Vcc, Vss) showing leakage current Ileak through a transistor that is “Off”, with gate voltage Vg.]
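Why "wait" works as a test can be seen from a first-order estimate of how long a leaking node takes to lose its state. The sketch assumes the node is effectively floating and is discharged by a constant leakage current; the capacitance, voltage margin and leakage values are hypothetical.

```python
def retention_time_s(c_node_f: float, delta_v: float, i_leak_a: float) -> float:
    """Time for a constant leakage current to move a node by delta_v: t = C * dV / I."""
    return c_node_f * delta_v / i_leak_a


# Hypothetical node: 2 fF of storage capacitance, 0.5 V margin before the state flips.
for i_leak in (1e-12, 1e-9, 1e-6):
    t = retention_time_s(2e-15, 0.5, i_leak)
    print(f"Ileak = {i_leak:.0e} A -> state lost after about {t:.1e} s")
```

A small leakage takes a long time to flip the node, so a defect of this kind only shows up if the test leaves the cell alone for long enough.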


TM Mak CMOS reliability with scaling 12

Latches and domino gates are equivalent storage nodes

• Failure mode may not be detected at speed

– R is not small enough to cause an appreciable performance difference

– Running at low frequency, or being left alone, will cause state loss

• Coupled with system and chip design techniques to save power, such as stop clock, frequency changes and Vcc changes, the device may have a hard time remembering its machine state

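[Figure: latch schematic (Clk, D, Q) showing leakage current Ileak through a transistor that is “Off”, with gate voltage Vg.]

[Figure: domino gate in evaluate with either input at "0", showing leakage on the dynamic node with the keeper on and with the keeper off.]

Leakage increases with wearout!!

Vt mismatches exacerbate leakage effects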


TM Mak CMOS reliability with scaling 13

Shrinking process decreases charge per node

Soft error is a function of stored charge at sensitive nodes

Q = CV, i.e., Cnode and Vcc

[Figure: Vcc (0 to 4; desktop Vcc and mobile Vcc) and normalized Cnode and Qnode (log scale, 0.001 to 1) versus technology node (500, 350, 250, 180, 130, 90 nm).]

[Figure: memory cell schematic (word line, bit lines, Vcc, Vss) marking the SERR sensitive junctions.]
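The Q = CV relation can be tabulated directly. The node capacitances and supply voltages below are hypothetical round numbers, not values read from the slide's chart; they only show how the stored charge falls when both factors shrink.

```python
def node_charge_fc(c_node_ff: float, vcc: float) -> float:
    """Stored charge in femtocoulombs: Q = C * V (fF times V gives fC)."""
    return c_node_ff * vcc


# Hypothetical (technology node nm, Cnode fF, Vcc V) triples.
nodes = [(250, 4.0, 2.5), (180, 3.0, 1.8), (130, 2.0, 1.3), (90, 1.5, 1.0)]
for nm, c_ff, vcc in nodes:
    print(f"{nm} nm: Q = {node_charge_fc(c_ff, vcc):.2f} fC")
```

Because Cnode and Vcc both fall, Qnode falls faster than either one alone.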


TM Mak CMOS reliability with scaling 14

Logic circuit subjected to SEU

[Figure: typical D-latch (Clk, D, Q) with sensitive junctions marked. A feedback loop exists, so a disturbance changes the logic state permanently. Severity of these nodes is input dependent.]

[Figure: domino gate in evaluate with either input at "0". Nodes will discharge upon ion impact; node discharge upon energy injection.]

• Logic is not immune to SER

– All feedback nodes are susceptible

• Errors in compute elements may become SDC (silent data corruption) or at best a system crash (equally undesirable)

– Low end system may be OK with a reboot; not acceptable for a server

– SDC is the most vulnerable case; you won’t know unless the computation is repeated at another time


TM Mak CMOS reliability with scaling 15

Crosstalk (reliability?)

[Figure: Interconnect Geometry. Pitch (nm), 0 to 800, and Aspect Ratio, 0.0 to 4.0, versus year (1997, 2001, 2006, 2009, 2012).]

[Figure: Coupling vs. Substrate capacitance. Cc/Cs, 0 to 8, versus year (1997, 2001, 2006, 2009, 2012).]

[Figure: Frequency Doubles in Two Years. Frequency (MHz), log scale from 0.1 to 100,000, versus year (’70 to ’10). Processors plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, Pentium® Pro, Pentium III proc. Projected points: 3 GHz, 6.5 GHz, 14 GHz, 30 GHz (~30 GHz).]

Noise ∝ Cc dV/dt
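Two first-order estimates follow from the noise relation: the current injected into a victim through Cc, and the capacitive-divider glitch on a victim that is not being driven. The undriven victim is a simplifying assumption, and the capacitance, swing and edge-rate values are hypothetical.

```python
def coupled_current_a(cc_f: float, dv: float, dt_s: float) -> float:
    """Current injected into the victim through Cc: i = Cc * dV/dt."""
    return cc_f * dv / dt_s


def floating_victim_glitch_v(dv_aggressor: float, cc_over_cs: float) -> float:
    """Capacitive-divider glitch on an undriven victim: dV * Cc / (Cc + Cs)."""
    return dv_aggressor * cc_over_cs / (cc_over_cs + 1.0)


# Hypothetical aggressor: 1 V swing in 50 ps across 10 fF of coupling capacitance.
print(f"injected current: {coupled_current_a(10e-15, 1.0, 50e-12) * 1e6:.0f} uA")

# Glitch as the Cc/Cs ratio grows (ratios chosen for illustration).
for ratio in (0.5, 1.0, 2.0, 4.0):
    print(f"Cc/Cs = {ratio}: glitch = {floating_victim_glitch_v(1.0, ratio):.2f} V")
```

Faster edges raise the injected current, and a larger Cc/Cs ratio raises the fraction of the aggressor swing that appears on the victim.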


TM Mak CMOS reliability with scaling 16

Crosstalk can also be a transient event

• Multiple coupling is possible in meshes of wires

• Individual analysis may give a healthy report

– But their combined forces (e.g. buses) may create enough of a slowdown or a significant glitch

– Uncorrelated signals may be very hard to analyze (timing windows, functionally sensitizable)

• May appear as a random system failure

– Data dependent and/or machine state dependent

When will your neighbors all gang up on you?
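The gap between individual and combined analysis can be shown with a toy check. It assumes glitches from different aggressors add linearly when they switch in the same timing window, and the noise margin and per-aggressor glitch values are hypothetical.

```python
def worst_case_glitch_v(aggressor_glitches_v):
    """Upper bound when all aggressors switch together: sum of individual glitches."""
    return sum(aggressor_glitches_v)


noise_margin_v = 0.30                 # hypothetical victim noise margin (V)
glitches = [0.12, 0.10, 0.09, 0.08]   # hypothetical per-aggressor glitches (V)

each_ok = all(g < noise_margin_v for g in glitches)
combined_ok = worst_case_glitch_v(glitches) < noise_margin_v
print(f"each aggressor alone passes: {each_ok}; all together pass: {combined_ok}")
```

Whether the combined case ever occurs depends on timing windows and on which input patterns can actually be sensitized, which is what makes it look like a random, data-dependent failure.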


TM Mak CMOS reliability with scaling 17

Summary

• Scaling will bring less reliable electronics unless we come up with new solutions

– Scaling brings more leakage (Igate and Ioff)

– Both manufacturing issues (Burn-in and lack thereof) and long term quality/reliability issues (field infant mortality, hot electron degradation and wearout)

– Leakage problems are further exacerbated by Vt mismatches

– Scaling also increases logic susceptibility to soft errors

– Signal integrity may appear as a reliability issue

• Traditional fault tolerance solutions don’t work

– Excessive power consumption

– 2X+ cost

• Fault tolerance is largely an architecture solution; does adoption of fault tolerance mean less testing?