recent challenges. 2 soft errors scaling: seu (single-event upset): −ionizing radiation corrupts...

Recent Challenges

Uploaded by dario-virtue on 14-Dec-2015


Soft Errors

• SEU (single-event upset):
− Ionizing radiation corrupts stored data
• Causes:
− Radioactive impurities in device packages
− More recently: cosmic radiation
• Scaling worsens SEU:
1. Voltage scaling and reduced node capacitances lower the charge threshold needed to corrupt the data
2. A greater level of integration increases the likelihood that soft errors will affect the device
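Both scaling effects can be illustrated with two first-order formulas. The critical charge of a storage node is often approximated as Q_crit ≈ C_node · V_DD, so shrinking both C and V lowers it. And if each of N bits is upset independently with probability p over some interval, the chance that at least one bit is hit is 1 − (1 − p)^N, which grows with integration. The sketch below is only an illustration under those assumptions: the capacitance, voltage and per-bit probability values are hypothetical, and charge-collection physics is ignored.

```python
def critical_charge_fc(node_cap_ff: float, vdd_v: float) -> float:
    """First-order critical charge Q ~ C * V (fF * V = fC)."""
    return node_cap_ff * vdd_v

def p_any_upset(p_bit: float, n_bits: int) -> float:
    """Probability that at least one of n independent bits is upset."""
    return 1.0 - (1.0 - p_bit) ** n_bits

# Hypothetical older vs. newer node: smaller node capacitance and lower supply voltage.
q_old = critical_charge_fc(2.0, 1.8)
q_new = critical_charge_fc(1.0, 1.0)
print(f"Q_crit: {q_old:.1f} fC -> {q_new:.1f} fC")

# Same hypothetical per-bit upset probability, more bits on the die.
for n in (10**6, 10**8):
    print(n, p_any_upset(1e-9, n))
```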


SEU

• Sources:
− Configuration memory
− Flip-flops
− Memory blocks
− Combinational circuits (transient error → permanent once latched)


SEU in Configuration Memory

• SEU in configuration bits (SRAM-based FPGAs):
− In Virtex FPGAs, ~91% of the bits sensitive to soft errors are configuration bits
− Flash- or antifuse-based FPGAs do not suffer from this
• Any change to the configuration memory may alter the functionality
• The change persists until the FPGA is reprogrammed


SEU Mitigation Techniques

• Mitigation techniques:
1. Circuit and technology level:
− Adding metal capacitors to memory nodes increases the amount of charge needed to cause an SEU
2. System level: ensures that the system can detect upsets and recover
− Devices regularly verify their configuration memory by comparing its current values with the desired configuration state using cyclic redundancy checks (Altera Stratix III)
3. User level:
a) TMR (triple modular redundancy): replicate the design three times and vote among the outputs
b) Reduce the design's sensitivity to soft errors by carefully selecting the resources it uses
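Two of the mitigation ideas above can be sketched in software: a CRC readback check that flags configuration frames whose contents drifted from a known-good ("golden") state, and the bitwise 2-of-3 majority voter at the heart of TMR. This is a hypothetical model, not Altera's or any vendor's mechanism: in Stratix III the CRC check runs in dedicated hardware, and the frame layout, frame size and function names here are assumptions for illustration only.

```python
import zlib
from typing import List

def golden_crcs(frames: List[bytes]) -> List[int]:
    """CRC-32 of each configuration frame, computed once from the known-good bitstream."""
    return [zlib.crc32(frame) for frame in frames]

def upset_frames(frames: List[bytes], golden: List[int]) -> List[int]:
    """Indices of frames whose current CRC no longer matches the golden value."""
    return [i for i, (frame, crc) in enumerate(zip(frames, golden))
            if zlib.crc32(frame) != crc]

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two replicas."""
    return (a & b) | (a & c) | (b & c)

# Hypothetical 4-byte configuration frames.
good = [bytes(4), bytes([0xFF] * 4)]
golden = golden_crcs(good)
readback = [good[0], bytes([0xFF, 0xFF, 0xEF, 0xFF])]  # one bit flipped in frame 1
print(upset_frames(readback, golden))          # [1]
print(bin(tmr_vote(0b1011, 0b1011, 0b0011)))   # 0b1011 (replica c is outvoted)
```

CRC-32 detects every single-bit error, so a single flipped configuration bit is always flagged; the voter masks a fault in any one replica but not faults in two.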


Circuit Level

• [Ebrahimi]: Reduce the number of SRAM cells in a switch box (6 → 5, 6 → 4)


[Figure: switch box with sides N, E, S, W and internal labels a–f, w–z]

User Design Level

• Care bits [Golshan07]: only a subset of the configuration bits affect the design when upset by an SEU.
• Example: resource A is used by net A.
− The A-B SRAM bit is not a care bit if B is not used by other nets.
− The A-C SRAM bit is a care bit (changing it to ‘1’ hurts net A).
− The A-D SRAM bit is not a care bit (w.r.t. net A) if D is not used.
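The care-bit rule in the example can be written as a small classifier. This is a simplified sketch, not the model from [Golshan07]: it assumes an OFF switch is a care bit for a net only when it joins that net to a resource used by a different net (an upset to ‘1’ would short them), and it additionally assumes, labeled as such, that ON switches on the net's own route are care bits (an upset to ‘0’ would disconnect the net). All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Switch = Tuple[str, str]  # programmable connection (one SRAM bit) between two resources

def care_bits(net: str, usage: Dict[str, Set[str]],
              switches: List[Switch], on: Set[Switch]) -> List[Switch]:
    """Switches whose SRAM bit matters to `net` (simplified model)."""
    mine = usage[net]
    others: Set[str] = set()
    for other_net, resources in usage.items():
        if other_net != net:
            others |= resources
    result = []
    for sw in switches:
        a, b = sw
        if sw in on:
            # Assumption: an ON switch on this net's own route is a care bit,
            # because an upset to '0' would disconnect the net.
            if a in mine and b in mine:
                result.append(sw)
        elif (a in mine and b in others) or (b in mine and a in others):
            # OFF switch joining this net to a resource used by another net:
            # an upset to '1' would short the two nets.
            result.append(sw)
    return result

# Slide example: net A uses resource A, another net uses C; B and D are unused.
usage = {"netA": {"A"}, "netC": {"C"}}
switches = [("A", "B"), ("A", "C"), ("A", "D")]
print(care_bits("netA", usage, switches, on=set()))  # [('A', 'C')]
```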


User Design Level

• Soft Error Routing Problem [Golshan07]: given a routing graph and a set of multi-terminal nets, route each net with the least care-cost, where care-cost is the number of routing care bits.

• Experiments: 14% reduction in the number of care bits

− ~80% of soft errors in the FPGA: configuration memory [Kuon07]
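One way to see how a router can trade wirelength for fewer care bits is ordinary maze routing with care-cost as the edge weight. The sketch below is plain Dijkstra on a hypothetical routing graph where each edge carries the number of care bits that turning on that switch would add; it handles only a single source-sink connection and is not the algorithm from [Golshan07]. Real multi-terminal nets would need a Steiner-tree style extension.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbor, care-cost of turning on that routing switch)
Graph = Dict[str, List[Tuple[str, int]]]

def min_care_route(graph: Graph, src: str, dst: str) -> Tuple[int, List[str]]:
    """Dijkstra over routing resources, minimizing total care-cost instead of wirelength."""
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[int, str]] = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Hypothetical graph: the route via X is shorter in hops but touches more care bits.
g = {"S": [("X", 3), ("Y", 1)], "X": [("T", 1)], "Y": [("Z", 1)], "Z": [("T", 1)]}
print(min_care_route(g, "S", "T"))  # (3, ['S', 'Y', 'Z', 'T'])
```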


Recent Challenges

Process Variation


Process Variation Sources

[Figure: Leff (×10⁻⁷, ranging from about 1.8 to 2.3) mapped over wafer X and Y coordinates. Source: IBM, Intel and TSMC]


Variations

• Variation of variation over the years
• Variation from the mean value
− “Gate oxides are so thin that a change of one atom can cause a 25 percent difference in substrate current.” (EE Times, 04/11/2006)
• ILD: inter-layer dielectric


Statistical Description

The combined set of underlying deterministic and random contributions is lumped into a single “random” statistical description.

The distribution (mean and variance) of L for devices across a whole wafer can differ from that of devices within a single die.


Inter-die vs. Intra-die Variations

• Figures are courtesy of IBM, Intel and TSMC
[Figure panels: Leff intra-die variation (spatial correlation) and inter-die variation (global correlation)]


Impact of Variation

• Importance of variation:
− Timing violations
− Yield loss


Impact of Variation

• Process variations can cause up to 2000% variation in leakage current and 30% variation in frequency in 180nm CMOS

− Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi, A., De, V. Parameter Variations and Impact on Circuits and Microarchitecture. In Proc. of DAC (2003), 338-342.


Impact of Variation

Die-to-die frequency variation


Variation in FPGA

• Binning: historically, most variation was between dies
− FPGA manufacturers test the speed of each FPGA after manufacturing and bin each device according to its speed
− Higher-speed devices are more expensive
− Devices with unacceptable leakage power are discarded
• More recently: significant within-die variation
− Cannot be leveraged in the same manner
− Operating speeds must be reduced to maintain functionality
− 90 nm: speed reduction of 5.7%
− 22 nm: speed reduction of 22.4%
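The binning flow above can be sketched as two steps: a device's maximum frequency is set by its slowest path, and the device is then placed in the fastest speed grade it meets, or discarded for excessive leakage. Grade names, frequency thresholds and the leakage limit are all hypothetical. Within-die variation shows up here as spread in the per-path delays, which is why a single slow path can pull the whole device into a lower grade.

```python
from typing import List, Optional, Tuple

def device_fmax_mhz(path_delays_ns: List[float]) -> float:
    """The slowest (critical) path sets the device's maximum clock frequency."""
    return 1000.0 / max(path_delays_ns)

def speed_bin(fmax_mhz: float, leakage_mw: float,
              bins: List[Tuple[str, float]], leak_limit_mw: float) -> Optional[str]:
    """Return the fastest grade whose minimum frequency is met, or None (discard)."""
    if leakage_mw > leak_limit_mw:
        return None  # unacceptable leakage power: discard the device
    for grade, min_fmax in sorted(bins, key=lambda b: b[1], reverse=True):
        if fmax_mhz >= min_fmax:
            return grade
    return None  # too slow for any grade

# Hypothetical grades: (name, minimum fmax in MHz).
BINS = [("grade-1", 400.0), ("grade-2", 450.0), ("grade-3", 500.0)]
f = device_fmax_mhz([2.0, 2.1, 2.2])  # measured path delays on one die (hypothetical)
print(round(f, 1), speed_bin(f, 80.0, BINS, 100.0))
```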


Solutions

• Architectural solutions:
1. Select logic block architecture parameters that minimize the impact of this variation
− LUT size is particularly important [Wong05]
− LUT size = 4: highest leakage yield
− LUT size = 7: highest timing yield
− LUT size = 5: highest combined leakage and timing yield
2. Adaptively compensate for variation through body biasing [Nabaa06]:
− Slow blocks: apply a body bias that decreases Vt, increasing the block's speed
− Fast blocks: apply a body bias that increases Vt, reducing leakage power
− Experiments:
− Area penalty: 1%–2%
− Delay variability reduction: 30%
− Leakage variability reduction: 78%
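The adaptive body-biasing policy can be summarized as a per-block decision rule: a forward bias lowers Vt to speed up slow blocks, and a reverse bias raises Vt to cut leakage in blocks with spare slack. This is a hypothetical software model of the decision only; [Nabaa06] implements compensation in on-chip circuitry, and the three-level bias choice, the 5% margin and the block names are assumptions.

```python
from typing import Dict

def choose_body_bias(delay_ns: float, target_ns: float, margin: float = 0.05) -> str:
    """Pick a bias for one block from its measured delay (hypothetical 3-level policy)."""
    if delay_ns > target_ns:
        return "forward"   # lower Vt: speeds up the block, at the cost of more leakage
    if delay_ns < target_ns * (1.0 - margin):
        return "reverse"   # raise Vt: trades unneeded speed for lower leakage
    return "zero"          # close to target: leave unbiased

def assign_biases(block_delays_ns: Dict[str, float], target_ns: float) -> Dict[str, str]:
    return {blk: choose_body_bias(d, target_ns) for blk, d in block_delays_ns.items()}

print(assign_biases({"clb_0": 1.2, "clb_1": 0.8, "clb_2": 0.97}, target_ns=1.0))
```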


Solutions

• CAD-Level:

1. Statistical static timing analysis (SSTA) in FPGA CAD tools
− Improves delays by avoiding the margins that traditional STA requires

2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]

3. Generating critical paths that will be more robust in the face of variation [Matsumoto07]
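Why SSTA can recover margin: a traditional worst-case analysis puts every gate at its own mean + kσ at once, while a statistical view takes mean + kσ of the whole path. For independent Gaussian gate delays the path σ is the root-sum-square of the gate σ's, which is smaller than their sum. The sketch assumes independence and Gaussian delays with hypothetical values; real SSTA must also model correlated (for example inter-die) components and max operations at reconvergent paths, and with fully correlated delays the two numbers would coincide.

```python
import math
from typing import List, Tuple

Gate = Tuple[float, float]  # (mean delay, standard deviation) in ps

def corner_delay(gates: List[Gate], k: float = 3.0) -> float:
    """Traditional worst case: every gate at its own mean + k*sigma."""
    return sum(mu + k * sigma for mu, sigma in gates)

def statistical_delay(gates: List[Gate], k: float = 3.0) -> float:
    """Mean + k*sigma of the path sum, assuming independent Gaussian gate delays."""
    mean = sum(mu for mu, _ in gates)
    sigma = math.sqrt(sum(s * s for _, s in gates))
    return mean + k * sigma

path = [(100.0, 10.0)] * 4  # hypothetical 4-gate path
print(corner_delay(path), statistical_delay(path))  # 520.0 460.0
```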


Inter-die vs. Intra-die Variations

$$P = P_0 + \Delta P_{\text{interdie}} + \Delta P_{\text{intradie}} + \Delta P_e$$

P0 = nominal design value

ΔPintradie = intra-die variation (within a given chip)

ΔPinterdie = inter-die variation (from one chip to another)

ΔPe = remaining “random” or unexplained variation

P: a structural or electrical parameter, e.g.
− W
− tox
− Vth
− channel mobility
− coupling capacitances
− line resistances
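A quick Monte Carlo run makes the inter-die/intra-die split concrete: each die draws one ΔPinterdie shared by all its devices, and each device adds its own intra-die and residual terms. Across the whole wafer the variance is roughly the sum of all three components, while within a single die the shared offset cancels. This is a sketch under simplifying assumptions: all terms are independent zero-mean Gaussians, the σ values are hypothetical, and intra-die variation is drawn per device, ignoring the spatial correlation real intra-die variation has.

```python
import random
import statistics
from typing import List

def simulate_wafer(p0: float, s_inter: float, s_intra: float, s_e: float,
                   n_dies: int, per_die: int, seed: int = 0) -> List[List[float]]:
    """Sample P = P0 + dP_interdie + dP_intradie + dP_e for every device on a wafer."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        d_inter = rng.gauss(0.0, s_inter)  # one draw per die, shared by all its devices
        dies.append([p0 + d_inter + rng.gauss(0.0, s_intra) + rng.gauss(0.0, s_e)
                     for _ in range(per_die)])
    return dies

dies = simulate_wafer(1.0, 0.05, 0.03, 0.02, n_dies=500, per_die=100)
wafer_var = statistics.pvariance([p for die in dies for p in die])
within_die_var = statistics.mean([statistics.pvariance(die) for die in dies])
# Expected roughly: wafer ~ 0.05^2 + 0.03^2 + 0.02^2 = 0.0038,
# within-die ~ 0.03^2 + 0.02^2 = 0.0013
print(f"wafer variance {wafer_var:.5f}, mean within-die variance {within_die_var:.5f}")
```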


Corner Analysis

• PRCA (process corner analysis) takes:
1. the nominal values of the process parameters, and
2. a delta for each parameter by which it varies.
It finds the performance as max and min values.

• Pros: simple

• Cons: conservative, inaccurate


Corner Analysis

• PRCA shortcoming: process corners are assumed to coincide with performance corners.
− In fact, for a particular interconnect parameter the best-case corner may lie not at Pmin or Pmax but at a value within that range.

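[Figure: interconnect cross-section over metal layers M1–M3 showing wire width W, thickness T and height H with their min/max values (Wmin/Wmax, Tmin/Tmax, Hmin/Hmax), and capacitance Cg]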
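Corner analysis and its shortcoming can both be shown in a few lines. `corner_analysis` evaluates a performance model only at the 2^n combinations of nominal ± delta, which is what PRCA does; `grid_sweep` searches the whole parameter box. The toy delay model is hypothetical, chosen only because it is non-monotonic in W (a resistance-like term falls with W while a capacitance-like term grows), so its best case sits inside the range rather than at Wmin or Wmax, and the corners miss it.

```python
import itertools
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Model = Callable[[Params], float]

def corner_analysis(model: Model, nominal: Params, delta: Params) -> Tuple[float, float]:
    """PRCA: evaluate the model only at the 2^n corners nominal +/- delta."""
    names = list(nominal)
    values = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(names)):
        point = {n: nominal[n] + s * delta[n] for n, s in zip(names, signs)}
        values.append(model(point))
    return min(values), max(values)

def grid_sweep(model: Model, nominal: Params, delta: Params,
               steps: int = 300) -> Tuple[float, float]:
    """Dense search over the whole box; only practical for a few parameters."""
    names = list(nominal)
    axes = [[nominal[n] - delta[n] + 2.0 * delta[n] * i / steps for i in range(steps + 1)]
            for n in names]
    values = [model(dict(zip(names, pt))) for pt in itertools.product(*axes)]
    return min(values), max(values)

def toy_delay(p: Params) -> float:
    # Hypothetical: resistance-like term falls with W, capacitance-like term grows with W.
    return 4.0 / p["W"] + p["W"]

nominal, delta = {"W": 2.5}, {"W": 1.5}  # W ranges over [1.0, 4.0]
print(corner_analysis(toy_delay, nominal, delta))  # (5.0, 5.0)
print(grid_sweep(toy_delay, nominal, delta))       # (4.0, 5.0): best case at W = 2.0
```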


SSTA


Solutions

• CAD-Level:

2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]
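The configuration-selection idea reduces to a search loop: because within-die variation differs from die to die, each device may need a different one of several logically equivalent placements or routings. In the sketch, `passes_at` is a hypothetical callback standing in for an at-speed test on the actual device, and the configuration names and measured frequencies are invented for illustration; [Sedcole07] should be consulted for how the configurations are actually generated and tested.

```python
from typing import Callable, List, Optional

def pick_configuration(configs: List[str], passes_at: Callable[[str, float], bool],
                       target_mhz: float) -> Optional[str]:
    """Return the first logically equivalent configuration that works at target_mhz."""
    for cfg in configs:
        if passes_at(cfg, target_mhz):
            return cfg
    return None  # no variant meets the target on this die

# Hypothetical measured fmax of each configuration variant on one particular die.
measured = {"cfg_a": 480.0, "cfg_b": 515.0, "cfg_c": 505.0}
choice = pick_configuration(list(measured), lambda c, f: measured[c] >= f, 500.0)
print(choice)  # cfg_b
```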


References

• [Kuon07] Kuon, Tessier, “FPGA Architecture: Survey and Challenges,” Foundations and Trends in Electronic Design Automation, Vol. 2, No. 2 (2007) 135–253.

• [Lin07] Yan Lin and Lei He, Device and Architecture Concurrent Optimization for FPGA Transient Soft Error Rate, ICCAD 2007

• [Golshan07] S. Golshan and E. Bozorgzadeh, “Single-event-upset (SEU) awareness in FPGA routing,” in DAC ’07:

• [Xilinx] www.xilinx.com

• [Altera] www.altera.com

• [Wong05] H.-Y. Wong, L. Cheng, Y. Lin, and L. He, “FPGA device and architecture evaluation considering process variations,” in ICCAD, 2005.

• [Nabaa06] G. Nabaa, N. Azizi, and F. N. Najm, “An adaptive FPGA architecture with process variation compensation and reduced leakage,” DAC, 2006.


• [Sedcole07] P. Sedcole and P. Y. K. Cheung, “Parametric yield in FPGAs due to within-die delay variations: A quantitative analysis,” in FPGA, 2007.