virtualized and flexible ecc for main memory...0 4 ea: 0x0540 0 5 wr: 0x0540 ecc address translation...

26
Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon and Mattan Erez Dept. Electrical and Computer Engineering The University of Texas at Austin 1 ASPLOS 2010

Upload: others

Post on 20-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Virtualized and Flexible ECC for Main Memory

Doe Hyun Yoon and Mattan Erez

Dept. Electrical and Computer Engineering

The University of Texas at Austin

1ASPLOS 2010

Page 2: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Memory Error Protection

• Applying ECC uniformly – ECC DIMMs– Simple and transparent to programmers

• Error protection level– Fixed, design-time decision

• Chipkill-correct used in high-end servers– Constrain memory module design space

• Allow only x4 DRAMs• Lower energy efficiency than x8 DRAMs

• Virtualized ECC – objectives– To provide flexible memory error protection– To relax design constraints of chipkill

2

Page 3: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Virtualized ECC

• Two-tiered error protection

• Tier-1 Error Code (T1EC)

– Simple error code for detection or light-weight correction

• Tier-2 Error Code (T2EC)

– Strong error correcting code

• Store T2EC within the memory namespace itself

– OS manages T2EC

• Flexible memory error protection

– Different T2EC for different data pages

– Stronger protection for more important data

3

Page 4: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Virtualized ECC – Example

4

Physical Memory

Data T1EC

Virtual Address space

Low

High

Virtual Page to Physical Frame mapping

Physical Frame to ECC Page mapping

T2EC for Chipkill

T2EC for DoubleChipkill

Page frame – i

Page frame – j

Page frame – k

ECC page – j

ECC page – k

Virtual page – i

Virtual page – j

Virtual page – k

Error Protection

Level

Page 5: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

VIRTUALIZED ECC

5

Page 6: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Observations on Memory Errors

• Per-system error rate is still low– Most of time, we try to detect errors

finding no error

• To detect errors is a common case operation– Need a low latency, low complexity

error detection mechanism T1EC

• To correct errors is an uncommon case operation– Correction can be complex, take a long time– But, still need to manage

error correction info somewhere Virtualized T2EC

6

Page 7: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Uniform ECC

7

Physical Memory

Data ECC

VPN

Virtual Memory

VA offset

PFN offsetPA

Page Frame

PA

Page 8: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Virtualized ECC

8

Physical Memory

Data T1EC

VPNVA offset

PFN offsetPA

Scale according to T2EC size

offsetECC Address

OS managesPFN to EPNtranslation

ECC page number

T2EC

ECC Page

PA

EA

Virtual Memory

Page Frame

Page 9: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Data T1EC Data T1EC

LLC

DRAM Rank 0 Rank 1

ECC Address Translation Unit

T2EC for Rank 1 data

T2EC for Rank 0 data

0000

0080

0100

0180

0200

0280

0300

0380

0400

0480

0500

0580

0040

00c0

0140

01c0

0240

02c0

0340

03c0

0440

04c0

0540

05c0

PA: 0x02003

Wr: 0x02002

B0

B0

Rd: 0x00c01

A

A

B1

B2

B3

1 2 3

1 2 3

EA: 0x054040

0

Wr: 0x05405

Virtualized ECC operationRead: fetch data and T1ECDon’t need T2EC in most casesWrite: update data, T1EC, and T2ECECC Address Translation Unit: fast PA to EA translationT2ECs of consecutive data lines map to a T2EC lineT2EC lines can be partially validUpdate only valid T2EC to DRAM

Page 10: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Penalty with V-ECC

• Increased data miss rate

– T2EC lines in LLC reduce effective LLC size

• Increased traffic due to T2EC write-back

– One-way write-back traffic

• Not in a critical-path

10

Page 11: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

CHIPKILL-CORRECT

11

Page 12: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Chipkill-correct

• Single Device-error CorrectDouble Device-error Detect

– Can tolerate a DRAM failure

– Can detect a second DRAM failure

• Chipkill requires x4 DRAMs

• x8 chipkill is impractical

– But, x8 DRAM is more energy efficient

12

Page 13: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Baseline x4 Chipkill• Two x4 ECC DIMMs

– 128bit data + 16bit ECC (redundancy overhead: 12.5%)– 4 check symbol error code using 4-bit symbol

• Access granularity – 64B in DDR2 (min. burst 4 x 128 bit)– 128B in DDR3 (min. burst 8 x 128 bit)

13

x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4

x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4

144-bit wide data bus

Page 14: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

x8 Chipkill• x8 chipkill with the same access granularity

– 152-bit wide data path

• 128-bit data + 24-bit ECC

• Redundancy overhead: 18.75%

– Need a custom-designed DIMM

• Increase the system cost a lot

14

152-bit wide data bus

x8 x8 x8 x8 x8 x8 x8 x8 x8

x8 x8 x8 x8 x8 x8 x8 x8 x8

x8

Page 15: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

x8 Chipkill /w Standard DIMMs

• Increase access granularity– 128B in DDR2 (min. burst 4 x 256 bit)– 256B in DDR3 (min. burst 8 x 256 bit)

15

x8 x8 x8 x8 x8 x8 x8 x8 x8

x8 x8 x8 x8 x8 x8 x8 x8

x8 x8 x8 x8 x8 x8 x8 x8 x8

x8 x8 x8 x8 x8 x8 x8 x8 x8

280-bit wide data bus

Page 16: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

V-ECC for Chipkill

• Use 3 check symbol error codes– Single Symbol-error Correct and

Double Symbol-error Detect

• T1EC– 2 check symbols

– Detect up to 2 symbol error

• T2EC– 3rd check symbol

– Combined T1EC/T2EC provides Chipkill16

Page 17: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

V-ECC: ECC x4 configuration

• Use 8-bit symbol error code

– 2 bursts out of a x4 DRAM form an 8bit-symbol

• Modern DRAMs have minimum burst of 4 or 8

• 1 x4 ECC DIMM + 1 x4 Non-ECC DIMM

• Each DRAM access in DDR2 (burst 4)

– 64B data, 4B T1EC

– 2B T2EC is virtualized within memory namespace

• 32 T2ECs per 64B cache line

17

Virtualized within memory

x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4

x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4 x4

136-bit wide data bus

Data

Data

T1EC

T2EC

Page 18: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

144-bit wide data bus

x8 x8 x8 x8 x8 x8 x8 x8 x8

x8 x8 x8 x8 x8 x8 x8 x8 x8

V-ECC: ECC x8 configuration

• Use 8-bit symbol error code

• 2 x8 ECC DIMMs

• Each DRAM access in DDR2 (burst 4)

– 64B data, 8B T1EC

– 4B T2EC is virtualized

• 16 T2ECs per 64B cache line

18

Data

Data

T1EC

T1EC

T2EC

Virtualized within memory

Page 19: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Flexible Error Protection

• Single HW with V-ECC can provide– Chipkill-detect, Chipkill-correct, and

Double chipkill-correct

– Use different T2EC for different pages

• Reliability – Performance tradeoff

• Maximize performance/power efficiency with Chipkill-Detect

• Stronger protection at the cost of additional T2EC access

19

Chipkill-Detect

Chipkill-Correct

Double Chipkill-Correct

ECC x4 0B 2B 4B

ECC x8 0B 4B 8B

Page 20: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

EVALUATION

20

Page 21: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Simulator/Workload

• GEMS + DRAMsim– An out-of-order SPARC V9 core– Exclusive two-level cache hierarchy– DDR2 800MHz – 12.8GB/s (128-bit wide data path)

• 1 channel 4 ranks

• Power model– WATTCH for processor power – scaled to 45nm– CACTI for cache power – cacti 45nm– Micron model for DRAM power – commodity DRAMs

• Workloads– 12 data intensive applications

from SPEC CPU 2006 and PARSEC– Microbenchmarks: STREAM and GUPS

21

Page 22: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

ST

RE

AM

GU

PS

Normalized Execution Time

• Less than 1% penalty on average

• Performance penalty

– Spatial locality

– Write-back traffic

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

ST

RE

AM

GU

PS

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

1.10

ST

RE

AM

GU

PS

Page 23: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

ST

RE

AM

GU

PS

System Energy Efficiency

• Energy Delay Product (EDP) gain

– ECC x4: 1.1% on average

– ECC x8: 12.0% on average

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

ST

RE

AM

GU

PS

1.23

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

ST

RE

AM

GU

PS

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

1.05

1.10

bzip2 hmmer mcf libq omnet milc lbm sphinx3canneal dedup fluid freq avg

SPEC 2006 PARSEC

Baseline x4 ECC x4 ECC x8

20%17%

10%12%

Page 24: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

0.96

1.00

1.04

1.08

1.12

1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49

Normalized Execution Time

Flexible Error Protection

0.96

1.00

1.04

1.08

1.12

1234567

0.60

0.70

0.80

0.90

1.00

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

bzip2 hmmer mcf libq omnet milc lbm sphinx3 canneal dedup fluid freq avg

SPEC 2006 PARSEC

Normalized EDP

0.60

0.70

0.80

0.90

1.00

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

Chip

kill

Dete

ct

Chip

kill

Corr

ect

2 C

hip

kill

Corr

ect

STREAM GUPS

Chipkill-Detect

Chipkill-Correct

Double Chipkill-Correct

Page 25: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Conclusion

• Virtualized ECC– Two-tiered error protection, virtualized T2EC

• Improved system energy efficiency with chipkill– Reduce DRAM power consumption by 27%– Improve system EDP by 12%

• Performance penalty – 1% on average

• Error protection even for Non-ECC DIMMs– Can be used for GPU memory error protection

• Flexibility in error protection– Adaptive error protection level by user/system demand– Cost of error protection is proportional to protection level

25

Page 26: Virtualized and Flexible ECC for Main Memory...0 4 EA: 0x0540 0 5 Wr: 0x0540 ECC Address Translation Unit: fast PA to EA translationWrite: update data, T1EC, and T2ECDon’t need T2EC

Virtualized and Flexible ECC for Main Memory

Doe Hyun Yoon and Mattan Erez

Dept. Electrical and Computer Engineering

The University of Texas at Austin

26