Fine-grained Power Budgeting to Improve Write Throughput of MLC PCM

International Symposium on Microarchitecture

Lei Jiang (1), Youtao Zhang (2), Bruce R. Childers (2) and Jun Yang (1)

(1) Electrical and Computer Engineering Department
(2) Computer Science Department

University of Pittsburgh, Pittsburgh


Page 1: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

International Symposium on Microarchitecture

Fine-grained Power Budgeting to Improve Write Throughput of MLC PCM

Lei Jiang (1), Youtao Zhang (2), Bruce R. Childers (2) and Jun Yang (1)

(1) Electrical and Computer Engineering Department
(2) Computer Science Department

University of Pittsburgh, Pittsburgh

Page 2: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Phase Change Memory (PCM)

• # of cores (C#) ↑: ARM Cortex-A15 (4 cores), Intel Xeon (8 cores), AMD Bulldozer (16 cores)
• Working set of single thread (WSST) ↑: e.g. Memcached
• Memory capacity ↑ = C# x WSST
• DRAM → PCM?

Figures are from the ARM, Intel, AMD, VoltDB, Memcached, MySQL and Samsung websites

Page 3: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Multi-Level Cell and PCM write

• Capacity ↑, cost-per-bit ↓
• Large resistance difference between cell states: 01, 11, 10, 00
• Higher than Vdd write voltage
• Nondeterministic write

Figure: voltage vs. time for an MLC write: a Vreset pulse followed by Vset,0, Vset,1 and Vset,2 pulses, each followed by Vverify.

Figure: current amplitude vs. time, marked with the glass transition temperature (~300℃) and the melting point (~600℃).

Page 4: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Multi-Level Cell and PCM write

• Higher than Vdd write voltage → more write power and energy
• Nondeterministic write → write is non-deterministic

Page 5: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

PCM DIMM and Chip Architecture

Figure labels: On-chip Memory Controller, Bridge Chip, IM, LCP (x8).

1 Bridge chip [FANG_PACT2011]: handles non-deterministic writes
  Iteration Manager (IM): iterative programming algorithm
2 Local Charge Pump (LCP): boosts voltage and current for writes

Page 6: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Power Constraint and Solution for SLC

• DIMM level power constraint (DLPC) [HAY_MICRO’11]
  – One DIMM only supports 560 concurrent RESETs (power tokens)
  – ~One 512-bit (64B) write
  – Poor write throughput
• SLC power management (SPM) [HAY_MICRO’11]
  – The MC approximately estimates the # of written cells in cache
  – Allocates power tokens based on the estimated number
  – Reclaims them after a fixed write latency
  – Can write ~8 64B lines (assuming 15% cell changing rate)

Figure: speedup (y-axis 0.4 to 1) of Ideal and SPM; annotation: ~full write throughput.
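The token arithmetic behind DLPC and SPM can be sketched in a few lines. This is a minimal sketch, assuming one power token per changed cell and ignoring any difference between SET and RESET cost; the function names are hypothetical and not taken from [HAY_MICRO’11].

```python
import math

DIMM_TOKENS = 560   # concurrent RESETs one DIMM supports (from the slide)
LINE_CELLS = 512    # a 64B line occupies 512 single-level cells


def tokens_needed(changed_fraction: float, line_cells: int = LINE_CELLS) -> int:
    """Tokens for one line write, ASSUMING one token per changed cell."""
    return math.ceil(changed_fraction * line_cells)


def concurrent_writes(changed_fraction: float, budget: int = DIMM_TOKENS) -> int:
    """How many line writes fit in the DIMM budget at the same time."""
    return budget // tokens_needed(changed_fraction)


worst_case = concurrent_writes(1.0)    # every cell assumed to change
estimated = concurrent_writes(0.15)    # SPM-style estimate of changed cells
```

Under these simplifying assumptions the worst case admits a single line write, matching the "~one 512-bit write" bullet, while the 15% estimate admits seven, in the same range as the roughly eight lines quoted on the slide.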

Page 7: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

A Different Story on MLC

• Higher power demand, but DLPC does not increase
  – MLC has larger write power
  – MLC needs larger memory line size and LLC
  – More cell changes, lower write throughput
• Nondeterministic write on MLC
  – Reclaim power tokens after a fixed latency?
  – Worst case write latency must be used → power tokens wasted

Figure: speedup (y-axis 0.4 to 1) of Ideal, SPM and SPM on MLC (DIMM only); annotation: 67%.

SPM does NOT work on MLC!

Page 8: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

In Addition: Chip Level Power Constraint

• Total # of cells written per chip is limited too
  – Introduced by local charge pump (LCP) [LEE_JSSC’09]
  – LCP power supply ability ∝ LCP area

Figure annotations: [CHOI_ISSCC’12]; 15%-20% area overhead.

Page 9: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

DIMM and Chip Power Constraints Example

Figure: two banks (bank 0, bank 1) spread over three chips, all cells shown as 00. Chip 0, chip 1 and chip 2 each have a budget of 4; the DIMM budget is 12 and drops to 8 once WR-A is issued. Chip 0 is marked as the hot chip, where the chip power constraint is violated.

• WR-A (bank 0): 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00
• WR-B (bank 1): 00 00 00 00 00 00 11 11

1 Write-A obeys both DIMM and chip power constraints. It can go to bank 0.
2 Write-B violates chip power constraint. It has to be stopped.
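The two-level check in this example can be sketched as a small admission test. This is an illustrative sketch only: the per-chip demand vectors are hypothetical, and placing WR-B's two changed cells on chip 0 is an assumption made to reproduce the "hot chip" outcome, since the exact cell-to-chip layout is not recoverable from the slide.

```python
from typing import List, Tuple


def can_issue(demand: List[int], chip_free: List[int], dimm_free: int) -> bool:
    """True if a write fits under both the DIMM and every chip budget."""
    if sum(demand) > dimm_free:
        return False
    return all(need <= free for need, free in zip(demand, chip_free))


def issue(demand: List[int], chip_free: List[int],
          dimm_free: int) -> Tuple[List[int], int]:
    """Charge an admitted write; returns the updated (chip_free, dimm_free)."""
    remaining = [free - need for need, free in zip(demand, chip_free)]
    return remaining, dimm_free - sum(demand)


chip_free, dimm_free = [4, 4, 4], 12   # budgets from the slide

wr_a = [4, 0, 0]   # four changed cells, all on chip 0
wr_b = [2, 0, 0]   # ASSUMPTION: WR-B's two changed cells also land on chip 0

a_ok = can_issue(wr_a, chip_free, dimm_free)
chip_free, dimm_free = issue(wr_a, chip_free, dimm_free)
b_ok = can_issue(wr_b, chip_free, dimm_free)
```

WR-A is admitted and leaves 8 DIMM tokens, yet WR-B is rejected because chip 0 has no budget left, even though the DIMM as a whole still has room.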

Page 10: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Performance with Both Power Constraints

Figure: speedup (y-axis 0.4 to 1) comparing Ideal with SPM under the DIMM-only and DIMM+chip constraints; annotation: 49%.

DIMM and chip power constraints hurt write throughput / performance a lot!

Page 11: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Simple Solutions?

• Intra-line wear leveling [ZHOU_ISCA’09]
  – Periodically shift N bytes for one line
• Scheduling for power constraints
  – Reorder writes

Figure: a write queue (WR-A, WR-B, WR-C, WR-D) shown before and after shifting bytes, and before and after reordering B and C (WR-A, WR-C, WR-B, WR-D). Entries are marked Conflict or No Conflict; annotations: 4x throughput, 1.5x throughput.

Page 12: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

But They do NOT Help

• PWL: intra-line wear leveling without overhead
• Scheduling: scheduling writes under both power constraints
• N x local: enlarging local charge pump

Figure: speedup (y-axis 0.4 to 1) of Ideal, DIMM only, DIMM+chip, PWL, sche24, sche48, sche96, 1.5xlocal and 2xlocal.

• PWL: no effect
• Scheduling: no effect
• 1.5xlocal: no effect
• 2xlocal ≈ DIMM only case, but 100% overhead!

Page 13: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

How to tackle chip level power constraint?

Page 14: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Global Charge Pump

Figure labels: DIMM, Bridge Chip, IM, GCP, LCP (x4).

1 GCP balances power supply among chips
2 Power of GCP + LCPs ≤ DIMM level power constraint
3 Each sub-array is powered by either GCP or LCP, not both
4 Long wire → large resistance on wire [OH_JSSC’06] → low efficiency
5 Tradeoff between power utilization and efficiency

Page 15: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Global Charge Pump

Figure: speedup (y-axis 0.4 to 1) of Ideal, DIMM only, DIMM+chip, GCP-NE, GCP-NE-0.7 and GCP-NE-0.5.

• GCP + 100% eff. can relieve chip level power constraint!
• GCP + 50% eff. cancels the benefit of GCP!

Page 16: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Cell Mapping

64B line = 256 cells

• Naïve Mapping (NE): cells 255 … 0 laid out contiguously over chips 7 … 0, with cells 31 … 0 on one chip
• Vertical Interleaving (VIM): Chip# = Cell# mod 8

Page 17: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Can We Do Even Better?

Braided Interleaving (BIM)

Chip# = (Cell# - Cell# / 16) mod 8

Figure: cells 255 … 0 and their assignment to chips 7 … 0 under BIM.
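The mapping formulas can be compared with a short sketch. It assumes that the naïve mapping places 32 contiguous cells on each chip and that "/" in the BIM formula is integer division; the two cell-change patterns are hypothetical and chosen purely for illustration, not taken from the evaluation.

```python
from collections import Counter

CHIPS = 8
CELLS_PER_LINE = 256   # 64B line of 2-bit cells


def naive(cell: int) -> int:
    # ASSUMPTION: contiguous blocks of 32 cells per chip
    return cell // (CELLS_PER_LINE // CHIPS)


def vim(cell: int) -> int:
    # Vertical interleaving: Chip# = Cell# mod 8
    return cell % CHIPS


def bim(cell: int) -> int:
    # Braided interleaving; "/" on the slide is read as integer division
    return (cell - cell // 16) % CHIPS


def hottest_chip_load(mapping, changed_cells) -> int:
    """Largest number of changed cells that land on any single chip."""
    return max(Counter(mapping(c) for c in changed_cells).values())


burst = range(32)            # 32 adjacent changed cells
strided = range(0, 256, 8)   # every 8th cell changed

loads = {
    name: (hottest_chip_load(fn, burst), hottest_chip_load(fn, strided))
    for name, fn in (("NE", naive), ("VIM", vim), ("BIM", bim))
}
```

With these inputs the naïve mapping puts the whole burst on one chip, VIM spreads the burst but piles the strided pattern onto a single chip, and BIM keeps the hottest chip at 4 cells in both cases.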

Page 18: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Effectiveness of Cell Mapping

Figure: speedup (y-axis 0.4 to 1) of Ideal, DIMM only, DIMM+chip, GCP-NE, GCP-NE-0.7, GCP-NE-0.5, GCP-VIM-0.7, GCP-VIM-0.5 and GCP-BIM-0.7.

• GCP + V/BIM + 70% eff. ≈ GCP + 100% eff.!
• GCP + V/BIM + 50% eff. > GCP + 70% eff.

Page 19: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Can we utilize DIMM level power budget much better?

Page 20: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

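Iteration Power Management

Total power budget: 80

          power   latency
Reset       2        1
Set         1        2

• A: 50 cell changes; iterations: Reset 50, Set 40, Set 26, Set 12
• B: 60 cell changes; iterations: Reset 60, Set 36, Set 20, Set 12, Set 2
• Ideally: A and B proceed concurrently; complete in 9 units of time
• SPM on MLC: A holds 50 tokens and B holds 60 tokens for the whole write, so B has to wait for A; complete in 16 units of time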

Page 21: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Iteration Power Management

Total power budget: 80

          power   latency
Reset       2        1
Set         1        2

A: 50 cell changes, B: 60 cell changes

• Proposed IPM: power tokens are allocated per iteration
  – A: cells 50, 40, 26, 12; tokens 50, 25, 20, 13
  – B: cells 60, 36, 20, 12, 2; tokens 60, 30, 18, 10, 6
  – Complete in 12 units of time
• Multi-RESET (MR): the RESET of B is split into two groups of 30
  – Complete in 10 units of time
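The completion times in the A/B example can be reproduced with a small token-pool simulation. This is a sketch under stated assumptions, not the authors' controller: writes are served greedily in order A then B, each phase holds its tokens for its full latency, and the per-iteration token counts are copied as read from the slides. The name `makespan` and the phase encoding are hypothetical.

```python
from typing import List, Tuple

Phase = Tuple[int, int]   # (tokens held, duration in time units)


def makespan(writes: List[List[Phase]], budget: int) -> int:
    """Greedy in-order schedule; returns the time the last write finishes."""
    if any(tokens > budget for w in writes for tokens, _ in w):
        raise ValueError("a phase needs more tokens than the whole budget")
    free = budget
    next_phase = [0] * len(writes)
    held = [0] * len(writes)
    running = [False] * len(writes)
    ends_at = [0] * len(writes)
    t = 0
    while True:
        # Reclaim tokens from phases that finish at time t.
        for i in range(len(writes)):
            if running[i] and ends_at[i] <= t:
                free += held[i]
                running[i] = False
        if not any(running) and all(
            next_phase[i] == len(w) for i, w in enumerate(writes)
        ):
            return t
        # Start the next phase of every idle write whose tokens fit.
        for i, w in enumerate(writes):
            if not running[i] and next_phase[i] < len(w):
                tokens, duration = w[next_phase[i]]
                if tokens <= free:
                    free -= tokens
                    held[i], ends_at[i] = tokens, t + duration
                    running[i] = True
                    next_phase[i] += 1
        t += 1


BUDGET = 80
# SPM on MLC: RESET-sized allocation held for the whole write (A: 7, B: 9 units).
spm = [[(50, 7)], [(60, 9)]]
# IPM: one allocation per iteration, RESET lasts 1 unit and each SET 2 units.
ipm = [[(50, 1), (25, 2), (20, 2), (13, 2)],
       [(60, 1), (30, 2), (18, 2), (10, 2), (6, 2)]]
# Multi-RESET: B's RESET is split into two groups of 30 cells.
mr = [ipm[0], [(30, 1), (30, 1)] + ipm[1][1:]]

results = {
    "ideal": makespan(ipm, 10**9),   # effectively unlimited budget
    "spm": makespan(spm, BUDGET),
    "ipm": makespan(ipm, BUDGET),
    "mr": makespan(mr, BUDGET),
}
```

Under this model the four cases finish in 9, 16, 12 and 10 time units, the same figures shown on the slides. The gain from IPM comes from returning tokens after each iteration, and the gain from MR comes from letting half of B's RESET start alongside A's RESET.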

Page 22: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Experimental Methodology

• In-order 8-core 4GHz CMP processor
  – L1: private i-32KB/d-32KB
  – L2: private 2MB, 64B line
  – L3: DRAM off-chip, private 32MB, 256B line
• 4GB 2-bit MLC PCM main memory
  – One DIMM, single-rank, 8 banks
  – R/W queue 24 entries [HAY_MICRO’11]
  – Read first; schedule writes when NO read
  – Queue is full → write burst, issuing all writes until queue is empty
  – RESET: 500 cycles, 300μA, 480μW
  – SET: 1000 cycles, 150μA, 90μW
  – MLC non-deterministic write model [QURESHI_HPCA’10]
• Benchmarks
  – SPEC2006, BioBench, MiBench and STREAM
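The RESET and SET parameters above imply a per-pulse voltage and energy, which a short calculation makes explicit. This is a back-of-the-envelope sketch that assumes the cycle counts refer to the 4GHz processor clock and that power is constant over the pulse; neither assumption is stated on the slide, and the helper names are hypothetical.

```python
CLOCK_HZ = 4e9   # ASSUMPTION: the cycle counts are 4 GHz processor cycles

OPS = {
    "RESET": {"cycles": 500, "current_a": 300e-6, "power_w": 480e-6},
    "SET": {"cycles": 1000, "current_a": 150e-6, "power_w": 90e-6},
}


def pulse_voltage(op: str) -> float:
    """Implied pulse voltage in volts: V = P / I."""
    return OPS[op]["power_w"] / OPS[op]["current_a"]


def pulse_energy_pj(op: str) -> float:
    """Implied energy per pulse in picojoules: E = P * t."""
    seconds = OPS[op]["cycles"] / CLOCK_HZ
    return OPS[op]["power_w"] * seconds * 1e12


summary = {op: (pulse_voltage(op), pulse_energy_pj(op)) for op in OPS}
```

Under these assumptions a RESET pulse works out to about 1.6 V and 60 pJ and a SET pulse to about 0.6 V and 22.5 pJ, so a RESET draws more power for a shorter time.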

Page 23: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Effectiveness of IPM

Figure: normalized write throughput (y-axis 1 to 5) of GCP, GCP+IPM, GCP+IPM+MR and Ideal for ast_m, bwa_m, lbm_m, les_m, mcf_m, xal_m, mum_m, tig_m, qso_m, cop_m, mix_1, mix_2, mix_3 and gmean; annotation: x2.4.

Figure: speedup (y-axis 0.4 to 1) of Ideal, DIMM only, DIMM+chip, GCP and GCP+IPM+MR; annotations: 76%, 86%.

Page 24: Fine-grained Power Budgeting to Improve Write Throughput  of MLC PCM

Conclusions

• Increasing # of cores & enlarging working set
  – Large & scalable main memory: MLC PCM
• Two power restrictions on MLC PCM
  – Limited DIMM level power constraint
  – Small chip level power constraint
• Global charge pump
  – Overcome chip level power constraint
• Iteration power management
  – Better utilize DIMM level power budget
• Our techniques achieve
  – Write throughput ↑ by x2.4; Performance ↑ by 76%
