daniel bittman peter alvaro darrell d. e. long ethan l. miller · 26/02/2019  · daniel bittman...

35
Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26

Upload: others

Post on 12-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping

Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller

1

FAST ‘192019-02-26

Page 2: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

CRSS Confidential

Byte-addressable Non-volatile Memory

2

BNVM is coming, and with it, new optimization targets

Page 3: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

CRSS Confidential

Byte-addressable Non-volatile Memory

3

0 1 0 1 0 0 0 1

0 1 1 0 0 1 0 1

It’s not just writes... ...it’s the bits flipped by those writes

Page 4: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

BNVM power usage

4

Page 5: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Can we take advantage of this?

5

Software vs. hardware?

Page 6: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Can we take advantage of this?

6

Software vs. hardware?

How hard is it to reason about bit flips?

Page 7: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Can we take advantage of this?

7

Software vs. hardware?

How hard is it to reason about bit flips?

How do we design data structures to reduce bit flips?

Page 8: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

8

Page 9: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Reducing Bit Flips in Software ●

●●

9

Page 10: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

XOR linked lists

10

value

prevnext

value

prevnext

value

prevnext

valuexptr

valuexptr

valuexptr

xptr = next ⊕ prev

Traditional doubly linked list

XOR linked list

Page 11: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Pointers!

11

Some actual pointersA = 0x000055b7bda8f260B = 0x000055b7bda8f6a0

A ⊕ B = 0x4C0 = 0b10011000000

Page 12: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Using XOR in hash tables

12

key

value

xnext

key

value

xnext

key

value

xnext

key

value

xnext

...

...

Page 13: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Using XOR in hash tables

13

key

value

xnext D

key

value

00000 0

key

value

xxxxx 1

Both indicate “entry is empty”

Page 14: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

From XOR linked lists to Red Black Trees

14

L RP

Standard 3-pointer red-black tree design

Page 15: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

From XOR linked lists to Red Black Trees

15

L RP LX RXLX = L ⊕ P

RX = R ⊕ P

Now 2-pointer, and XOR pointers

Page 16: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Evaluation ●

16

Page 17: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Experimental framework

17

, at the memory controller

Test different data structures, with different cache parameters,over a varying number of operations

Page 18: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

18

Bit flips: calling malloc()

Cross-over point between40 and 48 bytes

Page 19: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

19

Bit flips: XOR Linked Lists

XOR linked lists reducebit flips dramatically

better

Page 20: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

20

Bit flips: Hash table

better

Hash table already hadfew flips to save:

chains should be short

Page 21: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

21

Bit flips: Red-black Trees

better

Significant savings

Page 22: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

22

Bit flips: L2 Cache Behavior

better

L2 cache has ultimately little effect!

Page 23: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

23

Bit flips: L2 Cache Behavior

better

L2 cache has ultimately little effect!

...even when increasing in size

Page 24: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

24

Performance: RBT insert

better

Performance is notsignificantly affected!

a) Performance cost of XORsb) Performance benefit of smaller node size

Page 25: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

25

Performance: hash table

better

Almost no effect on performance

Page 26: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

CRSS Confidential

Conclusions

26

Savings are significant with little performance impact.

We can design around bit flips, and we should.

Bit flip/write inversion

Page 27: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

27

Data structures

Systemsimulation Caching

effects Pointerdistance

Power and wear estimates

Full systems

Real hardware

More datastructuresAdditional

techniques

Bitflips from algorithms

ABI modifications

Page 28: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Thank You! Questions?Daniel Bittman

[email protected] [email protected]@ucsc.edu

28

https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19

Page 29: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

29

Backup slides

Page 30: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

30

Bit flips: instrumentation

better

Page 31: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

31

Performance: RBT lookup

better

Page 32: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

32

Performance: XOR Linked List

Operation XOR Linked List Doubly Linked List

45 +/- 1 45 +/- 1

27 +/- 1 28 +/- 1

2.6 +/ 0.1 2.2 +/- 0.1

Page 33: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Stack frames

33

arg0pc

csr0sp

csr1

arg0pc

csr1sp

fn0 fn1

Page 34: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

Stack frames

34

arg0pc

csr0sp

csr1

arg0pcsp

fn0 fn1

csr1

“Wasted space”

Page 35: Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller · 26/02/2019  · Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller 1 FAST ‘19 2019-02-26. CRSS Confidential

35

Bit flips: stack frames