To Lock, Swap or Elide: On the Interplay of Hardware Transactional Memory and Lock-free Indexing
Justin Levandoski, Microsoft Research Redmond
Ryan Stutsman, Microsoft Research Redmond
Darko Makreshanski, Department of Computer Science, ETH Zurich
D. Makreshanski, J. Levandoski, R. Stutsman ON THE INTERPLAY BETWEEN HARDWARE TRANSACTIONAL MEMORY AND LOCK-FREE INDEXING
Motivation
Hardware Transactional Memory
◦ Proposed as hardware support for lock-free data-structures [1]
◦ Introduced in Intel Haswell (2013)
Existing lock-free data-structures
◦ Relying on CPU atomic primitives: compare-and-swap (CAS), fetch-and-increment (FAI)
◦ Notoriously difficult to get right
[1] M. Herlihy, J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. ISCA '93
Lock-free Programming Hardware Transactional Memory
Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
◦ A1: No. Technical limitations prohibit use of HTM as a general-purpose solution.
Q2: What if all technical limitations are overcome?
◦ A2: Still no. Important fundamental differences remain.
Q3: Can lock-free data-structures benefit from HTM?
◦ A3: Yes. Using HTM for multi-word compare-and-swap (MW-CAS) can simplify lock-free designs.
Hardware Transactional Memory
Sequence of instructions with ACI(D) properties
Programming Model:
If (BeginTransaction()) Then
    < Critical Section >
    CommitTransaction()
Else
    < Abort Fallback Codepath >
EndIf
Lock Elision:
AcquireElidedLock()
< Critical Section >
ReleaseElidedLock()
Transaction buffers stored in core-local (L1) cache
Conflict-detection and ensuring atomicity piggyback on cache-coherence protocol
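The begin/commit/fallback pattern on this slide can be sketched in software. Python has no access to hardware transactions, so the sketch below only emulates one: writes are buffered privately and committed only if no other commit happened since the transaction began. `EmulatedHTM`, `run_transaction`, and `max_retries` are hypothetical names, not part of any real HTM interface, and the conflict check is far coarser than the per-cache-line tracking real hardware performs.

```python
import threading


class EmulatedHTM:
    """Software stand-in for a hardware transaction (illustration only)."""

    def __init__(self):
        self.memory = {}                 # shared state: address -> value
        self.version = 0                 # bumped on every successful commit
        self._lock = threading.Lock()    # commit point and fallback lock
        self.commits = 0
        self.fallbacks = 0

    def run_transaction(self, critical_section, max_retries=3):
        """Try optimistically; after max_retries aborts, take the fallback lock."""
        for _ in range(max_retries):
            start_version = self.version     # "BeginTransaction"
            write_buffer = {}                # writes stay private until commit
            critical_section(self.memory, write_buffer)
            with self._lock:                 # "CommitTransaction"
                if self.version == start_version:
                    self.memory.update(write_buffer)
                    self.version += 1
                    self.commits += 1
                    return True
            # Another commit intervened: abort, discard the buffer, retry.
        with self._lock:                     # abort fallback codepath
            write_buffer = {}
            critical_section(self.memory, write_buffer)
            self.memory.update(write_buffer)
            self.version += 1
            self.fallbacks += 1
        return False


def increment(memory, writes):
    writes["counter"] = memory.get("counter", 0) + 1


htm = EmulatedHTM()
htm.run_transaction(increment)
print(htm.memory["counter"])  # prints 1
```

The return value tells the caller whether the optimistic path succeeded (`True`) or the lock-based fallback ran (`False`), which mirrors the two branches of the slide's pseudocode.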
Bw-Tree [1] (A Lock-free B-Tree)
[Figure: mapping table with an address column (entries A, B, C, D) pointing to Page A, Page B, Page C and Page D; legend: logical pointer, physical pointer]
[1] J. Levandoski, D. Lomet, S. Sengupta. The Bw-Tree: A B-tree for New Hardware. ICDE '13
Bw-Tree [1] (Lock-free Updates)
[Figure: mapping table entry P pointing to Page P with attached delta records (Δ: Insert record 50, Δ: Delete record 48, Δ: Update record 35, Δ: Insert record 60), and to Consolidated Page P]
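The delta-record updates pictured on this slide can be sketched as follows. This is a minimal model, not the Bw-Tree implementation: `MappingTable`, `Delta`, `apply_update`, `materialize`, and `consolidate` are hypothetical names, and a lock stands in for the hardware compare-and-swap that Python cannot issue directly. Each update prepends a delta and swings the mapping-table slot with one CAS; consolidation replaces the whole chain with a fresh page the same way.

```python
import threading


class MappingTable:
    """Maps logical page IDs to physical page state (illustration only)."""

    def __init__(self):
        self._slots = {}
        self._cas_lock = threading.Lock()   # stands in for a hardware CAS

    def read(self, page_id):
        return self._slots.get(page_id)

    def compare_and_swap(self, page_id, expected, new):
        with self._cas_lock:
            if self._slots.get(page_id) is not expected:
                return False
            self._slots[page_id] = new
            return True


class BasePage:
    def __init__(self, records):
        self.records = dict(records)


class Delta:
    def __init__(self, op, key, value, next_node):
        self.op, self.key, self.value, self.next = op, key, value, next_node


def apply_update(table, page_id, op, key, value=None):
    """Prepend a delta record; retry if another thread won the CAS."""
    while True:
        head = table.read(page_id)
        if table.compare_and_swap(page_id, head, Delta(op, key, value, head)):
            return


def materialize(node):
    """Replay the delta chain, oldest first, on top of the base page."""
    deltas = []
    while isinstance(node, Delta):
        deltas.append(node)
        node = node.next
    records = dict(node.records)
    for delta in reversed(deltas):
        if delta.op == "delete":
            records.pop(delta.key, None)
        else:                       # "insert" and "update" both set the key
            records[delta.key] = delta.value
    return records


def consolidate(table, page_id):
    """Swap the whole chain for one new page; give up if the chain changed."""
    head = table.read(page_id)
    return table.compare_and_swap(page_id, head, BasePage(materialize(head)))


table = MappingTable()
table.compare_and_swap("P", None, BasePage({35: "a", 48: "b"}))
apply_update(table, "P", "insert", 50, "c")
apply_update(table, "P", "delete", 48)
apply_update(table, "P", "update", 35, "z")
consolidate(table, "P")
print(sorted(table.read("P").records))  # prints [35, 50]
```

Because the existing page and deltas are never modified in place, a reader that already followed the old pointer keeps seeing a consistent chain even while a writer installs a new one.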
Q1: Does HTM obviate the need for crafty lock-free designs?
HTM Parallelized B-Tree
Wrap individual tree operations in a transaction
◦ Effortless parallelization of existing single-threaded implementations
State-of-the-art in using HTM for database indexing [1,2]
Using the Google B-Tree implementation [3]
◦ In-memory single-threaded B-Tree
[1] V. Leis, A. Kemper, T. Neumann. Exploiting Hardware Transactional Memory in Main-Memory Databases. ICDE 2014
[2] Karnagel et al. Improving In-Memory Database Index Performance with Intel® Transactional Synchronization Extensions. HPCA 2014
[3] https://code.google.com/p/cpp-btree/
HTM Parallelized B-Tree
Works well for simple use-cases
◦ Small key and payload sizes
8B Keys, 8B Payloads
4M Key-Payload pairs
Random read-only workload
HTM Parallelized B-Tree
Transaction size limited by cache size (32KB L1 cache, 8-way associativity)
◦ Sensitive to payload size
◦ Even more sensitive to key size
◦ Sensitive to tree size
◦ Sensitive to hyper-threading (sibling hardware threads share the L1 cache)
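The capacity limit on this slide follows from simple cache arithmetic, sketched below. The 32KB size and 8-way associativity come from the slide; the 64-byte line size is an assumption, as is the simplified model in which every line a transaction touches must stay resident in L1. `fits_in_l1` is a hypothetical helper that checks whether any cache set would need more ways than it has.

```python
L1_BYTES = 32 * 1024      # from the slide
WAYS = 8                  # from the slide
LINE_BYTES = 64           # assumption: typical x86 cache-line size
NUM_LINES = L1_BYTES // LINE_BYTES    # 512 lines in total
NUM_SETS = NUM_LINES // WAYS          # 64 sets


def fits_in_l1(addresses, ways=WAYS):
    """True if no cache set is asked to hold more than `ways` distinct lines."""
    lines_per_set = {}
    for line in {addr // LINE_BYTES for addr in addresses}:
        index = line % NUM_SETS
        lines_per_set[index] = lines_per_set.get(index, 0) + 1
    return all(count <= ways for count in lines_per_set.values())


# 512 consecutive lines fill the cache exactly; one more overflows a set.
sequential = [i * LINE_BYTES for i in range(NUM_LINES)]
print(fits_in_l1(sequential))                              # prints True
print(fits_in_l1(sequential + [NUM_LINES * LINE_BYTES]))   # prints False

# Nine lines that are 4 KB apart collide in one set, far below 32KB of data.
stride = NUM_SETS * LINE_BYTES
print(fits_in_l1([i * stride for i in range(WAYS + 1)]))   # prints False

# Assumption: two hyper-threads share the L1, leaving each about half the ways.
print(fits_in_l1(sequential, ways=WAYS // 2))              # prints False
```

Under this model, larger keys and payloads spread a single tree traversal over more cache lines, and unlucky address alignment can exhaust one set long before the full 32KB is used.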
Q2: What if all technical limitations are overcome?
Lock-free vs HTM
Lock-free Bw-Tree and HTM both offer optimistic concurrency control
HTM-parallelized data-structures can also provide lock-freedom
Can HTM be seen as a hardware-accelerated version of lock-free algorithms?
Fundamental difference:
◦ Lock-free (Bw-Tree) -> copy-on-write (MVCC-like)
◦ Transactional memory -> atomic update in-place (2PL-like)
Different behavior under read-write contention
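The difference in reader behavior can be illustrated with a toy model. This is a single-threaded sketch under strong simplifications: `CopyOnWriteCell` and `InPlaceCell` are hypothetical classes, and a version counter stands in for the conflict detection that transactional memory performs in hardware. It shows only the outcome for a reader that overlaps with a writer.

```python
class CopyOnWriteCell:
    """Writers install a new version; readers keep the snapshot they loaded."""

    def __init__(self, value):
        self.current = (value,)          # immutable version object

    def begin_read(self):
        return self.current              # reader pins this version

    def write(self, value):
        self.current = (value,)          # old version stays valid for readers


class InPlaceCell:
    """Writers overwrite in place; a reader must validate and may abort."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def begin_read(self):
        return self.version

    def write(self, value):
        self.value = value
        self.version += 1                # conflicts with every open reader

    def finish_read(self, start_version):
        """Return the value, or None if a writer interfered (reader aborts)."""
        return self.value if self.version == start_version else None


cow = CopyOnWriteCell("old")
snapshot = cow.begin_read()
cow.write("new")
print(snapshot[0])                       # prints old

cell = InPlaceCell("old")
token = cell.begin_read()
cell.write("new")
print(cell.finish_read(token))           # prints None
```

In the copy-on-write case the reader finishes with a consistent, slightly stale value; in the in-place case the overlapping write forces the reader to abort and retry, which is where the two approaches diverge under contention.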
Read-write Contention
Experimental Setup
◦ 4 read-only point lookup threads
◦ 0-4 write-only point update threads
◦ Zipfian skew (s = 2)
◦ Workload A: fixed-length 8-byte keys & payload
◦ Workload B: variable-length (30-70 byte) keys, 256-byte payloads
[Figure: two panels, Workload A and Workload B]
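To give a sense of how concentrated a Zipfian workload with s = 2 is, the sketch below computes the access probability of each key rank. The slide does not say how many distinct keys were used or how ranks map to keys, so the 1000-key example and the helper names `zipf_weights` and `sample_keys` are assumptions for illustration only.

```python
import random


def zipf_weights(num_keys, s=2.0):
    """Probability of each rank 1..num_keys under a Zipfian law with exponent s."""
    raw = [1.0 / rank ** s for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_keys(num_keys, count, s=2.0, seed=0):
    """Draw `count` key indices (0 is the hottest key) with Zipfian skew."""
    rng = random.Random(seed)
    return rng.choices(range(num_keys), weights=zipf_weights(num_keys, s), k=count)


weights = zipf_weights(1000)
print(round(weights[0], 2))          # prints 0.61
print(round(sum(weights[:4]), 2))    # prints 0.87
```

With this skew, roughly six in ten operations target the single hottest key, so lookup and update threads collide on the same records almost constantly.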
Q3: Can lock-free data-structures benefit from HTM?
HTM-enabled Lock-free B-Tree
Bw-Tree Problem: Code complexity
◦ Structure modification operations (SMOs) such as page split and merge require multi-word CAS
◦ Bw-Tree separates SMOs into multiple sub-operations
Reasoning about all possible race-conditions is hard
Use HTM as hardware support for multi-word compare-and-swap
◦ SMOs can be installed in a single operation
Small transaction footprint -> avoid capacity problems
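The semantics of a multi-word compare-and-swap used to install a page split in one step might look like the sketch below. It is an emulation under stated assumptions: a lock stands in for the short hardware transaction, a plain dict stands in for the mapping table, and `mwcas` plus the slot names `parent`, `P`, and `Q` are hypothetical. A real HTM-based version would also need a fallback path for transactions that abort.

```python
import threading

_txn = threading.Lock()   # stands in for one short hardware transaction


def mwcas(table, expected, new):
    """Multi-word CAS: install every entry of `new` only if every entry of
    `expected` still matches; otherwise change nothing."""
    with _txn:
        if any(table.get(slot) != value for slot, value in expected.items()):
            return False
        table.update(new)
        return True


# Split page P: update P, create sibling Q, and repoint the parent, all at once.
table = {"parent": "parent_v1", "P": "P_v1"}
ok = mwcas(
    table,
    expected={"parent": "parent_v1", "P": "P_v1", "Q": None},
    new={"parent": "parent_v2", "P": "P_left_half", "Q": "P_right_half"},
)
print(ok, table["Q"])  # prints True P_right_half
```

Only three mapping-table words are touched, which keeps the transaction footprint far below the cache capacity limits that affect whole-operation transactions.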
Conclusion
Does HTM obviate the need for crafty lock-free designs?
◦ No. Technical limitations prohibit use of HTM as a general-purpose solution.
What if all technical limitations are overcome?
◦ Still no. Important fundamental differences remain.
Can lock-free data-structures benefit from HTM?
◦ Yes. Using HTM for multi-word compare-and-swap (MW-CAS) can simplify lock-free designs.