TRANSCRIPT
-
NVWAL: Exploiting NVRAM in Write-Ahead Logging
Beomseok Nam (남범석)
UNIST (Ulsan National Institute of Science and Technology)
-
Outline
Motivation
• Write Amplification Problem in SQLite
NVWAL: Write-Ahead-Logging on NVRAM [ASPLOS’16]
• Byte-granularity differential logging
• User-level NVRAM management for WAL
• Transaction-aware lazy synchronization
On-going Works:
• Failure-atomic Slotted Paging for Persistent Memory
Conclusion
-
Motivation
-
SQLite
data
-
Write Amplification in SQLite
SQLite
File System
Slow storage
Write 4K Database Journal File
Write 4K Database File
File I/O + File System Journaling I/O
Write 8 bytes
-
Rollback Journal
Original file: page i, page j
Application: page i, page j
-
Rollback Journal
Original file: page i, page j
Application: page i, page j
Journal file: page i, page j
write(page i), write(page j)
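The rollback-journal flow on these two slides can be sketched as follows. This is a toy in-memory model, not SQLite's implementation: the dictionaries standing in for the original file and the journal file, and all function names, are assumptions made for illustration.

```python
def write_pages(db_file, journal_file, updates):
    """Overwrite pages in place after saving their originals to the journal."""
    for page_no in updates:
        # Copy the original page to the journal before touching it.
        journal_file[page_no] = db_file[page_no]
    for page_no, data in updates.items():
        db_file[page_no] = data          # write(page i), write(page j)

def commit(journal_file):
    """Committing discards the saved originals."""
    journal_file.clear()

def rollback(db_file, journal_file):
    """After a crash before commit, restore the originals from the journal."""
    db_file.update(journal_file)
    journal_file.clear()

db = {"i": b"old i", "j": b"old j"}
journal = {}
write_pages(db, journal, {"i": b"new i", "j": b"new j"})
rollback(db, journal)                    # simulate a crash before commit
print(db)
```

Every updated page is written twice in this model, once to the journal and once to the original file, which is the doubling the write-amplification slide points at.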
-
Write-Ahead Logging in SQLite
SQLite
File System
Slow storage
Write 4K WAL File
File I/O + File System Journaling I/O
Write 8 bytes
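As a rough worked example for the two SQLite I/O slides, the sketch below compares the bytes written per 8-byte insert under rollback journaling (one 4K journal page plus one 4K database page) and under WAL (one 4K WAL frame). It is back-of-the-envelope arithmetic that ignores headers and the file-system journaling I/O; the function name is illustrative.

```python
PAGE_SIZE = 4096      # the "4K" page from the slides
PAYLOAD = 8           # "Write 8 bytes" from the slides

def amplification(pages_written, payload=PAYLOAD):
    """Bytes sent to storage divided by bytes the application changed."""
    return pages_written * PAGE_SIZE / payload

rollback_journal = amplification(2)   # journal file page + database file page
wal = amplification(1)                # a single WAL frame
print(rollback_journal, wal)          # 1024.0 512.0
```

WAL halves the page writes in this simplified model, but a full page is still written for an 8-byte change.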
-
Write-Ahead Logging
Original file: page i, page j
Application: page i, page j
WAL file
-
Write-Ahead Logging
Original file: page i, page j
Application: page i, page j
WAL file: write(page i), write(page j)
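A minimal sketch of the write-ahead logging flow on these slides, under the same toy assumptions: a list stands in for the WAL file, a dictionary for the original file, and the function names are hypothetical. Updated pages are appended to the WAL, and the original file is only touched at checkpoint time.

```python
def wal_write(wal_file, page_no, data):
    """Append the new page image to the WAL instead of overwriting in place."""
    wal_file.append((page_no, data))

def read_page(db_file, wal_file, page_no):
    """Readers see the newest logged version, else the original page."""
    for logged_no, data in reversed(wal_file):
        if logged_no == page_no:
            return data
    return db_file[page_no]

def checkpoint(db_file, wal_file):
    """Copy logged pages back into the original file and truncate the WAL."""
    for page_no, data in wal_file:
        db_file[page_no] = data
    wal_file.clear()

db = {"i": b"old i", "j": b"old j"}
wal = []
wal_write(wal, "i", b"new i")
wal_write(wal, "j", b"new j")
latest = read_page(db, wal, "i")   # served from the WAL
untouched = dict(db)               # original file is unchanged before checkpoint
checkpoint(db, wal)
```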
-
Write-Ahead Logging in SQLite
SQLite
File System
Fast Storage
Write 4K WAL File
File I/O + File System Journaling I/O
Write 8 bytes
Emerging Non Volatile Memory
-
NVRAM
How?
Memory / Storage
Byte-addressable
Fast
Persistence
NVRAM
Cheap
Volatile
-
NVWAL: Exploiting NVRAM in Write-Ahead Logging
-
NVWAL (NVRAM Write-Ahead Logging)
Byte-granularity Differential Logging
User-level Heap Management for WAL
Transaction-aware Lazy Synchronization
NVRAM Heap Manager
H/W OS DB
-
Differential Logging
-
Write-Ahead Logging
Volatile: DRAM Buffer
Non-Volatile: Flash memory (WAL File, Database File)
Commit!
-
Write-Ahead Logging in NVRAM
DRAM Buffer
NVRAM Log
Flash memory: Database File
Commit
-
Differential Logging
DRAM Buffer
NVRAM Log: WAL Header, then WAL frame header + WAL frame (x4); the rest is Unused
Commit!
Differential logging eliminates up to 84% of unnecessary I/O.
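The idea of byte-granularity differential logging can be illustrated with a small sketch: instead of logging a whole page, log only the byte ranges that changed. The functions below are hypothetical and are not NVWAL's frame format; they only show how much smaller the logged data becomes for a small update.

```python
def byte_diff(old: bytes, new: bytes):
    """Return (offset, changed_bytes) runs where `new` differs from `old`."""
    assert len(old) == len(new)
    runs = []
    start = None
    for i in range(len(old)):
        if old[i] != new[i]:
            if start is None:
                start = i                      # a run of changed bytes begins
        elif start is not None:
            runs.append((start, new[start:i])) # the run ended at i
            start = None
    if start is not None:
        runs.append((start, new[start:]))
    return runs

def apply_diff(page: bytes, runs):
    """Rebuild the new page by replaying the logged runs on the old page."""
    buf = bytearray(page)
    for offset, data in runs:
        buf[offset:offset + len(data)] = data
    return bytes(buf)

old_page = bytes(4096)                         # a zero-filled 4K page
new_page = bytearray(old_page)
new_page[100:108] = b"ABCDEFGH"                # an 8-byte update
new_page = bytes(new_page)

runs = byte_diff(old_page, new_page)
logged = sum(len(data) for _, data in runs)
print(runs, logged)                            # [(100, b'ABCDEFGH')] 8
```

In this toy case 8 bytes are logged instead of 4096; a real log entry would also need a header describing the offsets.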
-
User-level Heap Management
-
Block Management by NVRAM Heap Manager
User level / Kernel level
DRAM Buffer
NVRAM Log: WAL frame header + WAL frame (x4), each with nv_malloc() overhead
Commit
-
NVRAM User-level management
Kernel space
User Space NVRAM Heap
u : in-use
p : pending
f : free
Metadata block: u f f f f f f
DRAM buffer
WAL Header
WAL frame header, WAL frame (Page 3)
WAL frame header, WAL frame (Page 7)
Unused
next
Commit!
-
NVRAM User-level management
Kernel space
User Space NVRAM Heap
u : in-use
p : pending
f : free
Metadata block: u p f f f f f
DRAM buffer
First block: WAL Header; WAL frame header, WAL frame (Page 3); WAL frame header, WAL frame (Page 7); Unused
Second block: Unused
next, next
Commit!
b = nv_pre_malloc() is called!
-
NVRAM User-level management
Kernel space
User Space NVRAM Heap
u : in-use
p : pending
f : free
Metadata block: u u f f f f f
DRAM buffer
First block: WAL Header, WAL frame headers and WAL frames, Unused
Second block: WAL frame headers and WAL frames, Unused
next, next
Commit!
nv_set_flag(b, in-use) is called!
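The free, pending, in-use progression on the three slides above can be sketched as a small state machine. The names nv_pre_malloc and nv_set_flag come from the slides; the class, the signatures, and the first-fit block choice are assumptions made for illustration, and a Python list stands in for the metadata block in NVRAM.

```python
FREE, PENDING, IN_USE = "f", "p", "u"

class NvHeap:
    """Toy model of the user-level NVRAM heap's metadata block."""

    def __init__(self, n_blocks):
        self.flags = [FREE] * n_blocks

    def nv_pre_malloc(self):
        """Reserve the first free block and mark it pending."""
        for b, flag in enumerate(self.flags):
            if flag == FREE:
                self.flags[b] = PENDING
                return b
        raise MemoryError("no free NVRAM block")

    def nv_set_flag(self, b, flag):
        """Change the state of block b, e.g. pending -> in-use."""
        self.flags[b] = flag

heap = NvHeap(7)
first = heap.nv_pre_malloc()
heap.nv_set_flag(first, IN_USE)
state_1 = " ".join(heap.flags)     # "u f f f f f f"

b = heap.nv_pre_malloc()
state_2 = " ".join(heap.flags)     # "u p f f f f f"

heap.nv_set_flag(b, IN_USE)
state_3 = " ".join(heap.flags)     # "u u f f f f f"
```

Splitting allocation into a pending step and an in-use step gives recovery a way to tell a block that was merely reserved from one that has been linked into the log.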
-
Recovery in NVWAL (1) – free
Unused
WAL Header
Next frame
Wal frame header
Wal frame(Page 1)
Wal frame(Page 7)
Wal frame header
Unused
in-use free
next
-
Recovery in NVWAL (2) – pending
Unused
WAL Header
Next frame
Wal frame header
Wal frame(Page 1)
Wal frame(Page 7)
Wal frame header
Unused
in-use, free, pending
Recovery process reverts ‘pending’ to ‘free’.
next
-
Recovery in NVWAL (3) – pending
Unused
WAL Header
Wal frame header
Wal frame(Page 1)
Wal frame(Page 7)
Wal frame header
Unused
in-use free
Recovery process deletes pointers to pending blocks
pending
next
-
Recovery in NVWAL (4) – in-use
Unused
WAL Header
Wal frame header
Wal frame(Page 1)
Wal frame(Page 7)
Wal frame header
Unused
in-use, pending, in-use
Wal frame header
Wal frame(Page 3)
Normal WAL recovery
next
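The recovery cases on the four slides above can be summarized in a short sketch: pending blocks are reverted to free, pointers to blocks that are not in use are deleted, and in-use blocks are handed to normal WAL recovery. The data structures (a flag list and a dictionary of next pointers) and the function name are hypothetical.

```python
def recover(flags, next_ptr):
    """Clean up heap metadata after a crash.

    flags:    list of block states, "u" (in-use), "p" (pending), "f" (free)
    next_ptr: dict mapping a block index to the next block index, or None
    Returns the in-use blocks, which go through normal WAL recovery.
    """
    # Revert 'pending' to 'free': the allocation never completed.
    for b, flag in enumerate(flags):
        if flag == "p":
            flags[b] = "f"
    # Delete pointers that lead to blocks which are not in use.
    for b in list(next_ptr):
        nxt = next_ptr[b]
        if nxt is not None and flags[nxt] != "u":
            next_ptr[b] = None
    return [b for b, flag in enumerate(flags) if flag == "u"]

flags = ["u", "p", "f"]
next_ptr = {0: 1, 1: None}        # block 0 already points at the pending block
in_use = recover(flags, next_ptr)
print(flags, next_ptr, in_use)    # ['u', 'f', 'f'] {0: None, 1: None} [0]
```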
-
Transaction-aware Lazy Synchronization
-
Persistency Guarantee in Flash
Insert data: W F W W F
W = write()
F = fsync()
-
Eager Synchronization in NVRAM
Insert data
Timeline: m, m and C, separated by persist barriers (pb)
Enforce ordering (three times in the timeline): mb, CLflush, mb
m = memcpy
mb = memory barrier
CLflush = cache line flush
pb = persist barrier
-
Transaction-Aware Persistency Guarantee in NVRAM
Insert data
Timeline: m, m, then CLflush, CLflush with mb and pb; then C, CLflush with mb and pb
Phases: Logging phases, Commit phases
m = memcpy
mb = memory barrier
CLflush = cache line flush
pb = persist barrier
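To make the difference between eager and transaction-aware synchronization concrete, the sketch below generates one possible operation sequence for each and counts the barriers. The exact sequences are an assumed model built from the slide legend (m, mb, CLflush, pb, C), not a trace of NVWAL; the point is only that deferring the flushes to commit time makes the number of barriers independent of the number of log entries.

```python
def eager_sync(n_entries):
    """Persist every log entry as soon as it is copied (assumed model)."""
    ops = []
    for _ in range(n_entries):
        ops += ["m", "mb", "CLflush", "mb", "pb"]
    ops += ["C", "mb", "CLflush", "mb", "pb"]      # commit mark
    return ops

def lazy_sync(n_entries):
    """Copy all entries first, then flush them together at commit time."""
    ops = ["m"] * n_entries                        # logging phase
    ops += ["mb"] + ["CLflush"] * n_entries + ["mb", "pb"]
    ops += ["C", "mb", "CLflush", "mb", "pb"]      # commit mark
    return ops

eager = eager_sync(8)
lazy = lazy_sync(8)
print(eager.count("pb"), lazy.count("pb"))         # 9 2
print(eager.count("mb"), lazy.count("mb"))         # 18 4
```

Both variants issue the same number of cache line flushes in this model; only the ordering operations around them are reduced.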
-
Asynchronous Commit in NVRAM
Insert data
Timeline: m, m, CLflush, CLflush, then C with Chksum, CLflush; mb and pb surround the flushes
m = memcpy
mb = memory barrier
CLflush = cache line flush
pb = persist barrier
Chksum = checksum
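One way a checksum can support asynchronous commit is sketched below: the commit record stores a checksum over the logged frames, and recovery treats the transaction as committed only if the frames found in the log still match it. The slide only says "checksum"; the use of CRC32 (via Python's zlib.crc32) and the function names are assumptions for illustration.

```python
import zlib

def commit_checksum(frames):
    """Running CRC32 over all log frames of a transaction."""
    checksum = 0
    for frame in frames:
        checksum = zlib.crc32(frame, checksum)
    return checksum

def is_committed(frames_in_log, stored_checksum):
    """Recovery check: do the persisted frames match the commit record?"""
    return commit_checksum(frames_in_log) == stored_checksum

frames = [b"frame-1", b"frame-2"]
stored = commit_checksum(frames)

intact = is_committed(frames, stored)
# Simulate a frame whose last byte never reached NVRAM correctly.
torn = is_committed([b"frame-1", b"frame-X"], stored)
print(intact, torn)   # True False
```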
-
Evaluation
NVWAL implemented in SQLite 3.7.11
• Used in Android 4.4
Tuna
• NVRAM emulation board with ARM Cortex-A9
• DDR3-SDRAM DRAM
• DDR3-SDRAM NVRAM (Xilinx Zynq SOC)
Nexus5
• 2.26 GHz Snapdragon 800 processor
• DDR memory
• We assume that a specific address range of DRAM is NVRAM
• NVRAM latency is emulated by nop operations
Performance Analysis Tools
• Mobibench
-
Overhead of Ordering Constraints
Lazy synchronization eliminates up to 23% of the persistency overhead.
[Chart legend: memory fence; cache line flush; cache line flush + mfence; memcpy]
-
Differential Logging and I/O
Differential logging eliminates up to 84% of the unnecessary I/O.
[Chart: Effect of Differential Logging. x-axis: number of insertions per transaction (1, 2, 4, 8, 16, 32); y-axis: normalized I/O (0 to 1); series: Insert (Diff)]
-
Transaction Throughput and NVRAM Latency
The user-level heap improves performance by 6%.
Differential logging yields up to 28% higher throughput.
Combining all optimizations yields up to 37% higher performance.
[Chart annotations: 6% improvement; 28% improvement; 37% improvement; insensitive]
-
Transaction Throughput of NVWAL on Nexus5
With all optimizations combined, NVWAL performs at least 10 times faster when the NVRAM latency is below 3 usec.
With our optimizations, NVWAL performs similarly to WAL on flash memory when the write latency is set to 230 usec.
[Chart annotation: 10x faster]
-
Conclusion
Strict ordering of memory writes causes unnecessary overhead.
• Transaction-aware lazy synchronization
Leveraging byte-addressability of NVRAM
• Byte-granularity differential logging
• User-level NVRAM heap manager
Via the optimizations, we make application performance insensitive to the NVRAM write latency.
• Increasing NVRAM latency from 400 nsec to 1900 nsec results in only 4% performance degradation
-
Thank You