Some key-value stores using log-structure
Zhichao Liang (frankey0207@gmail.com)
LevelDB, Riak
Outline
• Why log structure?
• Riak: log-structured hash table
• RethinkDB: log-structured B-tree
• LevelDB: log-structured merge tree
• Conclusion
Log Structure
• A log-structured file system is a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis.
• It is designed for high write throughput: all updates to data and metadata are written sequentially to a continuous stream, called a log.
• Conventional file systems, by contrast, lay out files with great care for spatial locality and make in-place changes to their data structures.
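To make the contrast concrete, here is a toy sketch comparing where writes land under in-place updates and under a log-structured design. The class names are hypothetical, list indices stand in for disk positions, and caching, cleaning, and crash recovery are ignored.

```python
class InPlaceStore:
    """Each key owns a fixed slot; an update overwrites that slot."""
    def __init__(self, keys):
        self.slot = {k: i for i, k in enumerate(keys)}
        self.disk = [None] * len(keys)
        self.write_positions = []

    def put(self, key, value):
        pos = self.slot[key]           # the write goes wherever the key lives
        self.disk[pos] = value
        self.write_positions.append(pos)


class LogStore:
    """Every update is appended at the tail; nothing is overwritten."""
    def __init__(self):
        self.log = []
        self.write_positions = []

    def put(self, key, value):
        self.log.append((key, value))  # always written at the tail
        self.write_positions.append(len(self.log) - 1)

    def get(self, key):
        # Without an index, the latest version is found by scanning backwards.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None


updates = [("c", 1), ("a", 2), ("d", 3), ("a", 4)]
ip = InPlaceStore(["a", "b", "c", "d"])
lg = LogStore()
for k, v in updates:
    ip.put(k, v)
    lg.put(k, v)

print(ip.write_positions)  # [2, 0, 3, 0] scattered (random) writes
print(lg.write_positions)  # [0, 1, 2, 3] strictly sequential writes
print(lg.get("a"))         # 4
```

The log turns every update into a tail append, at the cost of needing an index (or a scan) to find the latest version of a key.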
Log Structure for SSD
• Random writes degrade system performance and shorten the lifetime of an SSD.
• Log structure is naturally SSD-friendly!
[Figure: magnetic disk vs. SSD, showing blocks of data 1 to data 4, free pages, updated copies (new data 1, new data 3), a RAM buffer, and erased blocks.]
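The erase cost behind this slide can be illustrated with a toy NAND model, assuming a page can be programmed only while free and becomes free again only when its whole block is erased. The geometry, class, and function names are assumptions for illustration; real SSD controllers already remap writes inside their flash translation layer.

```python
PAGES_PER_BLOCK = 8  # assumed geometry, for illustration only

class FlashBlock:
    """Toy NAND block: a page can be programmed only while free (erased)."""
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means free
        self.erase_count = 0

    def program(self, idx, data):
        if self.pages[idx] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[idx] = data

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


def update_in_place(block, idx, data):
    """Overwrite page idx: save the block, erase it, then reprogram every page."""
    saved = list(block.pages)
    saved[idx] = data
    block.erase()
    for i, d in enumerate(saved):
        if d is not None:
            block.program(i, d)


def update_out_of_place(block, where, key, data):
    """Log-style update: program the next free page and remap the key.
    The previous page is left stale for a later cleaner (not modeled)."""
    free = block.pages.index(None)  # ValueError if the block is full
    block.program(free, data)
    where[key] = free


# In-place: every update of an already-written page costs a block erase.
a = FlashBlock()
for i, d in enumerate(["data 1", "data 2", "data 3"]):
    a.program(i, d)
update_in_place(a, 0, "new data 1")
update_in_place(a, 2, "new data 3")
print(a.erase_count)  # 2

# Out-of-place: new versions go to free pages and nothing is erased yet.
b = FlashBlock()
where = {}
for key in ["data 1", "data 2", "data 3"]:
    update_out_of_place(b, where, key, key)
update_out_of_place(b, where, "data 1", "new data 1")
update_out_of_place(b, where, "data 3", "new data 3")
print(b.erase_count, where)  # 0 {'data 1': 3, 'data 2': 1, 'data 3': 4}
```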
Riak?
• Riak is an open-source, highly scalable, fault-tolerant distributed database.
• Supported core features:
- operates in highly distributed environments
- no single point of failure
- highly fault-tolerant
- scales simply and intelligently
- high data availability
- low cost of operations
Bitcask
• A Bitcask instance is a directory, and only one operating-system process opens that Bitcask for writing at any given time.
• The active file is written only by appending, so writes are sequential and do not require disk seeks.
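As a sketch of what gets appended, the Bitcask paper describes each entry as a CRC, a timestamp, the key size, the value size, then the key and the value. The code below follows that field order; the field widths, the big-endian byte order, and the helper names are assumptions, and a bytearray stands in for the active file.

```python
import struct
import zlib

_CRC = struct.Struct(">I")    # crc32 of everything that follows it
_HDR = struct.Struct(">IHI")  # tstamp (u32), key size (u16), value size (u32)

def encode_entry(key: bytes, value: bytes, tstamp: int) -> bytes:
    body = _HDR.pack(tstamp, len(key), len(value)) + key + value
    return _CRC.pack(zlib.crc32(body)) + body

def decode_entry(buf, offset=0):
    """Decode the entry at `offset`; return (key, value, tstamp, next_offset)."""
    (crc,) = _CRC.unpack_from(buf, offset)
    start = offset + _CRC.size
    tstamp, ksz, vsz = _HDR.unpack_from(buf, start)
    end = start + _HDR.size + ksz + vsz
    body = bytes(buf[start:end])
    if zlib.crc32(body) != crc:
        raise ValueError(f"corrupt entry at offset {offset}")
    key = body[_HDR.size:_HDR.size + ksz]
    value = body[_HDR.size + ksz:]
    return key, value, tstamp, end

# The active file only grows at the end; each entry's offset is where it starts.
active_file = bytearray()
offsets = []
for k, v, ts in [(b"a", b"1", 100), (b"b", b"22", 101), (b"a", b"333", 102)]:
    offsets.append(len(active_file))
    active_file += encode_entry(k, v, ts)

# Replaying the file front to back yields every version in write order.
pos = 0
while pos < len(active_file):
    key, value, ts, pos = decode_entry(active_file, pos)
    print(key, value, ts)
```

The CRC lets a reader detect a torn or corrupted entry, for example one left at the tail of the file by a crash.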
Hash Index: keydir
• A keydir is simply a hash table that maps every key in a Bitcask to a fixed-size structure giving the file, offset, and size of the most recently written entry for that key.
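A minimal sketch of how the keydir turns a read into one hash lookup plus one positioned read. Everything here is hypothetical: in-memory bytearrays stand in for data files, the on-disk record is reduced to the key followed by the value, and a counter replaces real timestamps.

```python
from typing import NamedTuple, Optional

class KeydirEntry(NamedTuple):
    file_id: int
    offset: int   # where the value starts inside that file
    size: int     # value length in bytes
    tstamp: int

class MiniBitcask:
    def __init__(self, max_file_size: int = 64):
        self.files = {0: bytearray()}  # file_id -> contents; the highest id is active
        self.active_id = 0
        self.max_file_size = max_file_size
        self.keydir = {}               # key -> KeydirEntry for its latest write
        self.clock = 0                 # stand-in for a wall-clock timestamp

    def put(self, key: bytes, value: bytes) -> None:
        active = self.files[self.active_id]
        if active and len(active) + len(key) + len(value) > self.max_file_size:
            # Active file is full: it becomes immutable and a new one starts.
            self.active_id += 1
            active = self.files[self.active_id] = bytearray()
        active += key                  # simplified record: key, then value
        offset = len(active)
        active += value
        self.clock += 1
        # Only the keydir changes; the older copy stays in its file.
        self.keydir[key] = KeydirEntry(self.active_id, offset, len(value), self.clock)

    def get(self, key: bytes) -> Optional[bytes]:
        e = self.keydir.get(key)
        if e is None:
            return None
        # One hash lookup, then a single read of e.size bytes at e.offset.
        return bytes(self.files[e.file_id][e.offset:e.offset + e.size])


db = MiniBitcask(max_file_size=8)
db.put(b"k1", b"aaaa")
db.put(b"k2", b"bb")
db.put(b"k1", b"cc")
print(db.get(b"k1"), db.keydir[b"k1"])
# b'cc' KeydirEntry(file_id=1, offset=6, size=2, tstamp=3)
print(bytes(db.files[0]))  # b'k1aaaa' (the stale version is still on "disk")
```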
Merge
• The merge process iterates over all non-active files and produces as output a set of data files containing only the “live”, i.e. latest, version of each present key.
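The merge step can be sketched as follows, under simplifying assumptions: each non-active file is a list of (key, value, tstamp) tuples, None marks a deleted key, and liveness is decided only among the files being merged. Real Bitcask also consults the keydir so that versions superseded in the active file are dropped, and it writes hint files; both are omitted here, and the function name is hypothetical.

```python
TOMBSTONE = None  # hypothetical deletion marker for this sketch

def merge(non_active_files, max_entries_per_file=2):
    """non_active_files: lists of (key, value, tstamp), oldest file first.
    Returns (merged_files, keydir): new files holding only the latest live
    version of each key, and key -> (file index, entry index)."""
    latest = {}
    for entries in non_active_files:
        for key, value, tstamp in entries:
            # Newer timestamp wins; on a tie the later-written entry wins.
            if key not in latest or tstamp >= latest[key][1]:
                latest[key] = (value, tstamp)

    merged_files, keydir, current = [], {}, []
    for key in sorted(latest):  # sorted only to make the output deterministic
        value, tstamp = latest[key]
        if value is TOMBSTONE:
            continue  # a deleted key disappears from the merged output
        if len(current) == max_entries_per_file:
            merged_files.append(current)
            current = []
        keydir[key] = (len(merged_files), len(current))
        current.append((key, value, tstamp))
    if current:
        merged_files.append(current)
    return merged_files, keydir


old_files = [
    [("a", "1", 1), ("b", "2", 2)],
    [("a", "3", 3), ("c", "4", 4), ("b", TOMBSTONE, 5)],
]
files, keydir = merge(old_files)
print(files)   # [[('a', '3', 3), ('c', '4', 4)]]
print(keydir)  # {'a': (0, 0), 'c': (0, 1)}
```

Because the inputs are immutable, the merge can run alongside normal writes; once the new files and keydir are in place, the old files can be deleted to reclaim space.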
RethinkDB?
• RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.
• Powerful technology:
- Linear scaling across cores
- Fine-grained durability control
- Instantaneous recovery on power failure
• Supported core features:
- Atomic increment/decrement
- Values up to 10MB in size
- Multi-GET support
- Up to one million transactions per second on commodity hardware
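Because RethinkDB speaks the Memcached protocol, a memcached client can talk to it. The sketch below only builds and parses Memcached text-protocol messages: `set`, a multi-key `get`, and `incr` correspond to the features listed above. The helper names are hypothetical, and the code deliberately does no networking, so nothing is assumed about which port the server listens on.

```python
def cmd_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # "set <key> <flags> <exptime> <bytes>\r\n" followed by the data block
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"

def cmd_get(*keys: str) -> bytes:
    # A single request may name several keys (multi-GET).
    return ("get " + " ".join(keys) + "\r\n").encode()

def cmd_incr(key: str, delta: int) -> bytes:
    # Atomic increment of a numeric value stored under key.
    return f"incr {key} {delta}\r\n".encode()

def parse_get_response(buf: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>' lines, each followed by a data
    block, up to the terminating 'END' line."""
    out, pos = {}, 0
    while True:
        eol = buf.index(b"\r\n", pos)
        line = buf[pos:eol].decode()
        if line == "END":
            return out
        _, key, _flags, nbytes = line.split()[:4]
        start = eol + 2
        out[key] = buf[start:start + int(nbytes)]
        pos = start + int(nbytes) + 2  # skip the data block and its \r\n


print(cmd_set("greeting", b"hi"))
print(cmd_get("a", "b"))
reply = b"VALUE a 0 2\r\nhi\r\nVALUE b 0 3\r\nyou\r\nEND\r\n"
print(parse_get_response(reply))  # {'a': b'hi', 'b': b'you'}
```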
Installation & usage
• RethinkDB works on modern 64-bit distributions of Linux:
- Ubuntu 10.04.1 x86_64
- Ubuntu 10.10 x86_64
- Red Hat Enterprise Linux 5 x86_64
- CentOS 5 x86_64
- SUSE Linux 10
• Running the rethinkdb server (default installation path: /usr/bin/rethinkdb-1.0):
    ./rethinkdb-1.0 -f /u01/rethinkdb_data
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p 11500
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -f /u03/rethinkdb_data -c 4 -p 11500
The methodology
• First, because an SSD has no mechanical parts, random reads on an SSD are significantly more efficient than on a magnetic disk!
• Second, random writes trigger more erases, which makes them expensive and shortens the drive's lifetime!
• RethinkDB takes an append-only approach to storing data, pioneered by log-structured file systems!
What are the consequences of append-only?
Append-only consequences
Benefits:
- Data Consistency
- Hot Backups
- Instantaneous Recovery
- Easy Replication
- Lock-Free Concurrency
- Live Schema Changes
- Database Snapshots
Drawbacks:
1) Eliminating data locality requires a larger number of disk accesses.
2) A large amount of data quickly becomes obsolete in an environment with a heavy insert or update workload.
Append-only B-tree
[Figure: an append-only B-tree whose pages (Page 1, Page 2, Page 3) are stored in a data file; after an update, new copies of Page 3 and Page 1 are appended to the end of the file instead of overwriting the originals.]
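The copy-on-write idea in the figure can be sketched with path copying. For brevity this uses an unbalanced binary search tree instead of a B-tree, and a Python list stands in for the data file; the class and method names are hypothetical, and RethinkDB's real on-disk format differs. An update appends new copies of the nodes on the root-to-leaf path, so every older root remains a readable snapshot.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    value: str
    left: Optional[int]   # log offset of the left child, or None
    right: Optional[int]  # log offset of the right child, or None

class AppendOnlyTree:
    """Nodes are never modified in place; every change appends new nodes."""
    def __init__(self):
        self.log = []        # stand-in for the append-only data file
        self.roots = [None]  # root offset of each version; version 0 is empty

    def _append(self, node):
        self.log.append(node)
        return len(self.log) - 1

    def _insert(self, off, key, value):
        if off is None:
            return self._append(Node(key, value, None, None))
        n = self.log[off]
        if key < n.key:
            return self._append(n._replace(left=self._insert(n.left, key, value)))
        if key > n.key:
            return self._append(n._replace(right=self._insert(n.right, key, value)))
        return self._append(n._replace(value=value))  # update: copy, don't overwrite

    def put(self, key, value):
        self.roots.append(self._insert(self.roots[-1], key, value))

    def get(self, key, version=-1):
        off = self.roots[version]
        while off is not None:
            n = self.log[off]
            if key == n.key:
                return n.value
            off = n.left if key < n.key else n.right
        return None


t = AppendOnlyTree()
for k in (15, 5, 19):
    t.put(k, f"v{k}")
print(len(t.log))                     # 5
t.put(5, "new")                       # appends a copy of node 5 and of the root
print(len(t.log))                     # 7
print(t.get(5), t.get(5, version=3))  # new v5
```

The untouched subtree (node 19) is shared between the old and new roots, which is why snapshots and hot backups come almost for free, while the stale copies illustrate the obsolete-data drawback listed earlier.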
LevelDB?
• LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
• Supported core features:
- Data is stored sorted by key
- Multiple changes can be made in one atomic batch
- Users can create a transient snapshot to get a consistent view of data
- Data is automatically compressed using the Snappy compression library
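LevelDB supports batches and snapshots by tagging every write with an increasing sequence number; a snapshot is essentially the sequence number current when it was taken. The sketch below illustrates only that idea: the class is hypothetical, keeps every version in memory, and is not LevelDB's API (which is C++, with `WriteBatch`, `DB::GetSnapshot`, and `ReadOptions::snapshot`).

```python
class SeqStore:
    """Toy multi-version store: each write gets a sequence number and a
    snapshot is just the latest sequence number when it was taken."""
    def __init__(self):
        self.seq = 0
        self.versions = {}  # key -> list of (seq, value), in ascending seq order

    def write(self, batch):
        """Apply a batch of (key, value) pairs; value None means delete.
        A snapshot taken before this call sees none of the batch, and one
        taken after it sees all of it."""
        for key, value in batch:
            self.seq += 1
            self.versions.setdefault(key, []).append((self.seq, value))

    def snapshot(self):
        return self.seq

    def get(self, key, snapshot=None):
        limit = self.seq if snapshot is None else snapshot
        # Newest version whose sequence number is visible at `limit`.
        for seq, value in reversed(self.versions.get(key, [])):
            if seq <= limit:
                return value
        return None


db = SeqStore()
db.write([("a", "1"), ("b", "1")])
snap = db.snapshot()
db.write([("a", "2"), ("b", None)])          # update a and delete b in one batch
print(db.get("a"), db.get("b"))              # 2 None
print(db.get("a", snap), db.get("b", snap))  # 1 1
```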
Installation & usage
• LevelDB works with Snappy, a compression/decompression library.
• It is a library, not a database server! Building it produces the static library libleveldb.a:
    svn checkout http://leveldb.googlecode.com/svn/trunk/ leveldb-read-only
    cd leveldb-read-only
    make && cp libleveldb.a /usr/local/lib && cp -r include/leveldb /usr/local/include
• Download Snappy from http://code.google.com/p/snappy/, then:
    cd snappy-1.0.4
    ./configure && make && make install
Log-structured merge tree
• LevelDB
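A toy version of the log-structured merge tree: writes go to an in-memory memtable, a full memtable is frozen into an immutable sorted run (an SSTable, written sequentially), reads search the newest data first, and compaction merges runs. This is a deliberately small sketch with hypothetical names; LevelDB additionally keeps a write-ahead log, organizes SSTables into levels, and handles deletions, none of which is modeled here.

```python
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # immutable sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            # A frozen memtable becomes one sorted run, written sequentially.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run shadows older ones
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite older
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)  # memtable fills, so the first run is flushed
db.put("b", 3)
db.put("c", 4)  # second run shadows the old "b"
db.put("a", 5)  # still in the memtable
print(db.get("a"), db.get("b"), db.get("c"), len(db.runs))  # 5 3 4 2
db.compact()
print(db.runs)  # [[('a', 2), ('b', 3), ('c', 4)]]
```

Every disk write in this design is a sequential run, which is the property the earlier slides value for SSDs; the price is that a read may have to consult several runs until compaction merges them.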
Conclusion
• Log-structure