Slide 1
BetrFS: A Right-Optimized Write-Optimized File System
Amogh Akshintala, Michael Bender, Kanchan Chandnani, Pooja Deo, Martin Farach-Colton, William Jannen, Rob Johnson, Zardosht Kasheff, Bradley C. Kuszmaul, Prashant Pandey, Donald E. Porter, Leif Walsh, Jun Yuan, Yang Zhan
Facebook, Farmingdale College, MIT & Oracle, Rutgers, Stony Brook, Two Sigma, UNC, Williams College
Slide 2
General-purpose file systems strive to perform well on a wide variety of applications:
• Sequential reads
• Sequential writes
• Random writes
• File/directory renames
• File deletes
• Recursive scans
• Metadata updates
Slide 3
Achieving good performance on all these operations is a long-standing challenge
Example: ext4
Slide 4
Logging updates is fast, but logged data can have little locality
Example: log-based file systems
Slide 5
Some operations seem to require a trade-off:
• sequential reads vs. random writes
• directory scans vs. renames
Designs on either side of these trade-offs:
• update-in-place vs. log-structured
• inodes vs. full-path indexing
Slide 6
BetrFS
Main idea of this talk
Slide 7
Write-Optimized Data Structures (WODS)
● A class of data structures developed starting in the 1990s:
– LSM trees [O'Neil, Cheng, Gawlick, & O'Neil '96]
– Bε-trees [Brodal & Fagerberg '03]
– COLAs [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, & Nelson '07]
– xDicts [Brodal, Demaine, Fineman, Iacono, Langerman, & Munro '10]
● WODS perform inserts/updates/deletes orders of magnitude faster than a B-tree
– WODS queries are asymptotically no slower than in a B-tree
BetrFS uses Bε-trees
Slide 8
The Disk-Access Machine (DAM) model [Aggarwal & Vitter '88]
How computation works:
• Data is transferred in blocks of size B between RAM (of size M) and disk.
• The number of block transfers dominates the running time.
Goal: minimize the number of block transfers.
• Performance bounds are parameterized by block size B, memory size M, and data size N.
[Figure: RAM of size M exchanging blocks of size B with disk]
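To make the DAM cost measure concrete, here is a minimal Python sketch that counts block transfers for reading N items stored contiguously versus scattered one per block. It assumes unit-size items, B measured in items per block, block-aligned data, and no caching in M; both function names are hypothetical.

```python
import math


def contiguous_read_transfers(n_items: int, block_size: int) -> int:
    """Block transfers to read n_items stored contiguously (block-aligned): ceil(N / B)."""
    return math.ceil(n_items / block_size)


def scattered_read_transfers(n_items: int) -> int:
    """Worst case: every item lives in a different block and nothing is cached."""
    return n_items


if __name__ == "__main__":
    N, B = 1_000_000, 1_000
    print(contiguous_read_transfers(N, B))  # 1000
    print(scattered_read_transfers(N))      # 1000000
```

The factor-of-B gap between the two layouts is why locality matters so much in the rest of the talk.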
Slide 9
Example: B-trees
[Figure: a B-tree in which each node holds B children; its height is \(O(\log_B N)\)]
Insert cost: \(O(\log_B N)\)
Lookup cost: \(O(\log_B N)\)
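As a rough sanity check on the \(O(\log_B N)\) bound, the sketch below computes \(\lceil \log_B N \rceil\) with exact integer arithmetic, which approximates the number of nodes (block transfers) on a root-to-leaf path when every node has fanout B. It ignores constant factors, partially full nodes, and caching of the upper levels in RAM; `btree_levels` is a hypothetical helper.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n_keys, i.e. ceil(log_fanout(n_keys)).

    Integer arithmetic avoids the rounding error of math.log.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # A billion keys with 1000-way fanout: about 3 block transfers per lookup or insert.
    print(btree_levels(10**9, 1000))  # 3
```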
Slide 10
Bε-trees (ε = 1/2)
[Figure: each node of size B holds √B children (pivots) and B − √B of buffer space; the tree has height \(O(\log_{\sqrt{B}} N)\)]
Insert cost: \(O\left(\frac{\sqrt{B}}{B-\sqrt{B}} \log_{\sqrt{B}} N\right) = O\left(\frac{\log_B N}{\sqrt{B}}\right)\)
Lookup cost: \(O(\log_{\sqrt{B}} N) = O(\log_B N)\)
Range query cost: \(O(\log_B N + k/B)\), where k is the number of keys returned
(Each flush moves at least \((B-\sqrt{B})/\sqrt{B}\) buffered messages into a child with one block write, and each message is flushed \(O(\log_{\sqrt{B}} N)\) times on its way to a leaf.)
Inserts are orders of magnitude faster than in a B-tree
Range queries can run at disk bandwidth
Point queries are asymptotically as fast as in a B-tree
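The buffering idea behind these bounds can be sketched with a deliberately tiny model: a single root whose buffer absorbs inserts, sitting over a row of leaves. This is a simplification, not the BetrFS implementation: real Bε-trees have many levels, bounded node sizes, and recursive flushes, and `ToyBeTree`, its pivots, and `buffer_capacity` are hypothetical. When the buffer fills, the sketch moves the whole batch destined for the busiest child, which is what amortizes one leaf write over many inserts; a query checks the buffer before the leaf, so it still follows a single root-to-leaf path.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class ToyBeTree:
    """A two-level toy of the Bε-tree idea: one root buffer over sorted leaves.

    Hypothetical simplification: real Bε-trees have many levels, split full
    nodes, and flush recursively down the tree.
    """

    def __init__(self, pivots: List[Any], buffer_capacity: int) -> None:
        self.pivots = sorted(pivots)  # child i holds keys k with pivots[i-1] <= k < pivots[i]
        self.leaves: List[Dict[Any, Any]] = [{} for _ in range(len(self.pivots) + 1)]
        self.buffer: List[Tuple[Any, Any]] = []  # pending messages, oldest first
        self.capacity = buffer_capacity
        self.flushes = 0  # each flush stands in for one leaf write

    def _child(self, key: Any) -> int:
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key: Any, value: Any) -> None:
        # Common case: append to the root buffer; no leaf is touched.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.capacity:
            self._flush()

    def _flush(self) -> None:
        # Pick the child with the most pending messages and move its whole batch.
        counts = [0] * len(self.leaves)
        for key, _ in self.buffer:
            counts[self._child(key)] += 1
        target = counts.index(max(counts))
        remaining = []
        for key, value in self.buffer:
            if self._child(key) == target:
                self.leaves[target][key] = value  # applied oldest first, so the newest wins
            else:
                remaining.append((key, value))
        self.buffer = remaining
        self.flushes += 1

    def get(self, key: Any) -> Optional[Any]:
        # Buffered messages are newer than anything in the leaf; check newest first.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        return self.leaves[self._child(key)].get(key)


if __name__ == "__main__":
    tree = ToyBeTree(pivots=[100, 200], buffer_capacity=4)
    for k in range(10):
        tree.insert(k, k * k)
    print(tree.flushes, tree.get(9), tree.get(3))  # 2 81 9
```

Ten inserts cost only two simulated leaf writes here; in the real analysis the buffer holds B − √B messages, so each flush carries many more.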
Slide 11
The search-insert asymmetry
● Inserts are orders of magnitude faster than point queries
● But many updates require querying the old value first
● e.g. "Add $10 to rob's account balance":
– oldBalance = query(rob)
– newBalance = oldBalance + $10
– insert(rob, newBalance)
Point query: \(O(\log_B N)\)
Insert: \(O(\log_B N / \sqrt{B})\)
So a read-modify-write runs at point-query speed, not insert speed.
Slide 12
Upserts: read-modify-write as fast as an insert
An upsert is a message, like an insert: it goes into the root buffer without reading the old value, and it is applied to that value later, when it is flushed down to the leaf or when a query combines them.
[Figure: the Bε-tree from the previous slide; the stored entry "rob: $5" and the buffered message "rob: add $10" combine into "rob: $15"]
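A minimal sketch of how an upsert avoids the read: it is stored as a message, and the old value and pending messages are combined only when someone queries or when messages are flushed. It models only the message semantics, not the tree layout; `UpsertStore`, the "put"/"add" message kinds, and treating a missing value as 0 are hypothetical choices for the bank-balance example.

```python
from typing import Dict, List, Optional, Tuple


class UpsertStore:
    """Applied values plus a log of pending messages (think: Bε-tree buffers)."""

    def __init__(self) -> None:
        self.applied: Dict[str, int] = {}              # values already in leaves
        self.pending: List[Tuple[str, str, int]] = []  # (key, kind, operand), oldest first

    def insert(self, key: str, value: int) -> None:
        self.pending.append((key, "put", value))  # blind write: no query needed

    def upsert_add(self, key: str, delta: int) -> None:
        self.pending.append((key, "add", delta))  # blind write: no query needed

    @staticmethod
    def _apply(old: Optional[int], kind: str, operand: int) -> int:
        return operand if kind == "put" else (old or 0) + operand

    def query(self, key: str) -> Optional[int]:
        # Start from the applied value and replay this key's pending messages in order.
        value = self.applied.get(key)
        for k, kind, operand in self.pending:
            if k == key:
                value = self._apply(value, kind, operand)
        return value

    def flush(self) -> None:
        # Apply all pending messages, as a flush to the leaves eventually would.
        for k, kind, operand in self.pending:
            self.applied[k] = self._apply(self.applied.get(k), kind, operand)
        self.pending.clear()


if __name__ == "__main__":
    store = UpsertStore()
    store.insert("rob", 5)
    store.flush()                # leaf now holds rob: $5
    store.upsert_add("rob", 10)  # message rob: add $10, no read performed
    print(store.query("rob"))    # 15
```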
Slide 13
Bε-tree performance summary
• Point query: \(O(\log_B N)\), as fast as a B-tree
• Insert/delete/upsert: \(O(\log_B N / \sqrt{B})\), very fast (10K-100K per second)
• Range query: \(O(\log_B N + k/B)\), near disk bandwidth
To get the best possible performance, we want to do inserts, deletes, upserts, and range queries, and avoid point queries.
Slide 14
The BetrFS schema (version 0.1)
• Maintain two separate Bε-tree indexes:
– metadata index: full path → struct stat
– data index: (full path, blk#) → data[4096]
• Implications: fast directory scans; data blocks are laid out sequentially
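Both locality claims follow from key order. Below is a minimal sketch of the two indexes as sorted key lists, assuming plain lexicographic comparison of path strings (BetrFS's actual key comparator may differ, and the example order shown on later slides is not plain lexicographic). Every key below a directory shares the prefix "dir/", so the subtree is one contiguous range, and a file's (path, blk#) keys are adjacent and in block order. `scan_subtree` and `file_blocks` are hypothetical helpers.

```python
import bisect
from typing import List, Tuple


def scan_subtree(sorted_paths: List[str], directory: str) -> List[str]:
    """Range query over the metadata index: all keys strictly below `directory`."""
    prefix = directory.rstrip("/") + "/"
    start = bisect.bisect_left(sorted_paths, prefix)
    result = []
    for path in sorted_paths[start:]:
        if not path.startswith(prefix):
            break  # keys with this prefix are contiguous, so the first miss ends the range
        result.append(path)
    return result


def file_blocks(sorted_data_keys: List[Tuple[str, int]], path: str) -> List[int]:
    """Range query over the data index: one file's block numbers, in order (blk# >= 0)."""
    start = bisect.bisect_left(sorted_data_keys, (path, 0))
    blocks = []
    for key_path, blk in sorted_data_keys[start:]:
        if key_path != path:
            break
        blocks.append(blk)
    return blocks


if __name__ == "__main__":
    metadata = sorted([
        "/home/rob/doc", "/home/rob/doc/latex", "/home/rob/doc/latex/a.tex",
        "/home/rob/doc/latex/b.tex", "/home/rob/doc/bar.c", "/home/rob/local",
    ])
    data = sorted([("/home/rob/doc/bar.c", 1), ("/home/rob/doc/bar.c", 0),
                   ("/home/rob/doc/latex/a.tex", 0)])
    print(scan_subtree(metadata, "/home/rob/doc"))
    print(file_blocks(data, "/home/rob/doc/bar.c"))  # [0, 1]
```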
Slide 15
Mapping file-system operations to key-value operations (Operation → Implementation)
• read → range query
• write → insert/upsert
• metadata update → upsert (fast atime updates)
• readdir → range query (fast directory traversals)
• mkdir/rmdir → insert/delete
• unlink → delete each block*
• rename → delete then reinsert each block*
* These operations do not map onto single key-value operations.
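To see why unlink and rename are the odd ones out, here is a minimal sketch of the data index as a Python dict keyed by (full path, blk#) that counts key-value operations. `unlink_v01` and `rename_file_v01` are hypothetical names, and a real implementation would locate the blocks with a range query rather than scanning every key.

```python
from typing import Dict, Tuple

DataIndex = Dict[Tuple[str, int], bytes]  # (full path, blk#) -> 4096-byte block


def unlink_v01(data: DataIndex, path: str) -> int:
    """Delete every block of `path`; returns the number of key-value deletes."""
    victims = [key for key in data if key[0] == path]
    for key in victims:
        del data[key]
    return len(victims)


def rename_file_v01(data: DataIndex, src: str, dst: str) -> int:
    """Move every block from src to dst: one delete plus one insert per block."""
    moved = {key: block for key, block in data.items() if key[0] == src}
    for key in moved:
        del data[key]
    for (_, blk), block in moved.items():
        data[(dst, blk)] = block
    return 2 * len(moved)


if __name__ == "__main__":
    index: DataIndex = {("/a", b): bytes(4096) for b in range(3)}
    print(rename_file_v01(index, "/a", "/b"))  # 6: cost grows with file size
    print(unlink_v01(index, "/b"))             # 3
```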
Slide 16
Small, random, unaligned writes are an order of magnitude faster
[Figure: 1000 random 4-byte writes; time in seconds, log scale, lower is better; BetrFS, btrfs, ext4, xfs, zfs]
● 1 GiB file, random data
● 1,000 random 4-byte writes
● fsync() at end
0.17 s vs. > 10 s
BetrFS random writes benefit from Bε-tree insertion performance
Slide 17
Small file creates are an order of magnitude faster
[Figure: Small File Creation; files/second, log scale, higher is better, vs. files created (0 to 3M); BetrFS, btrfs, ext4, xfs, zfs]
● Create 3 million files and write 200 bytes to each
● Balanced directory tree with fanout 128
BetrFS file creates benefit from Bε-tree insertion performance
Slide 18
Sequential I/O
[Figure: 1 GiB sequential I/O; throughput in MiB/s (0 to 100), higher is better, for read and write; BetrFS, btrfs, ext4, xfs, zfs]
• Write random data to a file, ten 4K blocks at a time
• Sequentially read the data back
BetrFS sequential reads benefit from Bε-tree range-query performance
BetrFS sequential writes lag mostly because of the overhead of full-data journaling (we'll fix this later in the talk)
Slide 19
Recursive directory traversals
[Figure: time in seconds, lower is better, for grep -r (0 to 80 s) and GNU find (0 to 20 s); BetrFS, btrfs, ext4, xfs, zfs]
• Recursive scans from the root of the Linux 3.11.10 source tree
• GNU find scans file metadata
• grep -r scans file contents
BetrFS directory traversals benefit from Bε-tree range-query performance
About 3-8x faster than other file systems
Slide 20
File deletion
[Figure: BetrFS Delete Scaling; deletion time in seconds (0 to 300 s), lower is better, vs. file size]
• Write random data to a file and fsync() it
• Delete the file
BetrFS deletes require \(O(n)\) key-value operations, one per data block
Slide 21
Directory rename
[Figure: directory rename time; lower is better]
BetrFS renames require \(O(n)\) key-value operations, one delete and one reinsert per block in the renamed subtree
Renames are orders of magnitude slower than in ext4
Slide 22
BetrFS (version 0.1) performance summary
• Sequential reads
• Sequential writes
• Random writes
• File/directory renames
• File deletes
• Recursive scans
• Metadata updates
Let's fix these problems
Slide 23
Accelerating rename without slowing down directory traversals
Slide 24
Full-path indexing yields fast directory scans
Example: grep -r "key" /home/rob/doc/
Disk (physical), keys in on-disk order:
/home/rob/doc
/home/rob/doc/latex
/home/rob/doc/latex/a.tex
/home/rob/doc/latex/b.tex
/home/rob/doc/bar.c
/home/rob/local
….
[Figure: the logical directory tree (home, rob, local, video, doc, latex, a.tex, b.tex, bar.c, 1.mp4, 2.jpg) beside the physical key order; the disk head reads the keys under /home/rob/doc/ in one sequential pass]
Slide 25
Rename is expensive when using full-path indexing
Example: mv /home/rob/doc/latex /home/rob
Disk (physical), keys before the rename:
/home/rob/doc
/home/rob/doc/latex
/home/rob/doc/latex/a.tex
/home/rob/doc/latex/b.tex
/home/rob/doc/bar.c
/home/rob/local
….
After the rename, every key under /home/rob/doc/latex must be rewritten under the new path:
/home/rob/latex
/home/rob/latex/a.tex
/home/rob/latex/b.tex
[Figure: the logical directory tree, with the latex subtree moving from doc to rob]
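The same cost appears in the metadata index when a directory moves. A minimal sketch, assuming the metadata index is a dict keyed by full path: the renamed directory and every key below it are deleted and reinserted under the new prefix, so the work grows with the size of the subtree. `rename_subtree` is a hypothetical helper.

```python
from typing import Dict


def rename_subtree(index: Dict[str, object], src: str, dst: str) -> int:
    """Rewrite src and every key below it under dst; returns key-value operations."""
    moved = {path: value for path, value in index.items()
             if path == src or path.startswith(src + "/")}
    for path in moved:
        del index[path]
    for path, value in moved.items():
        index[dst + path[len(src):]] = value  # keep the part below src unchanged
    return 2 * len(moved)  # one delete plus one insert per key


if __name__ == "__main__":
    meta = {p: None for p in [
        "/home/rob/doc", "/home/rob/doc/latex", "/home/rob/doc/latex/a.tex",
        "/home/rob/doc/latex/b.tex", "/home/rob/doc/bar.c", "/home/rob/local"]}
    # mv /home/rob/doc/latex /home/rob
    print(rename_subtree(meta, "/home/rob/doc/latex", "/home/rob/latex"))  # 6
    print(sorted(meta))
```

Matching on `src + "/"` keeps a sibling such as /home/rob/doc/latexfoo from being moved by mistake.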
Slide 26
The tension between fast rename and fast scan
Zoning: balancing indirection and locality
[Figure: scan cost and rename cost along a spectrum from indirection (as with inodes: cheap renames, slow scans) to locality (as with full-path indexing: fast scans, expensive renames), with zones in between]
Slide 27
BetrFS v0.2: rethinking the schema
● Partition the file system into zones
● Use full-path indexing within zones
● Use inodes between zones
Zone: a subtree of the directory hierarchy
Implication: recursive directory scans only perform seeks when crossing zones
[Figure: the example directory tree (home, rob, local, video, doc, latex, a.tex, b.tex, bar.c, 1.mp4, 2.jpg) partitioned into Zone 0, Zone 1, and Zone 2]
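A minimal sketch of how a full path could map to a zone-relative key, assuming a hypothetical table of zone roots and a longest-prefix match; the table, `resolve`, and the (zone id, relative path) key format are illustrative assumptions, not BetrFS's lookup code. Because all keys in one zone share a zone id, each zone's keys stay contiguous in a sorted index, and a scan only has to seek when it crosses into another zone.

```python
from typing import Dict, Tuple


def resolve(zone_roots: Dict[str, int], path: str) -> Tuple[int, str]:
    """Map a full path to (zone id, path relative to that zone's root).

    zone_roots maps each zone's root directory (no trailing "/", except "/"
    itself) to its zone id. The longest matching root wins, so nested zones
    take precedence over their ancestors.
    """
    best_root = None
    for root in zone_roots:
        inside = root == "/" or path == root or path.startswith(root + "/")
        if inside and (best_root is None or len(root) > len(best_root)):
            best_root = root
    if best_root is None:
        raise KeyError(f"no zone contains {path!r}")
    relative = path if best_root == "/" else (path[len(best_root):] or "/")
    return zone_roots[best_root], relative


if __name__ == "__main__":
    # Hypothetical zone assignment for the example tree.
    zones = {"/": 0, "/home/rob/doc": 1, "/home/rob/video/1.mp4": 2}
    print(resolve(zones, "/home/rob/doc/latex/a.tex"))  # (1, '/latex/a.tex')
    print(resolve(zones, "/home/rob/local"))            # (0, '/home/rob/local')
```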
Slide 28
BetrFS v0.2: rethinking the schema
Moving the root of a zone is cheap
Example: mv /home/rob/video/1.mp4 /home/rob/doc
[Figure: 1.mp4, the root of its own zone, moves from video to doc; Zone 0, Zone 1, and Zone 2 are otherwise unchanged]
Slide 29
BetrFS v0.2: rethinking the schema
Renaming a subtree of a zone requires copying
Example: mv /home/rob/doc/latex /home/rob/latex
[Figure: latex is not the root of a zone, so its subtree (latex, a.tex, b.tex) must be copied to the new location; Zone 0, Zone 1, and Zone 2 shown]
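The two rename cases can be summarized as a cost model. A minimal sketch, assuming (hypothetically) that a zone root is reached through a single pointer entry in its parent zone and that `zone_keys` lists only the keys of the zone in which the rename happens; `zoned_rename_ops` is illustrative, not BetrFS code.

```python
from typing import Iterable, Set


def zoned_rename_ops(zone_roots: Set[str], zone_keys: Iterable[str], src: str) -> int:
    """Approximate key-value operations for `mv src <dst>` in a zoned schema."""
    if src in zone_roots:
        # Zone root: delete and reinsert the one pointer entry; the zone's contents stay put.
        return 2
    # Inside a zone: src and everything below it is copied under the new path.
    subtree = [key for key in zone_keys if key == src or key.startswith(src + "/")]
    return 2 * len(subtree)


if __name__ == "__main__":
    roots = {"/home/rob/video/1.mp4"}  # hypothetical: 1.mp4 is a zone root
    keys = ["/home/rob/doc", "/home/rob/doc/latex", "/home/rob/doc/latex/a.tex",
            "/home/rob/doc/latex/b.tex", "/home/rob/doc/bar.c"]
    print(zoned_rename_ops(roots, keys, "/home/rob/video/1.mp4"))  # 2
    print(zoned_rename_ops(roots, keys, "/home/rob/doc/latex"))    # 6
```

The copy branch grows with the amount of data below `src` inside the zone, so bounding zone size also bounds the worst-case rename, at the price of more zone crossings during scans.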
Slide 30
BetrFS v0.2: rethinking the schema
Managing zone sizes
Large zones → fast directory scans
Small zones → fast renames
We can keep zone sizes in a "sweet spot" by splitting large zones and merging small zones
[Figure: the example directory tree partitioned into Zone 0, Zone 1, and Zone 2]
Slide 31
How big should zones be?
BetrFS 0.2 uses 512KB zones to balance rename and scan performance
[Figure: cost of renaming the root of a zone vs. cost of renaming via copy]
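A sketch of a split/merge policy around that target, taking the 512KB figure from the slide (read as 512 × 1024 bytes) and assuming hypothetical hysteresis factors, split above 2x the target and merge below half of it, so that a zone hovering near one threshold does not oscillate. The factors and `zone_action` are illustrative assumptions; the slides do not give BetrFS's actual thresholds.

```python
ZONE_TARGET_BYTES = 512 * 1024  # BetrFS 0.2 zone size from the slide, assuming KB = 1024 bytes
SPLIT_FACTOR = 2.0              # hypothetical: split zones larger than 2x the target
MERGE_FACTOR = 0.5              # hypothetical: merge zones smaller than half the target


def zone_action(zone_bytes: int) -> str:
    """Return "split", "merge", or "keep" for a zone of the given size."""
    if zone_bytes > SPLIT_FACTOR * ZONE_TARGET_BYTES:
        return "split"  # large zone: renames inside it would copy too much
    if zone_bytes < MERGE_FACTOR * ZONE_TARGET_BYTES:
        return "merge"  # small zone: scans would seek across too many zones
    return "keep"


if __name__ == "__main__":
    for size in (16 * 1024, 512 * 1024, 4 * 1024 * 1024):
        print(size, zone_action(size))
```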
Slide 32
BetrFS 0.2 rename performance
[Figure: time to rename the Linux source tree; one bar is labeled 21 seconds]
Rename performance is comparable to other file systems
Slide 33
Performing sequential writes at disk bandwidth and with full-data journaling semantics
Slide 34
BetrFS version 0.1 writes everything twice
[Figure: insert(k1, v1) and insert(k2, v2) are appended to the log with their values over time; the same values are written again when the updated tree nodes (root′, b′) are written out]
Slide 35
BetrFS version 0.2: late-binding journal
[Figure: insert(k2, v2) is logged with its value, but k1 is logged as an unbound insert(k1) without v1; v1 is written once, in the updated tree node, and a later bind record ties the log entry to that node]
Slide 36
Why don't we use late-binding for small writes?
● Reason 1:
– Late-binding requires writing out a large (e.g. 4MB) node
– For small writes, this is huge write amplification
– It's more efficient to make small writes durable by simply logging them
● Reason 2:
– Small, random writes get written to disk several times as they get flushed down the Bε-tree
– So writing them to the log is not a big extra cost
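The resulting policy can be sketched as a toy journal: small writes are logged together with their data, while large writes get an unbound log record (key only) whose data reaches disk once, inside a tree node, followed by a bind record. The size cutoff, the record tuples, and the `ToyJournal` API are hypothetical, and recovery and checkpointing are omitted.

```python
from typing import List, Set, Tuple

LATE_BIND_THRESHOLD = 64 * 1024  # hypothetical cutoff between "small" and "large" writes


class ToyJournal:
    def __init__(self) -> None:
        self.log: List[Tuple] = []
        self.logged_data_bytes = 0      # payload bytes written to the log
        self.unbound: Set[str] = set()  # keys whose data is not yet in any written node

    def write(self, key: str, data: bytes) -> None:
        if len(data) >= LATE_BIND_THRESHOLD:
            # Large write: log only the key; the data is written once, with its tree node.
            self.log.append(("unbound-insert", key))
            self.unbound.add(key)
        else:
            # Small write: logging the data is cheaper than forcing out a large node.
            self.log.append(("insert", key, data))
            self.logged_data_bytes += len(data)

    def node_written(self, keys: List[str], node_address: int) -> None:
        """Called after a tree node holding `keys` reaches disk: bind their log records."""
        for key in keys:
            if key in self.unbound:
                self.log.append(("bind", key, node_address))
                self.unbound.discard(key)


if __name__ == "__main__":
    journal = ToyJournal()
    journal.write("k1", bytes(1 << 20))  # 1 MiB: logged unbound, payload not copied
    journal.write("k2", b"v2")           # 2 bytes: logged with its data
    journal.node_written(["k1", "k2"], node_address=7)
    print(journal.logged_data_bytes, journal.log[-1])  # 2 ('bind', 'k1', 7)
```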
Slide 37
Late-binding journal: performance evaluation
[Figure: sequential write throughput]
Fast sequential writes with full-data journaling
Slide 38
Rangecast delete
Example: delete /foo/*
[Figure: a single rangecast delete message for /foo/* is inserted into the tree above the keys /bar/, /foo/a, /foo/x, and /goo/a; the covered keys /foo/a and /foo/x are garbage collected]
File deletions require inserting a single message and enable efficient garbage collection
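A minimal sketch of the idea: deleting /foo/* records one range message with a sequence number instead of one delete per key; queries treat older keys inside the range as deleted, and a later garbage-collection pass (standing in for flushes) physically removes them. The half-open range ["/foo/", "/foo0") works because "0" is the character right after "/" in ASCII. `RangeDeleteStore` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class RangeDeleteStore:
    def __init__(self) -> None:
        self.seq = 0
        self.entries: Dict[str, Tuple[int, object]] = {}     # key -> (seq, value)
        self.range_deletes: List[Tuple[int, str, str]] = []  # (seq, lo, hi), hi exclusive

    def _next(self) -> int:
        self.seq += 1
        return self.seq

    def put(self, key: str, value: object) -> None:
        self.entries[key] = (self._next(), value)

    def delete_prefix(self, prefix: str) -> None:
        """One message, regardless of how many keys it covers (prefix must be non-empty)."""
        # Every key that starts with `prefix` lies in [prefix, hi).
        hi = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        self.range_deletes.append((self._next(), prefix, hi))

    def _deleted(self, key: str, key_seq: int) -> bool:
        return any(seq > key_seq and lo <= key < hi for seq, lo, hi in self.range_deletes)

    def get(self, key: str) -> Optional[object]:
        if key not in self.entries:
            return None
        key_seq, value = self.entries[key]
        return None if self._deleted(key, key_seq) else value

    def garbage_collect(self) -> int:
        """Drop covered entries, then the applied range messages; returns entries removed."""
        dead = [k for k, (s, _) in self.entries.items() if self._deleted(k, s)]
        for key in dead:
            del self.entries[key]
        self.range_deletes.clear()  # safe: every remaining entry is newer than each message
        return len(dead)


if __name__ == "__main__":
    store = RangeDeleteStore()
    for k, v in [("/bar/", 1), ("/foo/a", 2), ("/foo/x", 3), ("/goo/a", 4)]:
        store.put(k, v)
    store.delete_prefix("/foo/")  # delete /foo/*
    print(store.get("/foo/a"), store.get("/goo/a"))  # None 4
    print(store.garbage_collect())                   # 2
```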
Slide 39
Rangecast delete performance evaluation
[Figure: file delete time]
Constant latency, about 30% faster than ext4
Slide 40
Is BetrFS still fast at other operations?
Slide 41
BetrFS still performs random writes orders of magnitude faster than other file systems
BetrFS still performs metadata updates almost 100x faster than ext4
[Figure annotation: zone splits]
Slide 42
What about real application performance?
Slide 43
Macrobenchmark: git
[Figure: git clone and git diff]
Performance comparable to other file systems
Recursive scan performance pays off in real applications
Slide 44
Macrobenchmark: Dovecot IMAP Maildir workload
Payoff of improved delete and rename performance
Slide 45
BetrFS (version 0.2) performance summary
• Sequential reads
• Sequential writes
• Random writes
• File/directory renames
• File deletes
• Recursive scans
• Metadata updates
Slide 46
Conclusion
● Write-optimized data structures can enable us to overcome long-standing file-system design trade-offs
● Write-optimized file systems can offer across-the-board top-of-the-line performance
● Write-optimization creates a need/opportunity to revisit many file system design issues
● Code available at betrfs.org
Slide 47
SSD performance preview
Still work to do
6x speedup