EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices
Xiang Gao#, Mingkai Dong*, Xie Miao#, Wei Du#, Chao Yu#, Haibo Chen*,#
#Huawei Technologies Co., Ltd.  *IPADS, Shanghai Jiao Tong University
USENIX ATC 2019, Renton, WA, USA
System resources in Android
[Chart: /system partition size (MB) by Android version — 2.3.6: 512; 4.3: 654; 5.1: 650; 6.0: 840; 7.0: 2654; 8.0: 3072]
System resources consume significant storage!
Read-only system partitions (/system, /oem, /odm, /vendor) → compressed read-only file systems
Existing solution
Squashfs is the state-of-the-art compressed read-only file system.
We tried to use Squashfs for system resources in Android (/system, /oem, /vendor, /odm on Squashfs).
Result: the system lagged, and even froze for seconds and then rebooted.
Why does Squashfs fail?
Fixed-sized input compression:
1. Divide the data into fixed-sized chunks
2. Compress each chunk
3. Concatenate the compressed chunks
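The three steps above can be sketched in userspace Python. Squashfs and EROFS use LZ4; zlib from the standard library stands in here, and `CHUNK`, `compress_fixed_input`, and `read_one_byte` are illustrative names, not Squashfs code:

```python
import zlib

CHUNK = 4096   # fixed-sized input chunk (Squashfs supports several chunk sizes)

def compress_fixed_input(data: bytes) -> list[bytes]:
    """1. Divide into fixed-sized chunks; 2. compress each; 3. concatenate."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [zlib.compress(c) for c in chunks]   # variable-sized outputs

def read_one_byte(blocks: list[bytes], offset: int) -> int:
    # Reading a single byte still decompresses an entire chunk -- the
    # decompression amplification discussed on the next slides.
    return zlib.decompress(blocks[offset // CHUNK])[offset % CHUNK]

data = bytes(range(256)) * 64            # 16 KB of sample data
blocks = compress_fixed_input(data)
assert read_one_byte(blocks, 5000) == data[5000]
```

Note that each chunk compresses independently to a variable-sized output, which is why a one-byte read must locate and decompress a whole chunk.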
Read amplification
[Diagram: with fixed-sized input compression, reading 1 byte decompresses a whole chunk (decompression amplification) and reads whole compressed blocks from storage (I/O amplification)]
Read amplification and massive memory consumption
[Diagram: for each read, buffer_head pages are allocated for the compressed data, temporary allocations hold the decompressed chunk, and the result is then copied into the page cache — decompression amplification and I/O amplification plus extra allocations and copies]
EROFS
Fixed-sized output compression:
1. Prepare a large amount of data
2. Compress it into a fixed-sized block
3. Repeat
✓ Reduces read amplification
✓ Better compression ratio
✓ Allows in-place decompression
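A minimal userspace sketch of fixed-sized output compression, again with zlib standing in for LZ4. It grows the input window via binary search (assuming compressed size grows roughly monotonically with input size) until the result just fits a 4KB block; `compress_fixed_output` is a hypothetical name, not the EROFS implementation:

```python
import zlib

BLOCK = 4096   # fixed-sized compressed output block

def compress_fixed_output(data: bytes) -> list[tuple[bytes, int]]:
    """Greedily pack as much input as possible into each 4KB output block.
    Returns (padded compressed block, bytes of input consumed) pairs."""
    blocks, pos = [], 0
    while pos < len(data):
        lo, hi = 1, len(data) - pos
        best, take = zlib.compress(data[pos:pos + 1]), 1
        while lo < hi:                          # binary search the window size
            mid = (lo + hi + 1) // 2
            out = zlib.compress(data[pos:pos + mid])
            if len(out) <= BLOCK:
                best, take, lo = out, mid, mid
            else:
                hi = mid - 1
        blocks.append((best.ljust(BLOCK, b"\0"), take))   # exactly 4KB out
        pos += take
    return blocks
```

Because each output block is exactly one 4KB disk block, a read touches the minimum number of blocks, and the larger (variable-sized) input window improves the compression ratio.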
Choosing the page for I/O
Cached I/O — for partially decompressed blocks: allocate a page in a dedicated page cache for the I/O.
✓ Following decompressions can reuse the cached page.
In-place I/O — for blocks to be fully decompressed: reuse the page allocated by VFS if possible.
✓ Memory allocation and consumption are reduced.
[Diagram: dedicated page cache for partially decompressed blocks vs. the page cache of the requested file]
Decompression
Vmap decompression:
1. Count the data blocks to decompress.
2. Allocate temporary physical pages, or choose pages allocated by VFS.
3. Allocate a contiguous VM area via vmap() and map the physical pages into the area.
4. For in-place I/O, copy the compressed block to a temporary per-CPU page.
5. Decompress into the VM area.
[Diagram: page-cache pages 0–7 and the VM area share the same physical pages, so decompressed data lands directly in the page cache]
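A rough userspace analogy of the vmap path — not kernel code. A contiguous bytearray stands in for the vmap()'d VM area, and the final copy-back simulates what page aliasing gives the kernel for free (in the kernel, the VM area and the page-cache pages are the same physical memory):

```python
import zlib

PAGE = 4096

def vmap_decompress(compressed: bytes, pages: list[bytearray]) -> None:
    # Step 2: target pages (temporary, or allocated by VFS) come from the caller.
    n = len(pages)
    # Step 3, "vmap()": one contiguous area covering all n pages.  In the
    # kernel this area aliases the physical pages; here it is a plain buffer.
    vm_area = bytearray(n * PAGE)
    # Step 5: decompress into the contiguous VM area in a single pass.
    out = zlib.decompress(compressed)
    vm_area[:len(out)] = out
    # "vunmap": the kernel's pages already hold the data via the mapping;
    # this simulation has to scatter the result back explicitly.
    for i, page in enumerate(pages):
        page[:] = vm_area[i * PAGE:(i + 1) * PAGE]
```

The point of vmap() is that the decompressor sees one contiguous destination even though the backing pages are scattered, so no per-page output splitting is needed.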
Vmap decompression:
✓ Works for all cases
✘ Frequent vmap/vunmap
✘ Unbounded physical page allocations
✘ Data copy for in-place I/O

Buffer decompression — pre-allocate four-page per-CPU buffers:
✘ Only for decompressions of < 4 pages
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O

Rolling decompression:
✓ For decompressions < a pre-allocated VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O

In-place decompression:
✓ No data copy for in-place I/O

Details in the paper:
• Decompression policy
• Optimizations
Evaluation Setup

Platform             CPU                                          DRAM   Storage
HiKey 960            Kirin 960 (Cortex-A73 ×4 + Cortex-A53 ×4)    3 GB   32 GB UFS
Low-end smartphone   MT6765 (Cortex-A53 ×8)                       2 GB   32 GB eMMC
High-end smartphone  Kirin 980 (Cortex-A76 ×4 + Cortex-A55 ×4)    6 GB   64 GB UFS

Micro-benchmarks — platform: HiKey 960; tool: FIO; workloads: enwik9, silesia.tar.
Application benchmarks — platform: smartphones; workload: 13 popular applications.

Evaluated file systems:
• EROFS: LZ4, 4KB-sized output
• Squashfs: LZ4, {4,8,16,128}KB chunk size
• Btrfs: LZO, 128KB chunk size, read-only mode w/o integrity checks
• Ext4: no compression
• F2FS: no compression
More results in the paper
Micro-benchmark: FIO Throughput
[Chart: throughput (MB/s, 0–300) for sequential, random, and stride reads — EROFS, Squashfs-4K, Squashfs-8K, Squashfs-16K, Squashfs-128K, Ext4, F2FS, Btrfs. FIO, enwik9, HiKey 960 A73 @ 2362 MHz]
Btrfs performs worst in all cases.
A larger chunk size brings better performance for Squashfs, if the cached results are used.
EROFS performs comparably to, or even better than, Ext4.
Read Amplification and Resource Consumption

                 I/O (MB)                    Consumption (MB)
                 Seq.    Random   Stride     Storage   Memory
Requested/Ext4   16      16       16         953.67    988.51
Squashfs-4K      10.65   26.19    26.23      592.43    1597.50
Squashfs-8K      9.82    33.52    34.08      530.43    1534.09
Squashfs-16K     9.05    46.42    48.32      479.38    1481.12
Squashfs-128K    7.25    165.27   203.91     379.76    1379.84
EROFS            10.14   26.12    25.93      533.67    1036.88
Real-world Application Boot Time
[Chart: boot time of 13 popular applications, relative to Ext4.
Low-end smartphone: 95%, 86%, 89%, 81%, 93%, 89%, 85%, 101%, 77%, 95%, 81%, 99%, 87% — 10.9% faster on average.
High-end smartphone: 97%, 87%, 95%, 104%, 92%, 104%, 104%, 97%, 110%, 104%, 89%, 90%, 85% — 3.2% faster on average.]
Deployment
✓ Deployed in HUAWEI EMUI 9.1 as a top feature
✓ Upstreamed to Linux 4.19
✓ System storage consumption decreased by >30%
✓ Performance comparable to or even better than Ext4
✓ Running on 10,000,000+ smartphones
Conclusion
EROFS: an Enhanced Read-Only File System with compression support.
Fixed-sized output compression with four decompression approaches:
• Vmap decompression
• Buffer decompression
• Rolling decompression
• In-place decompression
Running on 10,000,000+ smartphones:
✓ Reduces system storage consumption by >30%
✓ Provides performance comparable to or even better than Ext4
Thank you & questions?
Backup Slides
Throughput and space savings
[Chart: throughput (MB/s, 0–800) vs. space savings (0–96%) for EROFS-Seq., Ext4-Seq., EROFS-Rand., Ext4-Rand. Regions annotated: at low space savings, computation > I/O; at high space savings, I/O > computation, so less I/O ⇒ better throughput. FIO, customized workload generated from enwik9, HiKey 960 A73 @ 2362 MHz]
Rolling decompression
Observation: during decompression, LZ4 looks backward at most 64KB into the decompressed data.
Pre-allocate a large VM area and 17 physical pages per CPU, and use the 17 physical pages in turn.
✓ For decompressions < the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O
[Diagram: virtual pages backed cyclically by physical pages 1, 2, …, 17, 1, 2, …]
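Why 17 pages suffice: with 4KB pages, a 64KB backward window spans at most the 16 preceding pages plus the current one, so reusing 17 physical pages round-robin never overwrites data the decompressor may still reference. A sketch with illustrative names:

```python
PAGE = 4096                     # page size
WINDOW = 64 * 1024              # LZ4's maximum back-reference distance
NPAGES = WINDOW // PAGE + 1     # 16 look-back pages + the current one = 17

def physical_page(vpage: int) -> int:
    """Back virtual page `vpage` of the VM area with one of 17 physical pages."""
    return vpage % NPAGES

# A back-reference from virtual page i reaches at most 64KB back, i.e. into
# virtual pages i-16 .. i; those 17 pages must map to distinct physical pages.
for i in range(100):
    live = {physical_page(j) for j in range(max(0, i - 16), i + 1)}
    assert len(live) == min(i + 1, NPAGES)   # round-robin never collides
```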
In-place decompression
If no corruption will happen:
✓ For decompressions < the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O
[Diagram: before — compressed Seq. A staged at the tail of pages 1–17; after in-place decompression — the same pages hold decompressed Seq. A. If the output overtakes the not-yet-read compressed data, Seq. A is corrupted.]
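The "no corruption" condition can be illustrated with a toy run-length codec standing in for LZ4: stage the compressed block at the tail of the destination buffer and decompress front-to-back, which is safe exactly as long as the write cursor never overtakes the unread compressed bytes. All names here are hypothetical, not EROFS code:

```python
def rle_compress(data: bytes) -> bytes:
    """Toy run-length codec ((count, byte) pairs); stands in for LZ4."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and j - i < 255 and data[j] == data[i]:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def decompress_in_place(buf: bytearray, comp_len: int) -> None:
    """Compressed data sits at the tail of buf; decompressed output grows
    from the front.  Safe only while write never overtakes read."""
    read = len(buf) - comp_len    # compressed data occupies buf[read:]
    write = 0
    while read < len(buf):
        count, byte = buf[read], buf[read + 1]
        read += 2
        assert write + count <= read, "would corrupt unread compressed data"
        for _ in range(count):
            buf[write] = byte
            write += 1

data = b"a" * 100 + b"b" * 50 + b"c" * 30
comp = rle_compress(data)
buf = bytearray(len(data))
buf[len(buf) - len(comp):] = comp     # stage the compressed block at the tail
decompress_in_place(buf, len(comp))
assert bytes(buf) == data
```

Since compressed data is shorter than its decompressed form, the output front usually stays behind the input tail; the assertion marks the corruption case the slide warns about, where EROFS would fall back to a copying path.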