![Page 1: Finding Crash-Consistency Bugs with Bounded Black-Box Testing · Bounded Black-Box Crash Testing (B3) Focus on reproducible bugs resulting in metadata corruption, data loss. Found](https://reader031.vdocuments.mx/reader031/viewer/2022022117/5c97293809d3f2d8238b8bf9/html5/thumbnails/1.jpg)
Finding Crash-Consistency Bugs with Bounded Black-Box Testing
Jayashree Mohan, Ashlie Martinez, Soujanya Ponnapalli, Pandian Raju, Vijay Chidambaram
Crashes
This is very important…
File saved!
I crashed ☹
File missing ☹
Image source: https://www.fotolia.com
I wish filesystems were crash-consistent!
Rename atomicity bug in btrfs
Memory: (empty)
Storage: (empty)
Rename atomicity bug in btrfs
Memory: A/
Storage: (empty)
mkdir (A)
Rename atomicity bug in btrfs
Memory: A/bar
Storage: (empty)
mkdir (A); touch (A/bar)
Rename atomicity bug in btrfs
Memory: A/bar
Storage: A/bar
mkdir (A); touch (A/bar); fsync (A/bar)
Rename atomicity bug in btrfs
Memory: A/bar, B/
Storage: A/bar
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B)
Rename atomicity bug in btrfs
Memory: A/bar, B/bar
Storage: A/bar
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
Rename atomicity bug in btrfs
Memory: A/bar, B/
Storage: A/bar
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
rename (B/bar, A/bar)
Rename atomicity bug in btrfs
Memory: A/bar, A/foo, B/
Storage: A/bar
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
rename (B/bar, A/bar); touch (A/foo)
Rename atomicity bug in btrfs
Memory: A/bar, A/foo, B/
Storage: A/bar, A/foo
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
rename (B/bar, A/bar); touch (A/foo); fsync (A/foo)
Rename atomicity bug in btrfs
Memory: A/bar, A/foo, B/
Storage (expected): A/bar, A/foo
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
rename (B/bar, A/bar); touch (A/foo); fsync (A/foo)
CRASH!
Rename atomicity bug in btrfs
Memory: A/bar, A/foo, B/
Storage (expected): A/bar, A/foo
Storage (actual): A/foo
Persisted file A/bar missing
mkdir (A); touch (A/bar); fsync (A/bar)
mkdir (B); touch (B/bar)
rename (B/bar, A/bar); touch (A/foo); fsync (A/foo)
CRASH!
Exists in the kernel since 2014!
Found by ACE and CrashMonkey
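To make the workload concrete, here is a sketch of it as a Python script using only standard `os` calls. The helper names `touch`, `fsync_path`, and `run_rename_workload` are illustrative, not part of CrashMonkey. Run normally, the script simply leaves A/bar and A/foo in place; the bug only appears when a tool such as CrashMonkey simulates a crash right after the final fsync and inspects what actually reached storage.

```python
import os
import tempfile


def touch(path):
    """Create an empty file if it does not exist, like touch(1)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)


def fsync_path(path):
    """Open an existing file and fsync it to force it to storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def run_rename_workload(root):
    """Run the rename-atomicity workload inside the directory `root`."""
    a, b = os.path.join(root, "A"), os.path.join(root, "B")
    os.mkdir(a)
    touch(os.path.join(a, "bar"))
    fsync_path(os.path.join(a, "bar"))   # persistence point 1: A/bar is durable
    os.mkdir(b)
    touch(os.path.join(b, "bar"))
    # rename() replaces the existing A/bar with B/bar.
    os.rename(os.path.join(b, "bar"), os.path.join(a, "bar"))
    touch(os.path.join(a, "foo"))
    fsync_path(os.path.join(a, "foo"))   # persistence point 2: crash simulated here


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        run_rename_workload(scratch)
        print(sorted(os.listdir(os.path.join(scratch, "A"))))  # ['bar', 'foo']
```

Pointing `root` at a mount of the file system under test is what turns this into a crash test; the scratch directory here only demonstrates the non-crash outcome.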
Testing Crash Consistency Today
• State of the art: xfstests suite
  • Collection of 482 regression tests
  • Only 5% of tests in xfstests check for file system crash consistency
• Model checking
  • Annotate filesystems
  • Hard to do for existing FS
• Verified filesystems
  • Build FS from scratch
Challenges with systematic testing
Challenges:
• Infinite workload space
• Lack of automated infrastructure
Our work addresses both issues to provide a systematic testing framework
Systematically generate workloads
Bounded Black-Box Crash Testing (B3)
➡ Focus on reproducible bugs resulting in metadata corruption and data loss
➡ Found 10 new bugs across btrfs and F2FS
➡ Found 1 bug in FSCQ (verified file system)
➡ Filesystem agnostic: works with any POSIX file system
New approach to testing file-system crash consistency
www.github.com/utsaslab/crashmonkey
Bounds: (length, operations, args) → Automatic Crash Explorer (ACE) → Workload 1 … Workload n → CrashMonkey → Target Filesystem
Output: Bug report with workload, expected state, actual state
Outline
• CrashMonkey
• Bounded Black Box Crash Testing
• Automatic Crash Explorer (ACE)
• Demo
Challenges with systematic testing
Challenges:
• Infinite workload space
• Lack of automated infrastructure
CrashMonkey
• Efficient infrastructure to record and replay block-level IO requests
• Simulate crash at different points in the workload
• Automatically test for consistency after crash
• Copy-on-write RAM block device
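A copy-on-write RAM block device lets every crash state start from the same initial image without copying the whole device. The toy class below illustrates only that idea; `CowRamDisk` is a hypothetical name, and CrashMonkey's real device lives in the kernel rather than in Python.

```python
class CowRamDisk:
    """Toy copy-on-write RAM block device (illustration only).

    Reads fall through to a shared, read-only base image unless this device
    has written its own copy of the block, so snapshots are cheap.
    """

    def __init__(self, base, block_size=4096):
        self.base = base            # block number -> bytes; shared, never modified
        self.overlay = {}           # private copies of blocks written to this device
        self.block_size = block_size

    def write(self, block, data):
        self.overlay[block] = bytes(data)

    def read(self, block):
        if block in self.overlay:
            return self.overlay[block]
        # Blocks never written anywhere read back as zeros.
        return self.base.get(block, bytes(self.block_size))

    def snapshot(self):
        """Copy only the overlay; the base image stays shared."""
        clone = CowRamDisk(self.base, self.block_size)
        clone.overlay = dict(self.overlay)
        return clone
```

For example, writes made through `initial.snapshot()` never change what `initial.read(...)` returns, which is what allows many crash states to be built from one initial FS state.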
CrashMonkey in Action
[Figure: the workload's IO takes the file system from the initial FS state to the final FS state; legend: IO due to workload, persistence point]
CrashMonkey in Action
[Figure: initial FS state followed by the workload's IO; legend: IO due to workload, persistence point]
Phase 1: Record IO
Oracle: initial FS state → record IO up to the persistence point → safely unmount
[Figure legend: workload, IO due to workload, persistence point, IO forced by unmount]
Phase 2: Replay IO
Oracle: initial FS state → record IO up to the persistence point → safely unmount
Crash state: initial FS state → replay IO up to the persistence point
[Figure legend: workload, IO due to workload, persistence point, IO forced by unmount]
Phase 3: Test for consistency
Oracle: initial FS state → record IO up to the persistence point → safely unmount
Crash state: initial FS state → replay IO up to the persistence point
Auto Checker: compares the oracle with the crash state after recovery
[Figure legend: workload, IO due to workload, persistence point, IO forced by unmount]
Phase 3: Test for consistency
Oracle: initial FS state → record IO up to the persistence point → safely unmount
Crash state: initial FS state → replay IO up to the persistence point
Auto Checker: compares the oracle with the crash state → Bug Report
[Figure legend: workload, IO due to workload, persistence point, IO forced by unmount]
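The three phases can be illustrated with a toy Python model in which a "disk" is a dictionary of block writes. Everything here is hypothetical: `apply_io`, `check_persistence_point`, and the two-block journal layout in `toy_recover_and_read` are invented for illustration, whereas CrashMonkey records real block IO and mounts the actual file system so that its own recovery code runs.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Disk = Dict[int, bytes]   # block number -> block contents
IO = Tuple[int, bytes]    # one recorded block write: (block number, data)


def apply_io(disk: Disk, ios: List[IO]) -> Disk:
    """Replay block writes onto a copy of `disk`, leaving the original untouched."""
    image = dict(disk)
    for block, data in ios:
        image[block] = data
    return image


def check_persistence_point(
    initial: Disk,
    workload_io: List[IO],
    persist_idx: int,
    unmount_io: List[IO],
    recover_and_read: Callable[[Disk], FrozenSet[str]],
) -> Optional[dict]:
    """Return a bug report if the crash state disagrees with the oracle, else None."""
    # Phase 1 (record): oracle = IO up to the persistence point + IO forced by unmount.
    oracle = apply_io(initial, workload_io[:persist_idx] + unmount_io)
    # Phase 2 (replay): crash state = only the IO up to the persistence point.
    crash_state = apply_io(initial, workload_io[:persist_idx])
    # Phase 3 (check): run recovery on both images and compare what they expose.
    expected = recover_and_read(oracle)
    actual = recover_and_read(crash_state)
    if expected != actual:
        return {"expected": sorted(expected), "actual": sorted(actual)}
    return None


def toy_recover_and_read(disk: Disk) -> FrozenSet[str]:
    """Hypothetical layout: block 1 is a journal, block 2 the directory's home copy."""
    listing = disk.get(1) or disk.get(2, b"")  # a non-empty journal wins (replay it)
    return frozenset(name for name in listing.decode().split(",") if name)
```

With `unmount_io = [(2, b"bar"), (1, b"")]` (checkpoint, then clear the journal), a file system that journaled `bar` before the persistence point produces no report, while one that issued nothing before the flush produces `{'expected': ['bar'], 'actual': []}`.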
Challenges with Systematic Testing
Challenges:
• Lack of automated infrastructure → CrashMonkey
• Infinite workload space
So Far…
• Given a POSIX-compliant workload, we saw how CrashMonkey generates crash states and automatically tests them for consistency
Challenges with Systematic Testing
So Far…
• Given a POSIX-compliant workload, we saw how CrashMonkey generates crash states and automatically tests them for consistency
• Next question: how do we automatically generate workloads in the infinite workload space?
Challenges:
• Lack of automated infrastructure → CrashMonkey
• Infinite workload space
Exploring the infinite workload space
Challenges:
• Infinite length of workloads
• Large set of filesystem operations
• Infinite parameter options (file/directory names, depth)
• Infinite options for initial filesystem state
• When in the workload to simulate a crash?
Outline
• CrashMonkey
• Bounded Black Box Crash Testing
• Automatic Crash Explorer (ACE)
• Demo
B3 : Bounded Black Box Crash Testing
Length of workloads
Initial FS state
Arguments to system calls
Image source: https://en.wikipedia.org/wiki/Cube
B3 : Bounded Black Box Crash Testing
Choice of crash point
• Only after fsync(), fdatasync() or sync()
• Not in the middle of a system call
mkdir (A); touch (A/bar); fsync (A/bar) ← Crash Point 1
mkdir (B); touch (B/bar)
rename (B/bar, A/bar); touch (A/foo); fsync (A/foo) ← Crash Point 2
• Developers are motivated to patch bugs that break the semantics of persistence operations
• Crashing in the middle of system calls leads to an exponentially large number of crash states
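A sketch of this crash-point rule in Python, assuming a workload is a list of tuples whose first element is the operation name (a representation chosen here for illustration, not CrashMonkey's workload format). For the rename workload, the crash points fall after the two fsync calls.

```python
# Persistence operations after which a crash point is placed.
PERSISTENCE_OPS = {"fsync", "fdatasync", "sync"}


def crash_points(workload):
    """Return the 0-based indices of operations after which a crash is simulated."""
    return [i for i, (op, *_args) in enumerate(workload) if op in PERSISTENCE_OPS]


def prefixes_to_test(workload):
    """Each crash point corresponds to the workload prefix that ends at it."""
    return [workload[: i + 1] for i in crash_points(workload)]


RENAME_WORKLOAD = [
    ("mkdir", "A"), ("touch", "A/bar"), ("fsync", "A/bar"),
    ("mkdir", "B"), ("touch", "B/bar"),
    ("rename", "B/bar", "A/bar"), ("touch", "A/foo"), ("fsync", "A/foo"),
]

print(crash_points(RENAME_WORKLOAD))  # [2, 7]
```

The number of crash points grows only with the number of persistence operations, which is what keeps the crash-state space small compared with crashing mid-system-call.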
Limitations of B3
• No guarantee of finding all crash-consistency bugs in a filesystem
• Assumes the correct working of crash-consistency mechanisms such as journaling or CoW
• Does not crash in the middle of system calls
• Can only reveal that a bug has occurred, not the reason for or origin of the bug
• Needs more compute to test longer sequences
Outline
• CrashMonkey
• Bounded Black Box Crash Testing
• Automatic Crash Explorer (ACE)
• Demo
Bounds chosen by ACE
Length of workloads
Initial FS state
Arguments to system calls
Bounds picked based on insights from a study of crash-consistency bugs reported on Linux file systems over the last 5 years
Bounds chosen by ACE
Length of workloads
Initial FS state
Arguments to system calls
Maximum # core ops is 3
Bounds chosen by ACE
Length of workloads
Initial FS state
Arguments to system calls
Maximum # core ops is 3
File set: Root → A/{foo, bar}, B/{foo, bar}
Overwrites to the start, middle, or end of a file, and appends
Bounds chosen by ACE
Length of workloads
Initial FS state
Arguments to system calls
File set: Root → A/{foo, bar}, B/{foo, bar}
Overwrites to the start, middle, or end of a file, and appends
Maximum # core ops is 3
New, 100MB FS
Phases of ACE
Operation Set: creat(), link(), rename(), write()
Generating skeletons of sequence-2: 4 × 4 = 16
creat() creat(); creat() link(); creat() rename(); creat() write()
link() creat(); link() link(); link() rename(); link() write()
rename() creat(); rename() link(); rename() rename(); rename() write()
write() creat(); write() link(); write() rename(); write() write()
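Skeleton generation is a Cartesian product of the operation set with itself. A minimal sketch with Python's `itertools.product`; `skeletons` is an illustrative helper, not ACE's code.

```python
from itertools import product

OPERATION_SET = ["creat", "link", "rename", "write"]


def skeletons(operations, length):
    """All ordered sequences of `length` core operations, repetition allowed."""
    return list(product(operations, repeat=length))


seq2 = skeletons(OPERATION_SET, 2)
print(len(seq2))   # 16 == 4 ** 2
print(seq2[:4])    # the four pairs that start with creat
```

In general, k operations and sequence length n give k ** n skeletons, which is why bounding the length matters so much.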
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
File Set: A/{foo, bar}, B/{foo, bar}
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   • If metadata operations, pick file or directory names
   • If data operations, pick a range of offset and length
File Set: A/{foo, bar}, B/{foo, bar}
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   • If metadata operations, pick file or directory names
   • If data operations, pick a range of offset and length
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
File Set: A/{foo, bar}, B/{foo, bar}
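A sketch of parameter selection under these bounds. The 4 KiB file size and the specific (offset, length) pairs in `WRITE_RANGES` are assumptions made for illustration, and `parameter_choices` and `parameterize` are hypothetical helpers, not ACE's code.

```python
from itertools import permutations, product

# File set from the slides: directories A and B, each holding foo and bar.
FILE_SET = ["A/foo", "A/bar", "B/foo", "B/bar"]
# Hypothetical (offset, length) choices, assuming a 4 KiB file:
# overwrite the start, middle, and end, and append past the end.
WRITE_RANGES = [(0, 1024), (1536, 1024), (3072, 1024), (4096, 1024)]


def parameter_choices(op):
    """Candidate argument tuples for one core operation."""
    if op == "creat":
        return [(path,) for path in FILE_SET]
    if op in ("link", "rename"):
        # (source, destination) pairs with distinct paths.
        return list(permutations(FILE_SET, 2))
    if op == "write":
        return [(path, off, length) for path in FILE_SET for off, length in WRITE_RANGES]
    raise ValueError(f"unsupported operation: {op!r}")


def parameterize(skeleton):
    """Expand a skeleton such as ("creat", "rename") into concrete calls."""
    choices = [parameter_choices(op) for op in skeleton]
    return [list(zip(skeleton, args)) for args in product(*choices)]


workloads = parameterize(("creat", "rename"))
print(len(workloads))  # 48: 4 creat targets x 12 rename pairs
```

The workload on the slide, creat(A/bar) followed by rename(B/bar, A/bar), is one of these 48 candidates.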
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
3. Add Persistence
   • After each core operation, add a persistence operation
   • Consistency will be checked at these points
   • The parameter to the persistence function is again chosen from the file/directory pool
File Set: A/{foo, bar}, B/{foo, bar}
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
3. Add Persistence
   • After each core operation, add a persistence operation
   • Consistency will be checked at these points
   • The parameter to the persistence function is again chosen from the file/directory pool
   1. creat(A/bar) fsync(A/bar)
   2. rename(B/bar, A/bar) fsync(A/foo)
File Set: A/{foo, bar}, B/{foo, bar}
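One way to enumerate the persistence step in Python, assuming the pool holds the two directories and four files shown. The crash-point rule also covers fdatasync() and sync(); this sketch uses only fsync, and `add_persistence` is a hypothetical helper.

```python
from itertools import product

# Assumed file/directory pool for persistence targets.
POOL = ["A", "B", "A/foo", "A/bar", "B/foo", "B/bar"]


def add_persistence(core_ops):
    """Yield every variant of `core_ops` with an fsync after each core operation."""
    for targets in product(POOL, repeat=len(core_ops)):
        workload = []
        for op, target in zip(core_ops, targets):
            workload.append(op)
            workload.append(("fsync", target))
        yield workload


core = [("creat", "A/bar"), ("rename", "B/bar", "A/bar")]
variants = list(add_persistence(core))
print(len(variants))  # 36: 6 targets for each of the 2 fsync calls
```

The slide's variant, with fsync(A/bar) after the creat and fsync(A/foo) after the rename, is one of the 36.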
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
3. Add Persistence
   1. creat(A/bar) fsync(A/bar)
   2. rename(B/bar, A/bar) fsync(A/foo)
4. Add Dependencies
   • Add file create/open/close to ensure the workload executes on any POSIX-compliant filesystem
File Set: A/{foo, bar}, B/{foo, bar}
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
3. Add Persistence
   1. creat(A/bar) fsync(A/bar)
   2. rename(B/bar, A/bar) fsync(A/foo)
4. Add Dependencies
   • Add file create/open/close to ensure the workload executes on any POSIX-compliant filesystem
   mkdir(A)
   1. creat(A/bar) fsync(A/bar)
   mkdir(B) creat(B/bar)
   2. rename(B/bar, A/bar) creat(A/foo) fsync(A/foo) close(A/foo)
File Set: A/{foo, bar}, B/{foo, bar}
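A sketch of dependency insertion in Python. The rules here (mkdir missing parents, creat a missing rename source, creat a missing file before fsync and close it afterwards) are assumptions chosen so that the output matches the slide; they are not ACE's exact rules, and `add_dependencies` is a hypothetical helper.

```python
import posixpath


def add_dependencies(workload):
    """Insert the setup each operation needs so the workload runs from an empty FS."""
    dirs, files = set(), set()
    out = []

    def ensure_dir(path):
        # Create missing parent directories first, outermost first.
        if path and path not in dirs:
            ensure_dir(posixpath.dirname(path))
            out.append(("mkdir", path))
            dirs.add(path)

    def ensure_file(path):
        if path not in files:
            ensure_dir(posixpath.dirname(path))
            out.append(("creat", path))
            files.add(path)

    for op, *args in workload:
        if op == "creat":
            ensure_dir(posixpath.dirname(args[0]))
            out.append(("creat", args[0]))
            files.add(args[0])
        elif op == "rename":
            src, dst = args
            ensure_file(src)                      # the source must exist
            ensure_dir(posixpath.dirname(dst))
            out.append(("rename", src, dst))
            files.discard(src)
            files.add(dst)
        elif op == "fsync":
            target = args[0]
            opened_here = target not in files and target not in dirs
            if opened_here:
                ensure_file(target)               # need an open file to fsync
            out.append(("fsync", target))
            if opened_here:
                out.append(("close", target))
        else:
            out.append((op, *args))
    return out


workload = [("creat", "A/bar"), ("fsync", "A/bar"),
            ("rename", "B/bar", "A/bar"), ("fsync", "A/foo")]
for call in add_dependencies(workload):
    print(call)
```

Applied to the workload from step 3, the sketch emits mkdir(A), creat(A/bar), fsync(A/bar), mkdir(B), creat(B/bar), rename(B/bar, A/bar), creat(A/foo), fsync(A/foo), close(A/foo).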
Phases of ACE
1. Select Operations
   1. creat()
   2. rename()
2. Select Parameters
   1. creat(A/bar)
   2. rename(B/bar, A/bar)
3. Add Persistence
   1. creat(A/bar) fsync(A/bar)
   2. rename(B/bar, A/bar) fsync(A/foo)
4. Add Dependencies
   mkdir(A)
   1. creat(A/bar) fsync(A/bar)
   mkdir(B) creat(B/bar)
   2. rename(B/bar, A/bar) creat(A/foo) fsync(A/foo) close(A/foo)
File Set: A/{foo, bar}, B/{foo, bar}
This workload with 2 core operations is the same workload required to trigger the rename atomicity bug!
Challenges with Systematic Testing
Challenges:
• Lack of automated infrastructure → CrashMonkey
• Infinite workload space → ACE
Together: Bounded Black-Box Testing
Results
• Reproduced 24/26 known bugs across ext4, btrfs and F2FS
• Found 10 new bugs across btrfs and F2FS
• Found 1 bug in a verified file system, FSCQ
Outline
• CrashMonkey
• Bounded Black Box Crash Testing
• Automatic Crash Explorer (ACE)
• Demo
Testing, specification, and verification
Bounded Black-Box Crash Testing (Poster #4)
Try our tools: https://github.com/utsaslab/crashmonkey
• B3 makes exhaustive testing feasible using informed bound selection
• Easily generalizable to test larger workloads if more compute is available
• Found 10 new bugs across btrfs and F2FS, most of which have existed since 2014
• Found 1 bug in FSCQ
Thanks!
Questions?
Backup slides
Demo
Crash Consistency
• Filesystem operations change multiple blocks on storage that need to be ordered
  • Inode, bitmaps, data blocks, superblock
• Data and metadata must be consistent on a crash
Possible outcomes: metadata corruption, data corruption, unmountable FS
Filesystem Unmountable!
What just happened?
Current state: A/bar, B/bar
Target state: A/bar, B/
Rename (B/bar, A/bar)
What just happened?
Current state: A/bar, B/bar
Target state: A/bar, B/
Rename (B/bar, A/bar)
1. unlink (A/bar)
What just happened?
A
B bar
Abar
B
Rename (B/bar, A/bar)
�65
1. unlink (A/bar)2. mv (B/bar, A/bar)
![Page 66: Finding Crash-Consistency Bugs with Bounded Black-Box Testing · Bounded Black-Box Crash Testing (B3) Focus on reproducible bugs resulting in metadata corruption, data loss. Found](https://reader031.vdocuments.mx/reader031/viewer/2022022117/5c97293809d3f2d8238b8bf9/html5/thumbnails/66.jpg)
What just happened?
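[Figure: directory trees for Rename(B/bar, A/bar). Initially A/bar and B/bar both exist; after step 1, A is empty while B/bar remains; once the rename completes, A/bar (formerly B/bar) exists and B is empty.]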
The rename can be viewed as two steps:
1. unlink(A/bar)
2. mv(B/bar, A/bar)
Together, these steps must be atomic.
Workload:
mkdir(A)
touch(A/bar)
fsync(A/bar)
mkdir(B)
touch(B/bar)
rename(B/bar, A/bar)
touch(A/foo)
fsync(A/foo)
CRASH!
• fsync(A/foo) commits the transaction that unlinks A/bar
• So step 1 above is persisted, but the rename is not
• The result is that file A/bar is lost
• This bug has existed in the kernel since 2014
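The workload above can be issued from user space with Python's standard `os` module. This sketch only makes the system calls; it cannot reproduce the crash itself, which needs a crash-injection tool such as CrashMonkey and an affected btrfs kernel. The helper names `touch`, `fsync_path`, and `rename_workload` are our own, chosen for illustration.

```python
import os
import tempfile

def touch(path):
    """Create `path` if it does not exist, like the `touch` command."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)

def fsync_path(path):
    """fsync() a file by path; fsync needs an open file descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def rename_workload(root):
    """Issue the slide's workload under the directory `root`."""
    a_dir = os.path.join(root, "A")
    b_dir = os.path.join(root, "B")
    os.mkdir(a_dir)
    touch(os.path.join(a_dir, "bar"))
    fsync_path(os.path.join(a_dir, "bar"))
    os.mkdir(b_dir)
    touch(os.path.join(b_dir, "bar"))
    # os.replace maps to rename(2) on POSIX and overwrites the existing A/bar.
    os.replace(os.path.join(b_dir, "bar"), os.path.join(a_dir, "bar"))
    touch(os.path.join(a_dir, "foo"))
    fsync_path(os.path.join(a_dir, "foo"))
    # The slide's crash point is here: on the buggy btrfs, A/bar could be
    # lost after recovery.

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        rename_workload(root)
        print(sorted(os.listdir(os.path.join(root, "A"))))  # ['bar', 'foo']
        print(os.listdir(os.path.join(root, "B")))          # []
```

Without a crash, any POSIX filesystem should end with A/bar and A/foo present and B empty; the bug only shows up in the state recovered after a crash at the final fsync.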
Study of crash consistency bugs in the wild
• Study the workload patterns and impact of crash-consistency bugs reported in the past 5 years, drawn from:
  • Kernel mailing lists
  • Crash-consistency tests submitted to xfstests
• 26 unique bugs across ext4, F2FS, and btrfs
Study of crash consistency bugs in the wild
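| Consequence | # bugs |
|---|---|
| Corruption | 17 |
| Data inconsistency | 6 |
| Unmountable FS | 3 |
| Total | 26 |

| Filesystem | # bugs |
|---|---|
| ext4 | 2 |
| F2FS | 2 |
| btrfs | 24 |
| Total | 28 |

| # ops required | # bugs |
|---|---|
| 1 | 3 |
| 2 | 14 |
| 3 | 9 |
| Total | 26 |

The filesystem counts sum to 28 rather than 26, so some of the 26 unique bugs are counted under more than one filesystem.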
1. Crash-consistency bugs are hard to find
  • Bugs have been in the kernel for up to 7 years before being identified and patched
  • They usually involve reuse of files/directories
2. Small workloads are sufficient to reveal bugs
  • 2-3 core operations on a new, empty filesystem
3. Crash after persistence points
  • It is sufficient to crash after a call to fsync(), fdatasync(), or sync()
4. Systematic testing is required
  • fallocate (punch_hole): 2015
  • fallocate (zero_range): 2018
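Taken together, these findings suggest a bounded search: fix a small set of files and core operations, enumerate every short sequence, and crash only at persistence points. The sketch below is a toy version of that idea under stated assumptions: the file names, the four core operations, and the choice to follow every operation with sync() are all hypothetical, and it does not prune sequences that would fail (for example, unlinking a file that was never created).

```python
from itertools import product

# Hypothetical bounds: two files and four core operations.
FILES = ("A/foo", "A/bar")
PERSISTENCE_OPS = {"fsync", "fdatasync", "sync"}

def op_instances(files):
    """All concrete (op, args...) instances of the core ops over `files`."""
    instances = []
    for f in files:
        instances.append(("creat", f))
        instances.append(("unlink", f))
    for src, dst in product(files, repeat=2):
        if src != dst:
            instances.append(("link", src, dst))
            instances.append(("rename", src, dst))
    return instances

def bounded_workloads(files, max_len):
    """Yield every sequence of 1..max_len core ops, each followed by sync()."""
    ops = op_instances(files)
    for n in range(1, max_len + 1):
        for seq in product(ops, repeat=n):
            workload = []
            for op in seq:
                workload.append(op)
                workload.append(("sync",))
            yield workload

def crash_points(workload):
    """Indices after which to simulate a crash: only the persistence points."""
    return [i for i, (op, *_rest) in enumerate(workload) if op in PERSISTENCE_OPS]

if __name__ == "__main__":
    workloads = list(bounded_workloads(FILES, 2))
    print(len(workloads))               # 72
    print(crash_points(workloads[-1]))  # [1, 3]
```

With k operation instances and a maximum length n, the generator yields k + k^2 + ... + k^n workloads (here 8 + 64 = 72), which is why the bounds on length and file set must stay small.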
CrashMonkey Internals
[Figure: CrashMonkey architecture across user space and kernel space, showing the Workload, Filesystem, Generic Block Layer, Device Wrapper, Custom RAM Block Device, Test harness, and Crash States 1 and 2]
• Records write IO requests and barriers (flush/FUA) issued by the workload
• Records a special “checkpoint IO” to mark persistence points in the workload
• Fast writeable snapshot capability
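The recording behavior described above can be modeled in a few lines. This is a user-space toy, not CrashMonkey's kernel module: the record fields, the method names, and the dict standing in for the RAM block device are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class IORecord:
    kind: str          # "write", "flush", or "checkpoint"
    sector: int = -1   # target sector for writes, -1 otherwise
    data: bytes = b""
    fua: bool = False  # write carried the FUA (force unit access) flag

@dataclass
class LoggingWrapper:
    """Toy device wrapper: logs every IO in issue order and passes writes down."""
    device: dict = field(default_factory=dict)  # sector -> bytes (RAM device stand-in)
    log: list = field(default_factory=list)     # IORecords in issue order

    def write(self, sector, data, fua=False):
        self.log.append(IORecord("write", sector, data, fua))
        self.device[sector] = data              # pass the write through

    def flush(self):
        self.log.append(IORecord("flush"))

    def checkpoint(self):
        # Marker recorded when the workload reaches a persistence point.
        self.log.append(IORecord("checkpoint"))

    def snapshot(self):
        # Writeable copy of the current device contents.
        return dict(self.device)

if __name__ == "__main__":
    dev = LoggingWrapper()
    base = dev.snapshot()             # device state before the workload runs
    dev.write(0, b"metadata")
    dev.write(8, b"data")
    dev.flush()
    dev.checkpoint()
    print([r.kind for r in dev.log])  # ['write', 'write', 'flush', 'checkpoint']
```

Keeping the log separate from the device contents is what lets a harness later rebuild intermediate states from the base snapshot.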
CrashMonkey in Action
[Figure: Workload, Harness, Device Wrapper, and Snapshot, with the workload's IO stream as decomposed by the block layer: Metadata, Data, Flush, Checkpoint, Data, Data, Metadata, Flush, Checkpoint]
Start running the workload, which the block layer decomposes into the IO stream shown. Track the files and directories being persisted.
CrashMonkey in Action: Profiling
• Device wrapper records the block IOs and sends them down to the CoW RAM device
• Logged IOs are pulled to user space
• The CoW RAM device is safely unmounted to create a test oracle

Tracked open files:

| fd | Path |
|---|---|
| 13 | /a/b |
CrashMonkey in Action: Replay
• Replay the logged IOs up to the checkpoint
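Replaying up to a checkpoint can be sketched as follows, assuming (hypothetically) that the log is a list of tuples and the base snapshot is a dict mapping sector numbers to contents. This sketch replays only the in-order prefix that ends at the chosen checkpoint.

```python
def replay_to_checkpoint(base, log, k):
    """Build a crash state by replaying `log` onto a copy of `base`
    up to and including the k-th checkpoint (1-based).

    `log` entries: ("write", sector, data), ("flush",), or ("checkpoint",).
    Everything after the k-th checkpoint is dropped, as if the machine
    crashed right after that persistence point.
    """
    state = dict(base)               # never modify the base snapshot
    seen = 0
    for entry in log:
        kind = entry[0]
        if kind == "write":
            _, sector, data = entry
            state[sector] = data
        elif kind == "checkpoint":
            seen += 1
            if seen == k:
                return state
        # flushes carry no data; in-order replay already respects them
    raise ValueError(f"log has only {seen} checkpoints, wanted {k}")

if __name__ == "__main__":
    log = [("write", 0, b"m1"), ("write", 8, b"d1"), ("flush",), ("checkpoint",),
           ("write", 8, b"d2"), ("flush",), ("checkpoint",)]
    print(replay_to_checkpoint({}, log, 1))  # {0: b'm1', 8: b'd1'}
```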
CrashMonkey in Action: Testing
• Test consistency against the oracle for the list of open files (fd 13: /a/b)
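A minimal version of the consistency check, assuming the oracle and the crash state have already been read into dicts mapping path to file contents (a hypothetical representation). Only files persisted before the crash point are compared, since unpersisted files may legitimately differ. A fuller checker would also compare metadata such as size and link count.

```python
def check_crash_state(oracle, crash_state, persisted_paths):
    """Compare persisted files between the oracle and a crash state.

    `oracle` and `crash_state` map path -> contents; a missing key means
    the file does not exist. Returns (path, expected, actual) mismatches.
    """
    mismatches = []
    for path in sorted(persisted_paths):
        expected = oracle.get(path)
        actual = crash_state.get(path)
        if expected != actual:
            mismatches.append((path, expected, actual))
    return mismatches

if __name__ == "__main__":
    # The btrfs rename bug: A/bar was persisted in the oracle but is gone.
    oracle = {"/A/bar": b"", "/A/foo": b""}
    crashed = {"/A/foo": b""}
    print(check_crash_state(oracle, crashed, {"/A/bar", "/A/foo"}))
```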
Testing Strategy to find new bugs
• We test seq-1 and seq-2 workloads on all filesystems: ext4, xfs, F2FS, btrfs
• We run all other workloads on btrfs and F2FS first
  • For every workload that generated a bug, we run it on all the other filesystems
• Running all workloads up to seq-3 takes 2 days of compute per filesystem when testing in parallel on 780 VMs
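The strategy above can be written as a small planning loop. The filesystem names come from the slide; the workload representation (an id mapped to its sequence length) and the `run` callback are hypothetical.

```python
ALL_FS = ("ext4", "xfs", "f2fs", "btrfs")
FIRST_PASS_FS = ("btrfs", "f2fs")

def plan_runs(workloads, run):
    """Return the (workload_id, filesystem) pairs tested, in order.

    workloads: dict mapping a workload id to its sequence length (1, 2, or 3).
    run: hypothetical callback run(workload_id, fs) -> True if a bug was found.
    """
    executed = []
    for wid, seq_len in workloads.items():
        # seq-1 and seq-2 go to every filesystem; longer ones to btrfs and F2FS first.
        targets = ALL_FS if seq_len <= 2 else FIRST_PASS_FS
        found_bug = False
        for fs in targets:
            executed.append((wid, fs))
            found_bug = run(wid, fs) or found_bug
        if seq_len > 2 and found_bug:
            # A bug-triggering workload is re-run on the remaining filesystems.
            for fs in ALL_FS:
                if fs not in targets:
                    executed.append((wid, fs))
                    run(wid, fs)
    return executed

if __name__ == "__main__":
    bugs = {("w3", "btrfs")}
    print(plan_runs({"w1": 1, "w3": 3, "w4": 3}, lambda w, fs: (w, fs) in bugs))
```

Restricting the large seq-3 sets to two filesystems first is what keeps the compute budget per filesystem bounded.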
Results at a glance
• 25 million workloads
• Needs 15 days of testing on 780 VMs in parallel!
Results at a glance
| Sequence length | # workloads | # known bugs reproduced | # new bugs found |
|---|---|---|---|
| Seq-1 | 300 | 3 | 3 |
| Seq-2 | 254K | 14 | 3 |
| Seq-3 metadata | 120K | 5 | 2 |
| Seq-3 data | 1.5M | 2 | 0 |
| Seq-3 nested | 1.5M | 2 | 2 |
| Total | 3.37M | 26 | 10 |