EXT2C: Increasing Disk Reliability
Brian Pellin, Chloe Schulze
CS736 Presentation
May 3rd, 2005
Introduction
Problem: Disks can fail silently, corrupting data
(Partial) Solution: Checksum the data to verify correctness before returning data to the user
(This does not recover lost data, but at least the user knows the data is bad.)
Approach
Implement checksumming within EXT2
New read/write functions
Wrap new functionality around existing functions
New error code
Conclusion Summary
Implemented working checksumming on top of EXT2
Achieves added safety at the cost of additional overheads
Outline
Fault Model
EXT2C
Performance
Conclusions
Fault Model
Fail Stop: all or nothing
  No longer adequate for today’s disk failures
Partial Failure
  Latent sector errors
  Misdirection: right data written to wrong location
  Phantom writes: disk returns okay, but data was not written
  Malicious writes
Relating Fault Model to Implementation
Partial Failure suggests:
  Detection
  Notification
  Verification of data
  Backup or replication to avoid data loss
EXT2C
EXT2 base file system
Modifications
  Checksum file data
  New read/write functions to implement the checksumming
  New error code to notify user of data corruption
Checksums
Checksum computed per block of a file
  Each checksum is 20 bytes long (fixed length)
  Computed by the hash function SHA-1
One checksum file per inode
  Named for file’s inode number
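The layout described above can be sketched in user space as follows. This is only an illustration, not the authors' kernel code: the 4096-byte block size is taken from the benchmark slides, and the names `BLOCK_SIZE`, `CHECKSUM_SIZE`, `block_checksums`, and `checksum_offset` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096     # assumed block size (the benchmark slides use 4096-byte blocks)
CHECKSUM_SIZE = 20    # a SHA-1 digest is always 20 bytes


def block_checksums(data: bytes) -> bytes:
    """Concatenate one SHA-1 digest per block of `data` (hypothetical helper)."""
    digests = []
    for start in range(0, len(data), BLOCK_SIZE):
        digests.append(hashlib.sha1(data[start:start + BLOCK_SIZE]).digest())
    return b"".join(digests)


def checksum_offset(block_index: int) -> int:
    """Byte offset of a block's digest inside the checksum file."""
    return block_index * CHECKSUM_SIZE


contents = b"a" * 5000                  # spans two blocks
checksum_file = block_checksums(contents)
print(len(checksum_file))               # 40: two 20-byte digests
print(checksum_offset(1))               # 20
```

Because every digest has the same fixed length, the entry for block n can be located by multiplication alone, with no index structure.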
Checksum Creation
[Figure: Block no. 1 (bytes 0 to 4096) of foo.c, inode no. 5, is hashed with SHA-1; the result occupies bytes 0 to 20 of the checksum file, which is named 5.]
Ext2C_file_read
Normal file read
Open checksum file
Calculate blocks read
For each block being read
  Read in block
  Compute checksum
  Read in old checksum
  Compare: if not equal, return error
Close checksum file
Return result
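The read steps above can be sketched as a user-space simulation in which the file and its checksum file are plain byte strings. This is a rough illustration under stated assumptions, not the kernel implementation: `verified_read` and `ChecksumMismatch` are hypothetical names, the exception stands in for the new error code, and a 4096-byte block size is assumed.

```python
import hashlib

BLOCK_SIZE = 4096     # assumed block size
CHECKSUM_SIZE = 20    # SHA-1 digest length in bytes


class ChecksumMismatch(Exception):
    """Hypothetical stand-in for the new error code returned on corruption."""


def verified_read(data: bytes, checksums: bytes, offset: int, length: int) -> bytes:
    """Return data[offset:offset+length] after verifying every overlapped block."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for block in range(first, last + 1):
        chunk = data[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
        if not chunk:
            break  # the read extends past the end of the file
        stored = checksums[block * CHECKSUM_SIZE:(block + 1) * CHECKSUM_SIZE]
        if hashlib.sha1(chunk).digest() != stored:
            raise ChecksumMismatch(f"block {block} failed verification")
    return data[offset:offset + length]


good = bytes(8192)
sums = b"".join(hashlib.sha1(good[i:i + BLOCK_SIZE]).digest()
                for i in range(0, len(good), BLOCK_SIZE))
bad = good[:5000] + b"\x01" + good[5001:]   # simulate silent corruption in block 1
try:
    verified_read(bad, sums, 2000, 4000)
except ChecksumMismatch as err:
    print(err)                              # block 1 failed verification
```

Note that whole blocks are hashed even when the read covers only part of a block, since the stored checksum describes the full block.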
Read Operation
[Figure: file data blocks with boundaries at 0, 4096, and 8192; a read of 4000 bytes spans offsets 2000 to 6000. Each overlapped block is hashed and compared (?=) with its entry in the checksum file: match or failure.]
1. Read data from file
2. Read overlapping data block
3. Read corresponding section of checksum file
4. Hash data and compare with stored checksum
5. Repeat for other blocks overlapped by read
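As a worked example of the figure's numbers, the short sketch below computes which blocks a 4000-byte read at offset 2000 overlaps, and which byte range of the checksum file belongs to each. The helper name `overlapped_blocks` is hypothetical; 4096-byte blocks and 20-byte checksums are assumed as on the other slides.

```python
BLOCK_SIZE = 4096     # assumed block size
CHECKSUM_SIZE = 20    # SHA-1 digest length in bytes


def overlapped_blocks(offset: int, length: int) -> range:
    """Indices of the blocks touched by a read of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


for block in overlapped_blocks(2000, 4000):
    data_span = (block * BLOCK_SIZE, (block + 1) * BLOCK_SIZE)
    sum_span = (block * CHECKSUM_SIZE, (block + 1) * CHECKSUM_SIZE)
    print(block, data_span, sum_span)
# 0 (0, 4096) (0, 20)
# 1 (4096, 8192) (20, 40)
```

So a 4000-byte read can require hashing 8192 bytes, which is one source of the small-read overhead measured later.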
Ext2c_file_write
Normal file write
Calculate blocks changed
Open checksum file
For each changed block
  Read in block
  Compute checksum
  Write checksum to checksum file
Close checksum file
Return result of normal file write
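The write steps above can be sketched the same way, with the file and checksum file modeled as in-memory byte arrays. This is a simplified illustration, not the kernel code: `checksummed_write` is a hypothetical name, a 4096-byte block size is assumed, and zero-filling a gap before the write offset is a choice made for this sketch.

```python
import hashlib

BLOCK_SIZE = 4096     # assumed block size
CHECKSUM_SIZE = 20    # SHA-1 digest length in bytes


def checksummed_write(data: bytearray, checksums: bytearray,
                      offset: int, payload: bytes) -> int:
    """Write payload at offset, then refresh the digest of every changed block."""
    if not payload:
        return 0
    old_len = len(data)
    end = offset + len(payload)
    if old_len < end:
        data.extend(bytes(end - old_len))    # grow the file, zero-filled
    data[offset:end] = payload               # the "normal file write"

    # Blocks touched by zero-fill count as changed too.
    first = min(offset, old_len) // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    needed = (last + 1) * CHECKSUM_SIZE
    if len(checksums) < needed:
        checksums.extend(bytes(needed - len(checksums)))
    for block in range(first, last + 1):
        chunk = bytes(data[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE])
        digest = hashlib.sha1(chunk).digest()
        checksums[block * CHECKSUM_SIZE:(block + 1) * CHECKSUM_SIZE] = digest
    return len(payload)                      # result of the normal write


file_data, sums = bytearray(), bytearray()
checksummed_write(file_data, sums, 0, b"x" * 6000)
print(len(sums))                             # 40: blocks 0 and 1 were written
```

Even a one-byte write forces the whole containing block to be read back and rehashed, which matches the "Read in block" step in the list above.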
Targeted Problems
Detect
  Silently corrupted data
Partially Detect
  Phantom writes
  Misdirection
  Malicious write
Correctness
Able to run PostMark and additional benchmarks without encountering any errors
Injected errors are detected and our error code is returned
EXT2 vs. EXT2C Test Outline
Microbenchmarks
  Measure cold cache small reads/writes
  Warm cache small reads
Benchmarks capturing larger scale behavior
  PostMark
  Large sequential reads
Cold cache read/write Comparison
We time the differences between ext2 and ext2c on:
  Single block reads
  Single block writes
  10 block reads
  10 block writes
Overhead Costs EXT2 vs. EXT2C
[Figure: bar chart of time in milliseconds (axis 0 to 12) for Read 1 Block, Write 1 Block, Read 10 Blocks, and Write 10 Blocks; series EXT2 and EXT2C.]
Warm cache comparison
What overhead does ext2c add when data is cached in memory?
EXT2 vs. EXT2C - Reads Cached
[Figure: bar chart of time in milliseconds (axis 0 to 0.45) for Read 1 Block - Cached and Read 10 Blocks - Cached; series EXT2 and EXT2C.]
PostMark
Benchmark crafted to simulate realistic small file workloads
Intersperses read/write/append operations
Measures throughput (transactions per second)
PostMark Results (Transactions per second)
                     EXT2    EXT2C
Total Transactions   5000     2500
Create                500      500
Read                 2499     1249
Append               2483     1241
Delete                628      628
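To make the relative cost explicit, the short calculation below divides each EXT2C figure by the corresponding EXT2 figure. The numbers are copied from the results slide; the variable names are illustrative only.

```python
# (EXT2, EXT2C) figures as reported on the PostMark results slide
results = {
    "Total Transactions": (5000, 2500),
    "Create": (500, 500),
    "Read": (2499, 1249),
    "Append": (2483, 1241),
    "Delete": (628, 628),
}

# Ratio of EXT2C to EXT2 for each row
ratios = {name: ext2c / ext2 for name, (ext2, ext2c) in results.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
# Total Transactions: 0.50
# Create: 1.00
# Read: 0.50
# Append: 0.50
# Delete: 1.00
```

Create and delete are unchanged, while the total, read, and append figures drop to roughly half.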
Large Sequential Reads
Desire: checksumming costs will be amortized over long operations
Sequential Reads
[Figure: time in milliseconds (axis 0 to 40) versus number of blocks read (4096 bytes each, axis 0 to 500); series EXT2 and EXT2C.]
Conclusions
Benefit: notification of data corruption; bad or wrong data is no longer mistaken for good data
Cost: overhead of checksum computation and extra I/O costs
  Throughput is halved on small file workloads
  Sequential I/O amortizes some overhead
Further Work
Optimizations
  Open/close the checksum file when the file is opened and closed
  Batch checksum creation at time of file system creation
  Ensure that checksum data blocks are near file blocks to reduce seeking