ext2c: increasing disk reliability brian pellin, chloe schulze cs736 presentation may 3 th, 2005

EXT2C: Increasing Disk Reliability

Brian Pellin, Chloe Schulze

CS736 Presentation

May 3rd, 2005

Introduction

Problem: Disks can fail silently, corrupting data

(Partial) Solution: Checksum the data to verify correctness before returning data to the user

(This does not recover lost data, but at least the user knows the data is bad.)

Approach

Implement checksumming within EXT2
New read/write functions
Wrap new functionality around existing functions
New error code

Conclusion Summary

Implemented working checksumming on top of EXT2
Achieves added safety at the cost of additional overheads

Outline

Fault Model
EXT2C
Performance
Conclusions

Fault Model

Fail Stop – all or nothing
  No longer adequate for today’s disk failures
Partial Failure
  Latent sector errors
  Misdirection – right data written to wrong location
  Phantom writes – disk returns okay, but data was not written
  Malicious writes

Relating Fault Model to Implementation

Partial Failure suggests:
  Detection
  Notification
  Verification of data
  Backup or replication to avoid data loss

EXT2C

EXT2 base file system
Modifications
  Checksum file data
  New read/write functions to implement the checksumming
  New error code to notify user of data corruption

Checksums

Checksum computed per block of a file
One checksum file per inode
  Named for file’s inode number
Each checksum is 20 bytes long (fixed length)
Computed by the hash function SHA-1

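Checksum Creation

[Figure: a data block of foo.c (inode no. 5; block no. 1, bytes 0 to 4096) is passed through SHA-1, and the resulting hash is stored in the checksum file named 5 (bytes 0 to 20).]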
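The layout described above can be sketched in a few lines of Python. This is only an illustration in user space, not the kernel code from the project: the names `build_checksum_file` and `checksum_offset` are hypothetical, blocks are indexed from zero here, and hashing a short final block as-is (without padding) is an assumption.

```python
import hashlib

BLOCK_SIZE = 4096     # data block size shown on the slides
CHECKSUM_SIZE = 20    # length of a SHA-1 digest in bytes


def build_checksum_file(data: bytes) -> bytes:
    """Concatenate the SHA-1 digest of every 4096-byte block of data."""
    digests = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]   # last block may be short (assumption)
        digests.append(hashlib.sha1(block).digest())
    return b"".join(digests)


def checksum_offset(block_index: int) -> int:
    """Byte offset of a block's digest inside the checksum file (zero-based index)."""
    return block_index * CHECKSUM_SIZE


data = bytes(range(256)) * 40           # 10240 bytes: two full blocks and one partial
checksums = build_checksum_file(data)
print(len(checksums))                   # 60, i.e. 3 digests of 20 bytes each
print(checksum_offset(1))               # 20
```

Because every digest has the same fixed length, the position of a block's checksum follows directly from the block index, with no lookup structure needed.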

Ext2C_file_read

Normal file read
Open checksum file
Calculate blocks read
For each block being read
  Read in block
  Compute checksum
  Read in old checksum
  Compare – if not equal, return error
Close checksum file
Return result
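As a rough user-space sketch of the read path above, the following Python models the file and its checksum file as byte strings. The function name `checked_read` and the exception `ChecksumMismatch` are hypothetical; the exception merely stands in for the new error code, whose actual name and value are not given here.

```python
import hashlib

BLOCK_SIZE = 4096
CHECKSUM_SIZE = 20


class ChecksumMismatch(Exception):
    """Hypothetical stand-in for the new EXT2C error code."""


def checked_read(data: bytes, checksums: bytes, offset: int, length: int) -> bytes:
    result = data[offset:offset + length]             # normal file read
    if not result:
        return result
    first = offset // BLOCK_SIZE                      # calculate blocks read
    last = (offset + len(result) - 1) // BLOCK_SIZE
    for index in range(first, last + 1):
        block = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
        stored = checksums[index * CHECKSUM_SIZE:(index + 1) * CHECKSUM_SIZE]
        if hashlib.sha1(block).digest() != stored:    # compare with old checksum
            raise ChecksumMismatch(f"block {index} failed verification")
    return result


data = b"a" * 10000
checksums = b"".join(
    hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
    for i in range(0, len(data), BLOCK_SIZE)
)
print(len(checked_read(data, checksums, 2000, 4000)))   # 4000
```

Note that whole blocks are hashed even when the caller asks for only part of a block, since the stored checksum covers the full block.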

Read Operation

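[Figure: a read of 4000 bytes covering file offsets 2000 to 6000 overlaps two file data blocks (0 to 4096 and 4096 to 8192). Each overlapped block is hashed and compared (?=) with its entry in the checksum file, giving a match or a failure.]

1. Read data from file

2. Read overlapping data block

3. Read corresponding section of checksum file

4. Hash data and compare with stored checksum

5. Repeat for other blocks overlapped by read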
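The block arithmetic for this example can be worked through directly. The helper names `overlapped_blocks` and `checksum_span` are hypothetical, and blocks are indexed from zero.

```python
BLOCK_SIZE = 4096
CHECKSUM_SIZE = 20


def overlapped_blocks(offset: int, length: int) -> range:
    """Zero-based indices of the data blocks touched by a read."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


def checksum_span(block_index: int) -> tuple[int, int]:
    """Start and end byte offsets of a block's entry in the checksum file."""
    start = block_index * CHECKSUM_SIZE
    return start, start + CHECKSUM_SIZE


blocks = list(overlapped_blocks(2000, 4000))
print(blocks)                                # [0, 1]
print([checksum_span(b) for b in blocks])    # [(0, 20), (20, 40)]
```

So a 4000-byte read starting at offset 2000 requires hashing 8192 bytes of file data and reading 40 bytes of the checksum file.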

Ext2c_file_write

Normal file write
Calculate blocks changed
Open checksum file
For each changed block
  Read in block
  Compute checksum
  Write checksum to checksum file
Close checksum file
Return result of normal file write
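A minimal user-space sketch of the write path, with the file and checksum file modeled as `bytearray` objects. The name `checked_write` is hypothetical, and the handling of writes that extend the file (zero-filling and re-hashing the old final block) is an assumption made so the sketch stays consistent, not a description of the actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096
CHECKSUM_SIZE = 20


def checked_write(data: bytearray, checksums: bytearray,
                  offset: int, payload: bytes) -> int:
    if not payload:
        return 0
    old_len = len(data)
    end = offset + len(payload)
    if end > old_len:
        data.extend(bytes(end - old_len))        # grow the file, zero-filling any gap
    data[offset:end] = payload                   # normal file write

    # Calculate blocks changed; include the old final block if the file grew.
    first = min(offset, old_len) // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    needed = (last + 1) * CHECKSUM_SIZE
    if needed > len(checksums):
        checksums.extend(bytes(needed - len(checksums)))

    for index in range(first, last + 1):
        block = bytes(data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE])  # read in block
        digest = hashlib.sha1(block).digest()                             # compute checksum
        checksums[index * CHECKSUM_SIZE:(index + 1) * CHECKSUM_SIZE] = digest
    return len(payload)                          # result of the normal file write


file_data, file_sums = bytearray(), bytearray()
print(checked_write(file_data, file_sums, 0, b"x" * 5000))   # 5000
print(len(file_sums))                                        # 40
```

Each changed block is read back after the write because a write may cover only part of a block, while the checksum must cover all of it.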

Targeted Problems

Detect
  Silently corrupted data
Partially Detect
  Phantom writes
  Misdirection
  Malicious write

Correctness

Able to run PostMark and additional benchmarks without encountering any errors
Injected errors are detected and our error code is returned
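The error-injection check can be illustrated with a small user-space experiment. This is a sketch of the idea only, not the test harness used for the project: the helper names are hypothetical, and the fault is injected by flipping a single bit in an in-memory copy of the data.

```python
import hashlib

BLOCK_SIZE = 4096


def block_digests(data: bytes) -> list[bytes]:
    """SHA-1 digest of each 4096-byte block."""
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def corrupt_blocks(data: bytes, stored: list[bytes]) -> list[int]:
    """Indices of blocks whose current digest differs from the stored one."""
    return [i for i, digest in enumerate(block_digests(data)) if digest != stored[i]]


original = bytes(3 * BLOCK_SIZE)        # three zero-filled blocks
stored = block_digests(original)

damaged = bytearray(original)
damaged[5000] ^= 0x01                   # inject a single-bit error into block 1

print(corrupt_blocks(original, stored))         # []
print(corrupt_blocks(bytes(damaged), stored))   # [1]
```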

EXT2 vs. EXT2C Test Outline

Microbenchmarks
  Measure cold cache small reads/writes
  Warm cache small reads
Benchmarks capturing larger scale behavior
  PostMark
  Large sequential reads

Cold cache read/write Comparison

We time the differences between ext2 and ext2c on:
  Single block reads
  Single block writes
  10 block reads
  10 block writes

[Figure: "Overhead Costs EXT2 vs. EXT2C". Bar chart of time in milliseconds (axis 0 to 12) for Read 1 Block, Write 1 Block, Read 10 Blocks, and Write 10 Blocks; series: EXT2, EXT2C.]

Warm cache comparison

What overhead does ext2c add when data is cached in memory?

[Figure: "EXT2 vs. EXT2C - Reads Cached". Bar chart of time in milliseconds (axis 0 to 0.45) for Read 1 Block - Cached and Read 10 Blocks - Cached; series: EXT2, EXT2C.]

EXT2 vs. EXT2C Test Outline

Microbenchmarks
  Measure cold cache small reads/writes
  Warm cache small reads
Benchmarks capturing larger scale behavior
  PostMark
  Large sequential reads

PostMark

Benchmark crafted to simulate realistic small file workloads
Intersperses read/write/append operations
Measures throughput (transactions per second)

PostMark Results (Transactions per second)

                      EXT2    EXT2C
Total Transactions    5000     2500
Create                 500      500
Read                  2499     1249
Append                2483     1241
Delete                 628      628
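To make the relative cost explicit, the ratio of EXT2C to EXT2 throughput can be computed for each row of the results. The numbers below are copied from the table; the variable names are illustrative only.

```python
# (EXT2, EXT2C) transactions per second, as reported in the PostMark results
results = {
    "Total Transactions": (5000, 2500),
    "Create": (500, 500),
    "Read": (2499, 1249),
    "Append": (2483, 1241),
    "Delete": (628, 628),
}

ratios = {name: ext2c / ext2 for name, (ext2, ext2c) in results.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
# Total Transactions: 0.50
# Create: 1.00
# Read: 0.50
# Append: 0.50
# Delete: 1.00
```

Total, read, and append throughput drop to about half, while the create and delete rates are the same for both file systems.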

Large Sequential Reads

Desire: checksumming costs will be amortized over long operations

[Figure: "Sequential Reads". Time in milliseconds (axis 0 to 40) against number of blocks read (4096 bytes each, axis 0 to 500); series: EXT2, EXT2C.]

Conclusions

Benefit: notification of data corruption; bad or wrong data is no longer mistaken for good data
Cost: overhead of checksum computation and extra I/O costs
  Throughput is halved on small file workloads
  Sequential I/O amortizes some overhead

Further Work

Optimizations
  Open/close the checksum file when the file is opened and closed
  Batch checksum creation at time of file system creation
  Ensuring that checksum data blocks are near file blocks to reduce seeking

