![Page 1: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/1.jpg)
High-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System
Harendra Kumar, Yuvraj Patel, Ram Kesavan, Sumith Makam
NetApp, Inc., University of Wisconsin-Madison
![Page 2: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/2.jpg)
Example
Customer Data Center
“Freeing free block” panic
Support checklist
• Start recovery run (fsck-like tool)
• Seek Engineering help to root-cause the panic
Recovery run?
![Page 3: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/3.jpg)
Example
Scribble bug or logic bug?
H/W fault or S/W bug?
India, USA
How long will the recovery run take?
Engineering
When did the corruption happen?
Customer
![Page 4: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/4.jpg)
Summary
• Bugs keep coming
  – Hardware faults
  – Software bugs
• Important to protect metadata for correctness
• Need of the hour
  – Simple techniques for strong data integrity
  – No/negligible performance impact (deployable)
  – Diagnostic capability
![Page 5: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/5.jpg)
Our Solution
• Separate solutions for separate problems
  – Deployed in production
    • Incremental checksum for scribble bugs
    • Digest-based transaction auditing for logic bugs
  – In house
    • Page-level protection for diagnostics
![Page 6: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/6.jpg)
Key Results
• Techniques protect metadata
  – Negligible performance impact
  – More than 3x reduction in recovery runs
  – Deployed in > 250K systems worldwide
• Field data (~5 years)
  – 33 systems protected from 8 unique scribble bugs
  – 50 systems protected from 9 unique logic bugs
![Page 7: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/7.jpg)
Outline
• Introduction
• Scribble protection
• Page-level protection
• Digest-based transaction audit
• Evaluation
• Conclusion
![Page 8: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/8.jpg)
Scribble protection
• Aim: Avoid scribbles corrupting metadata
• Rolling incremental checksum on all metadata updates
![Page 9: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/9.jpg)
Incremental checksum example
Timeline (successful verification):
1. Indirect block loaded in memory: P Q R S T …
   • Adler 32-bit checksum of full block = C
   • Incremental checksum initialized to C
2. Indirect block modified (R → R’): P Q R’ S T …
   • Incremental checksum = C’
3. Indirect block modified (Q → Q’): P Q’ R’ S T …
   • Incremental checksum = C’’
4. Just before persisting
   • Compute Adler 32-bit checksum of the block = C’’
   • Compare full checksum and incremental checksum
5. On successful verification
   • Persist to RAID/storage
Notes
• Metadata updates are small in size and frequent
• Incremental checksum computation depends on the amount of data modified and is cache-line friendly
• Concurrent incremental checksum computation is possible without locks
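To see why the cost of the incremental checksum depends only on the amount of data modified, the standard Adler-32 definition can be written out. This is a worked sketch based on the public Adler-32 definition, not a description of the WAFL implementation; the symbols $d_i$, $n$, $A$ and $B$ are notation introduced here. For a block of $n$ bytes $d_1, \dots, d_n$:

$$
A = \Bigl(1 + \sum_{i=1}^{n} d_i\Bigr) \bmod 65521,
\qquad
B = \Bigl(n + \sum_{i=1}^{n} (n - i + 1)\, d_i\Bigr) \bmod 65521,
\qquad
\text{checksum} = B \cdot 2^{16} + A .
$$

If a single byte at position $i$ changes from $d_i$ to $d_i'$, both sums change by a term that involves only that byte:

$$
A' = \bigl(A + (d_i' - d_i)\bigr) \bmod 65521,
\qquad
B' = \bigl(B + (n - i + 1)(d_i' - d_i)\bigr) \bmod 65521 .
$$

The update touches only the modified bytes, so a small metadata change does not require re-reading the whole block.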
![Page 10: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/10.jpg)
Incremental checksum example
Timeline (scribble detected):
1. Indirect block loaded in memory: P Q R S T …
   • Adler 32-bit checksum of full block = C
   • Incremental checksum = C
2. Indirect block modified (R → R’): P Q R’ S T …
   • Incremental checksum = C’
3. Scribble bug corrupts the indirect block (Q → Q’): P Q’ R’ S T …
   • Incremental checksum = C’ (unchanged)
4. Just before persisting
   • Compute Adler 32-bit checksum of the block = C’’
   • Compare full checksum and incremental checksum
5. On verification failure
   • Panic the system, as other metadata may also be corrupted
Without incremental checksum, this scribble bug can lead to a “Freeing free block” panic.
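The two timelines above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WAFL code: `TrackedBlock`, `write`, `verify_before_persist` and `adler32_update` are hypothetical names, and the per-byte update rule is derived from the public Adler-32 definition. Legitimate writes go through `write()`, which updates the incremental checksum; a scribble modifies the buffer directly and leaves the incremental checksum stale.

```python
import zlib

MOD = 65521  # Adler-32 modulus (largest prime below 2**16)


def adler32_update(checksum: int, block_len: int, offset: int,
                   old: bytes, new: bytes) -> int:
    """Return the Adler-32 of a block after the bytes at `offset` change
    from `old` to `new`, without re-reading the rest of the block."""
    assert len(old) == len(new)
    a = checksum & 0xFFFF
    b = (checksum >> 16) & 0xFFFF
    for k, (o, n) in enumerate(zip(old, new)):
        delta = n - o
        weight = block_len - (offset + k)  # times this byte is counted in b
        a = (a + delta) % MOD
        b = (b + weight * delta) % MOD
    return (b << 16) | a


class TrackedBlock:
    """Hypothetical in-memory metadata block with an incremental checksum."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Full checksum C at load time; incremental checksum starts at C.
        self.incremental = zlib.adler32(bytes(self.data))

    def write(self, offset: int, new: bytes) -> None:
        """Legitimate update: checksum and data change together."""
        old = bytes(self.data[offset:offset + len(new)])
        self.incremental = adler32_update(
            self.incremental, len(self.data), offset, old, new)
        self.data[offset:offset + len(new)] = new

    def verify_before_persist(self) -> bool:
        """Recompute the full checksum and compare it with the incremental one."""
        return zlib.adler32(bytes(self.data)) == self.incremental


if __name__ == "__main__":
    block = TrackedBlock(bytes(4096))
    block.write(8, b"R'")                 # legitimate modification
    print(block.verify_before_persist())  # checksums agree
    block.data[4] ^= 0xFF                 # scribble: bypasses write()
    print(block.verify_before_persist())  # mismatch is detected
```

In this sketch a failed verification would be the point at which the system panics instead of persisting the block.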
![Page 11: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/11.jpg)
![Page 12: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/12.jpg)
Page-level protection
• Scribble bugs only caught at the end of CP (consistency point)
• Difficult to root-cause scribble bugs
• Page permissions + Write Protect Enable (WP) bit
  – Keep pages read-only by default
  – Flip WP bit before and after modification
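The "read-only by default, writable only around a legitimate modification" discipline can be illustrated with a user-space analogy. This is only an analogy under stated assumptions: the slide describes hardware page permissions and the WP bit inside the kernel, whereas the sketch below uses Python's read-only `memoryview` objects, and `ProtectedPage`, `writable` and `try_write` are hypothetical names. The point it shows is that a stray write fails at the moment it happens, which is what makes the offending code path easy to identify.

```python
from contextlib import contextmanager


class ProtectedPage:
    """A buffer that rejects writes except inside an explicit writable() window."""

    def __init__(self, size: int):
        self._buf = bytearray(size)
        # Default handle handed to everyone: writes through it raise TypeError.
        self.view = memoryview(self._buf).toreadonly()

    @contextmanager
    def writable(self):
        window = memoryview(self._buf)  # analogous to lifting write protection
        try:
            yield window
        finally:
            window.release()            # analogous to restoring write protection


def try_write(view, index: int, value: int) -> str:
    """Attempt a one-byte write and report what happened."""
    try:
        view[index] = value
        return "ok"
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    page = ProtectedPage(4096)
    with page.writable() as window:
        print(try_write(window, 0, 0x42))   # legitimate write inside the window
    print(try_write(page.view, 1, 0xFF))    # scribble through the read-only view
    print(try_write(window, 1, 0xFF))       # scribble through a stale window
```

Unlike the end-of-CP checksum comparison, this style of check reports the faulting write itself rather than its after-effects.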
![Page 13: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/13.jpg)
![Page 14: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/14.jpg)
Digest-based transaction auditing
• Logic bugs and their nature
• Distributed invariants → Consistency equations
• Lightweight digest (transaction checksum)
  – Maintain different digests for different invariants
![Page 15: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/15.jpg)
Digest-based transaction auditing
[Figure: inode tree with blocks A and B (labels XYZ, B)]
Bitmap for blocks (A) (B) (C) (D) (E): 1 1 0 0 0
Client modifies inode A
• Adds new block
[Figure: in-memory state of inode, with blocks A and B plus a new block (label PQR)]
![Page 16: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/16.jpg)
Digest-based transaction auditing
[Figure: inode tree before the CP (blocks A and B; labels XYZ, PQR) and during the CP (blocks D, B and C). Legend: freed block, allocated block.]
Bitmap for blocks (A) (B) (C) (D) (E)
• Before CP: 1 1 0 0 0
• During CP: 0 1 1 1 0
During CP
• During indirect block updates
  – Maintain blocks allocated digest D1 = C + D
  – Maintain blocks freed digest D2 = A
• During bitmap updates
  – Maintain blocks allocated digest D3 = C + D
  – Maintain blocks freed digest D4 = A
End of CP
Compare digests
1. D1 == D3
2. D2 == D4
![Page 17: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/17.jpg)
Digest-based transaction auditing
• Digests are easy to maintain
• Lightweight: strong one-to-one audit avoided
![Page 18: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/18.jpg)
Digest-based transaction auditing
[Figure: inode tree before the CP (blocks A and B; labels XYZ, PQR) and during the CP (blocks D, B and C). Legend: freed block, allocated block.]
Bitmap for blocks (A) (B) (C) (D) (E)
• Before CP: 1 1 0 0 0
• During CP: 0 1 1 0 0 (D not updated due to race)
During CP
• During indirect block updates
  – Maintain blocks allocated digest D1 = C + D
  – Maintain blocks freed digest D2 = A
• During bitmap updates
  – Maintain blocks allocated digest D3 = C (D not updated due to race)
  – Maintain blocks freed digest D4 = A
End of CP
Compare digests
1. D1 != D3
2. D2 == D4
![Page 19: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/19.jpg)
Digest-based transaction auditing
Without digest-based transaction auditing, this race can lead to a “Freeing free block” panic.
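The audit in the preceding slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the production design: the digest here is a plain sum of block numbers modulo 2^64 (the slides write it as "C + D"), the block numbers 100 to 104 standing in for A to E are made up, and `CheckpointAudit`, `run_cp` and `lost_bitmap_update` are hypothetical names. The indirect-block path and the bitmap path each accumulate their own digests, and the two consistency equations are compared at the end of the CP.

```python
MASK = (1 << 64) - 1  # assumed digest width for this sketch

# Made-up block numbers standing in for blocks A..E on the slides.
A, B, C, D, E = 100, 101, 102, 103, 104


class Digest:
    """Order-independent running sum of block numbers."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, block_no: int) -> None:
        self.value = (self.value + block_no) & MASK


class CheckpointAudit:
    def __init__(self) -> None:
        self.tree_alloc = Digest()    # D1: allocations seen by indirect block updates
        self.tree_freed = Digest()    # D2: frees seen by indirect block updates
        self.bitmap_alloc = Digest()  # D3: allocations seen by bitmap updates
        self.bitmap_freed = Digest()  # D4: frees seen by bitmap updates

    def failed_equations(self) -> list:
        failed = []
        if self.tree_alloc.value != self.bitmap_alloc.value:
            failed.append("D1 == D3")
        if self.tree_freed.value != self.bitmap_freed.value:
            failed.append("D2 == D4")
        return failed


def run_cp(lost_bitmap_update=None):
    """Replay the slide's CP: C and D are allocated, A is freed.
    `lost_bitmap_update` simulates a race that drops one bitmap update."""
    bitmap = {A: 1, B: 1, C: 0, D: 0, E: 0}
    audit = CheckpointAudit()

    # Indirect block updates (slide: D1 = C + D, D2 = A).
    for blk in (C, D):
        audit.tree_alloc.add(blk)
    audit.tree_freed.add(A)

    # Bitmap updates (slide: D3 = C + D, D4 = A when nothing is lost).
    for blk in (C, D):
        if blk == lost_bitmap_update:
            continue  # the racing update never reaches the bitmap
        bitmap[blk] = 1
        audit.bitmap_alloc.add(blk)
    bitmap[A] = 0
    audit.bitmap_freed.add(A)

    return bitmap, audit.failed_equations()


if __name__ == "__main__":
    print(run_cp())                      # no failed equations
    print(run_cp(lost_bitmap_update=D))  # D1 == D3 fails
```

Each update costs one addition per digest, which is why no one-to-one comparison of the two structures is needed. The trade-off of a summed digest is that different sets of block numbers can, in principle, produce the same sum.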
![Page 20: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/20.jpg)
![Page 21: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/21.jpg)
Evaluation
• Running on >250K systems for 5+ years
• Negligible regression on file server benchmarks (e.g., SPEC SFS)
• Heavy metadata updates by DB workloads
  – Database/OLTP benchmark (similar to SPC-1) built in-house
![Page 22: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/22.jpg)
Performance Evaluation
[Figure: observed latency (ms, 0 to 30) versus achieved throughput (IOPS, 80K to 128K) for two configurations, "all off" and "all on"]
Incremental checksum + digest-based transaction auditing performance (20+ audit equations)
• Negligible throughput and latency impact until 120K ops
• 25% increase in latency thereafter
High-range system: 20 cores, 128 GB DRAM, 8 GB NVRAM
![Page 23: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/23.jpg)
Performance evaluation
• Page-level protection
  – 20% performance penalty
  – Used in-house (debug-only kernels)
  – Only used once in the field to catch a recurring scribble bug
![Page 24: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/24.jpg)
Protection from corruption bugs
• 5 years of data from in-house development
  – Unit test data hard to gather
  – 75 scribble bugs found by page-level protection
  – 32 scribble bugs found by incremental checksum
  – 23 logic bugs found by transaction auditing
• More than 3x reduction in the number of recovery runs across ONTAP 8.0 -> 8.3
![Page 25: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/25.jpg)
![Page 26: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/26.jpg)
Conclusion
• Introduced two techniques to enforce data integrity with minimal performance impact
• Disproved the common belief that “strong data integrity requires high performance penalty”
• End-to-end protection applicable to databases, distributed applications
• Concentrate more on innovation than worrying about data integrity
![Page 27: High-Performance Metadata Integrity Protection in the · PDF fileHigh-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System ... Time Indirect block ... ¡](https://reader031.vdocuments.mx/reader031/viewer/2022030422/5aa9e8d57f8b9a86188d6db2/html5/thumbnails/27.jpg)
Thank you!
Questions?