September 10, 2014 Sam Siewert
SE420 Software Quality Assurance
RAID Backgrounder
Scalable Enterprise File Systems
Three Types of Media Storage
– Direct Attached Storage (DAS), e.g. SATA (Serial ATA)
– Network Attached Storage (NAS), e.g. NFS
– Storage Area Networks (SAN), e.g. SAS (Serial Attached SCSI), Fibre Channel
Flash/RAM-based SSDs Are Still 10x or More Costly than Spinning Media
– Predictions of the Demise of HDDs and RAID?
– Cost is the Driver (e.g. < $0.01/GB tape, < $0.10/GB HDD, $1.00/GB SSD)
Fast Storage is Either SSD, RAID, or a Hybrid
Multiple Disk Drives
Disk Drives Fail – Like a Light Bulb
– MTBF of Hundreds of Thousands of Hours [3 to 5 Years at Duty Cycle]
– Difficult to Determine When a Failure Might Occur
– The Larger the Population, the More Often Failures will be Seen
Disk Drives Have Low Random Access Rates [100 to 200 I/Os per Second]
Idea – Write to Them in Parallel and Mirror Data to Protect Against HDD Failures (Erasures)
RAID-10
RAID-0 Striping Over RAID-1 Mirrors
Logical blocks A1, A2, A3, … A12, striped over three mirror sets:
  RAID-1 Mirror 1   RAID-1 Mirror 2   RAID-1 Mirror 3
  A1    A1          A2    A2          A3    A3
  A4    A4          A5    A5          A6    A6
  A7    A7          A8    A8          A9    A9
  A10   A10         A11   A11         A12   A12
RAID Operates on LBAs/Sectors (Sometimes Files)
SAN/DAS RAID
NAS – Filesystem on top of RAID
RAID-10, RAID-50, RAID-60
– Stripe Over Mirror Sets
– Stripe Over RAID-5 XOR Parity Sets
– Stripe Over RAID-6 Reed-Solomon or Double-Parity Encoded Sets
    EVEN/ODD
    Row Diagonal Parity
    Minimum Density Codes (Liberation)
    Reed-Solomon Codes
– Generalized Erasure Codes
    Cauchy Reed-Solomon, LDPC (Low-Density Parity-Check Codes), Weaver/Hover
    MDS (Maximum Distance Separable) – For Each Parity Device, Another Level of Fault Tolerance is Provided
– Larger Drives (Multi-terabyte), Larger Arrays (100's of Drives), and Cost Reduction are Driving RAID-6 and Higher Levels
RAID-5/6 XOR Parity Encoding
MDS Encoding Can Achieve High Storage Efficiency: N/(N+1) with N+1 Devices (RAID-5) and N/(N+2) with N+2 Devices (RAID-6)
[Figure: Storage Efficiency (0% to 100%) vs. Number of Data Devices (1 to 20) for 1 XOR (RAID5) or 2 P,Q (RAID6) Encoded Devices]
RAID-50
RAID-0 Striping Over RAID-5 Sets
Logical order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2

RAID-5 Set 1
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       P(EFGH)  H1
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1

RAID-5 Set 2
  A2       B2       C2       D2       P(ABCD)
  E2       F2       G2       P(EFGH)  H2
  I2       J2       P(IJKL)  K2       L2
  M2       P(MNOP)  N2       O2       P2
  P(QRST)  Q2       R2       S2       T2
RAID-60 (Reed-Solomon Encoding)
RAID-0 Striping Over RAID-6 Sets
Logical order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2

RAID-6 Set 1
  Disk1    Disk2    Disk3    Disk4    Disk5    Disk6
  A1       B1       C1       D1       P(ABCD)  Q(ABCD)
  E1       F1       G1       P(EFGH)  Q(EFGH)  H1
  I1       J1       P(IJKL)  Q(IJKL)  K1       L1
  M1       P(MNOP)  Q(MNOP)  N1       O1       P1
  P(QRST)  Q(QRST)  Q1       R1       S1       T1

RAID-6 Set 2
  Disk1    Disk2    Disk3    Disk4    Disk5    Disk6
  A2       B2       C2       D2       P(ABCD)  Q(ABCD)
  E2       F2       G2       P(EFGH)  Q(EFGH)  H2
  I2       J2       P(IJKL)  Q(IJKL)  K2       L2
  M2       P(MNOP)  Q(MNOP)  N2       O2       P2
  P(QRST)  Q(QRST)  Q2       R2       S2       T2
RAID is an Erasure Code (EC)
RAID-1 is an MDS EC (James Plank, U. of Tennessee)
Comparison of ECs
Data Devices = n, Coding Devices = m, Total = n+m
Storage Efficiency: R = n/(n+m)
– RAID1 2-Way: R=1/(1+1)=50%, MDS=1, 2x Read Speed-up, 1x Write
– RAID1 3-Way: R=1/(1+2)=33%, MDS=2, 3x Read, 1x Write
– RAID10 with 10 Sets: R=10/(10+10)=50%, MDS=1, 20x Read, 10x Write
– RAID5 with 3+1 Set: R=3/(3+1)=75%, MDS=1, 3x Read (Parity Check?), RMW Penalty, Striding Issues
– RAID6 with 5+2 Set: R=5/(5+2)=71%, MDS=2, 5x Read, Reed-Solomon Encode on Write and RMW Penalty
– Beyond RAID6?
    Cauchy Reed-Solomon Scales, but Encode/Decode Complexity is High
    Low-Density Parity-Check Codes are Simpler, but not MDS
Read-Modify-Write Penalty
Updating Less than the Full Set Requires…
1. Read Old Data and Old Parity – 2 Reads
2. Compute New Parity (From Old Parity, Old Data, and New Data)
3. Write New Data and New Parity – 2 Writes
4. 4 I/O Operations to Do 1 Write, so the Penalty is 3
Strategy to Remove the Penalty – Write-Back Cache to Coalesce Updates and Perform Full-Set Writes as Often as Possible, e.g., “Write-Anywhere” Tracking at the Filesystem Level (e.g., WAFL)
RAID-5 Set
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       P(EFGH)  H1
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1
Write A1new
Pnew = A1new xor (A1 xor P(ABCD))

  A1  B1  C1  D1  P   A1new  Pnew
  0   0   0   0   0   1      1
  0   0   0   1   1   1      0
  0   0   1   0   1   1      0
  0   0   1   1   0   1      1
  0   1   0   0   1   1      0
  0   1   0   1   0   1      1
  0   1   1   0   0   1      1
  0   1   1   1   1   1      0
  1   0   0   0   1   0      0
  1   0   0   1   0   0      1
  1   0   1   0   0   0      1
  1   0   1   1   1   0      0
  1   1   0   0   0   0      1
  1   1   0   1   1   0      0
  1   1   1   0   1   0      0
  1   1   1   1   0   0      1
Hands-On Coding Exercise(s): Examples-RAID-Unit-Test, stripetest.c
[Figure: A, B, C, D Strips and Their XOR[A,B,C,D] Parity Strip]
[siewerts@localhost Examples-RAID-Unit-Test]$ ./stripetest Baby-Musk-Ox.ppm Baby-Musk-Ox.ppm.replicated
read full stripe…
hit end of file
FINISHED
[siewerts@localhost Examples-RAID-Unit-Test]$
[siewerts@localhost Examples-RAID-Unit-Test]$ diff Baby-Musk-Ox.ppm Baby-Musk-Ox.ppm.replicated