lecture 7 ecen 5653 -...
TRANSCRIPT
Sam Siewert 2
Hardware View of Device Interfaces
Analog I/O
– DAC analog output: servos, motors, heaters, ...
– ADC analog input: photodiodes, thermistors, ...
Digital I/O
– Direct TTL I/O or GPIO
– Digital Serial (I2C, SPI, ... - Chip-to-Chip)
Bus Interfaces
– Parallel: PCI 2.x, PCI-X, SCSI, etc. (32-bit, 64-bit, synchronous parallel transfer)
– Differential Serial: USB, Infiniband, gigE / 10GE Ethernet, Fibre Channel, SAS/SATA
Sam Siewert 3
Software View of Drivers
Character
– Register Control/Config, Status, Data
– Typical of Low-Rate I/O Interfaces (RS232)
– Linux User Space Buffer Drivers (Direct IO) – e.g. SCSI Generic
Block
– FIFOs, Dual-Port RAM and DMA
– Typical of High-Rate I/O Interfaces (Network, Storage)
– Only Interface for 512-Byte LBA/Sector HDDs
Network
– Driver Stacks
– OSI 7-Layer Model (Phy, Link, Network, Transport, Session, Presentation, Application)
– TCP/IP/Ethernet/Cat-6e
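From user space, a character driver is reached through the standard entry points the kernel exports. A minimal sketch of exercising that interface with the non-blocking policy (the device path here is a placeholder, not a specific device from the lecture):

```c
/* Sketch: user-space access to a character driver through the common
 * entry points (open/read with O_NONBLOCK). The device path passed in
 * is a placeholder -- substitute any char device on your system. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Attempt a non-blocking read; returns bytes read, 0 if the driver
 * reports EAGAIN (no data ready), or -1 on open/read error. */
ssize_t try_read(const char *devpath, char *buf, size_t len)
{
    int fd = open(devpath, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        n = 0;  /* driver's input buffer is empty -- would block */
    close(fd);
    return n;
}
```

The blocking policy is the same call without O_NONBLOCK, where the driver pends the caller instead of returning EAGAIN.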
Sam Siewert 4
Linux Char Driver Design
Application Interface
– Application Policy
– Blocking/Non-Blocking
– Multi-thread access
– Abstraction
Device Interface
– SW/HW Interface
– Immediate Buffering
– Interrupt Service Routine
[Figure: App/Device Interface – Application(s) call the entry points (open/close, read/write, creat, ioctl) and receive EAGAIN, Block, Data, or Status; the ISR moves data between the Hardware Device and the Input and Output Ring-Buffers, issuing a SemGive on input arrival.]
Write path: If Output Ring-Buffer Full then {SemTake or EAGAIN} else {Process and Return}
Read path: If Input Ring-Buffer Empty then {SemTake or EAGAIN} else {Process and Return}
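The read/write paths above can be modeled in user space as a ring buffer the "ISR" fills and the application drains. This is an illustrative sketch, not a kernel API: rb_put/rb_get are invented names, and -EAGAIN stands in for the non-blocking return (a real driver would SemTake or pend instead).

```c
/* Minimal user-space model of the ring-buffer policy: the ISR side
 * produces into the buffer, the application side consumes from it. */
#include <errno.h>
#include <stddef.h>

#define RB_SIZE 8  /* power of two simplifies wrap-around */

struct ring {
    unsigned char data[RB_SIZE];
    size_t head;  /* next slot the ISR writes */
    size_t tail;  /* next slot the application reads */
};

/* ISR side: returns -EAGAIN if the ring is full (write-path policy). */
int rb_put(struct ring *r, unsigned char c)
{
    if (r->head - r->tail == RB_SIZE)
        return -EAGAIN;              /* output ring-buffer full */
    r->data[r->head++ % RB_SIZE] = c;
    return 0;
}

/* Application side: returns the byte, or -EAGAIN if empty (read-path policy). */
int rb_get(struct ring *r)
{
    if (r->head == r->tail)
        return -EAGAIN;              /* input ring-buffer empty */
    return r->data[r->tail++ % RB_SIZE];
}
```

Keeping the free-running head/tail counters and masking by a power-of-two size is the usual lock-free single-producer/single-consumer layout between ISR and task context.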
Sam Siewert 5
Cached Memory and DMA
Cache Coherency – Making Sure that Cached Data and Memory are in Sync
– Can Become Out of Sync Due to DMAs and Multi-Processor Caches
– Push Caches Allow for DMA into and out of Cache Directly
– Cache Snooping by HW may Obviate Need for Invalidate
Drivers Must Ensure Cache Coherency
– Invalidate Memory Locations on DMA Read Completion
– Flush Cache Prior to DMA Write Initiation
IO Data Cache Line Alignment
– Ensure that IO Data is Aligned on Cache Line Boundaries
– Other Data That Shares a Cache Line with IO Data Could Otherwise Be Errantly Invalidated
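The alignment point can be sketched with standard C11: pad a DMA-bound buffer to whole cache lines so an invalidate of its lines cannot clobber neighboring data. The 64-byte line size is an assumption (typical of x86); real code should query the CPU, and kernel code would use the DMA mapping API rather than aligned_alloc.

```c
/* Sketch: allocate a buffer aligned to (and padded to) cache-line
 * boundaries so no unrelated data shares its cache lines. */
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed line size -- query the CPU in real code */

/* Round the request up to whole cache lines and return line-aligned
 * storage suitable as a DMA target; caller frees with free(). */
void *dma_buf_alloc(size_t len)
{
    size_t padded = (len + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
    return aligned_alloc(CACHE_LINE, padded);  /* size must be a multiple of alignment */
}
```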
Sam Siewert 6
Advantages of an Abstracted Driver
Portability
– If SW/HW Interface changes, change BH (Bottom Half)
– If Application Interface changes, change TH (Top Half)
Testability (Test BH and TH Separately)
Single Point of Access and Maintenance
Enforce Multi-Thread Usage Policies
Separate Buffering and ISR from Usage
Common Application Entry Points
Scheduled I/O (Most Work in Task Context rather than ISR Context)
Sam Siewert 7
Linux Driver Writer Resources
“Linux Device Drivers – 3rd Ed.”, by J. Corbet, A. Rubini, G. Kroah-Hartman, 2005, (ISBN 0-596-00590-3), O’Reilly: publisher link, E-book link
"PCI System Architecture", Tom Shanley and Don Anderson, 4th Edition, 1999, (ISBN 0-201-30974-2), MindShare, Inc.: E-book link, publisher link, retailer link, library link
Current and Detailed Linux Driver Developer References
– Jerry Cooperstein’s Linux Developer Books - http://www.coopj.com/
– Cooperstein’s Solutions - http://www.coopj.com/LDD/
– http://ecee.colorado.edu/~ecen5033/ecen5033/code/ssd/
Sam Siewert 8
Digital Media Filesystems
Three Types of Media Storage
– Direct Attached Storage – e.g. SATA (Serial ATA)
– Network Attached Storage – e.g. NFS
– Storage Area Networks – e.g. SAS (Serial Attached SCSI), Fibre Channel
Flash / RAM Based SSD Still 10x++ More Costly than Spinning Media
– Predictions for Demise of HDDs and RAID?
– Cost is the Driver
Fast Storage is Either SSD, RAID or Hybrid
Sam Siewert 9
RAID Operates on LBAs/Sectors (Sometimes Files)
SAN/DAS RAID
NAS – Filesystem on top of RAID
RAID-10, RAID-50, RAID-60
– Stripe Over Mirror Sets
– Stripe Over RAID-5 XOR Parity Sets
– Stripe Over RAID-6 Reed-Solomon or Double-Parity Encoded Sets: EVEN/ODD, Row-Diagonal Parity, Minimum Density Codes (Liberation), Reed-Solomon Codes
– Generalized Erasure Codes: Cauchy Reed-Solomon, LDPC (Low-Density Parity-Check) Codes, Weaver/Hover
MDS (Maximum Distance Separable)
– For each Parity Device, Another Level of Fault Tolerance is Provided
– Larger Drives (Multi-terabyte), Larger Arrays (100’s of drives), and Cost Reduction are Driving RAID-6 and Higher Levels
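The RAID-5 parity scheme above reduces to bytewise XOR over the stripe: the parity block is the XOR of the n data blocks, so any one lost block is rebuilt by XOR-ing the survivors with the parity. A minimal sketch (block count and length are arbitrary):

```c
/* Sketch: RAID-5 style XOR parity over a stripe of nblocks data
 * blocks, each blklen bytes. The same routine also rebuilds a lost
 * block when handed the surviving blocks plus the parity block. */
#include <stddef.h>

void xor_parity(const unsigned char *blocks[], size_t nblocks,
                size_t blklen, unsigned char *out)
{
    for (size_t i = 0; i < blklen; i++) {
        unsigned char p = 0;
        for (size_t b = 0; b < nblocks; b++)
            p ^= blocks[b][i];      /* XOR down the stripe column */
        out[i] = p;
    }
}
```

Because XOR is its own inverse, reconstruction of a failed block is literally the same call with the failed block swapped out for the parity block.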
Sam Siewert 10
RAID-10
RAID-0 Striping Over RAID-1 Mirrors (stream A1,A2,A3, … A12):
  RAID-1 Mirror   RAID-1 Mirror   RAID-1 Mirror
  A1   A1         A2   A2         A3   A3
  A4   A4         A5   A5         A6   A6
  A7   A7         A8   A8         A9   A9
  A10  A10        A11  A11        A12  A12
Sam Siewert 11
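The striping in the RAID-10 layout above is a simple modular mapping: logical block k lands on mirror set k mod nsets, at row k / nsets, with both disks of that pair holding a copy. A sketch (struct and field names are illustrative, not from the lecture):

```c
/* Sketch: map a logical block address to its RAID-10 location --
 * stripe across nsets RAID-1 pairs; each block is mirrored twice. */
struct r10_loc {
    unsigned set;  /* which RAID-1 mirror pair */
    unsigned row;  /* stripe row within that pair */
};

struct r10_loc raid10_map(unsigned lba, unsigned nsets)
{
    struct r10_loc loc = { lba % nsets, lba / nsets };
    return loc;
}
```

With nsets = 3 this reproduces the figure: A1 (lba 0) is row 0 of mirror 1, A5 (lba 4) is row 1 of mirror 2, A12 (lba 11) is row 3 of mirror 3.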
RAID5,6 XOR Parity Encoding
MDS Encoding Can Achieve High Storage Efficiency: with N+1, N/(N+1), and with N+2, N/(N+2)
Sam Siewert 12
[Chart: Storage Efficiency (0–100%) vs. Number of Data Devices (1–20) for 1 XOR Encoded Device (RAID5) or 2 P,Q Encoded Devices (RAID6); both curves climb toward 100% as data devices increase, with RAID5 above RAID6.]
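The curves in the chart come directly from the efficiency formula R = n/(n+m) for n data devices and m parity devices:

```c
/* Sketch: storage efficiency R = n/(n+m) for n data devices protected
 * by m parity (coding) devices -- RAID5 is m=1, RAID6 is m=2. */
double raid_efficiency(int n_data, int m_parity)
{
    return (double)n_data / (double)(n_data + m_parity);
}
```

For example a 3+1 RAID5 set gives 3/4 = 75%, while a 5+2 RAID6 set gives 5/7 ≈ 71%; adding data devices pushes either curve toward 100% at the cost of wider stripes to rebuild.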
RAID-50
Sam Siewert 13
[Figure: RAID-0 Striping Over RAID-5 Sets – two 4+1 RAID-5 sets of five disks each, with XOR parity P(...) rotating across the disks row by row; the stripe order is A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2.]
RAID-60 (Reed-Solomon Encoding)
[Figure: RAID-0 Striping Over RAID-6 Sets – two 4+2 RAID-6 sets (Disk1–Disk6 each), with rotating XOR parity P(...) and Reed-Solomon parity Q(...) per row; the stripe order is A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2.]
Comparison of ECs
Data Devices = n, Coding Devices = m, Total = m+n
Storage Efficiency: R = n/(n+m)
– RAID1 2-Way, R=1/(1+1)=50%, MDS=1, Reads 2x Speed-up, 1x Write
– RAID1 3-Way, R=1/(1+2)=33%, MDS=2, 3x Read, 1x Write
– RAID10 with 10 sets, R=10/(10+10)=50%, MDS=1, 20x Read, 10x Write
– RAID5 with 3+1 set, R=3/(3+1)=75%, MDS=1, 3x Read (Parity Check?), RMW Penalty, Striding Issues
– RAID6 with 5+2 set, R=5/(5+2)=71%, MDS=2, 5x Read, Reed-Solomon Encode on Write and RMW Penalty
– Beyond RAID6? Cauchy Reed-Solomon Scales, but Encode/Decode Complexity is High; Low-Density Parity-Check Codes are Simpler, but not MDS
Sam Siewert 17
Read-Modify-Write Penalty
Any Update Smaller than the Full RAID5 or RAID6 Set Requires:
1. Read Old Data and Parity – 2 Reads
2. Compute New Parity (From Old & New Data)
3. Write New Parity and New Data – 2 Writes
The Only Way to Remove the Penalty is a Write-Back Cache that Coalesces Updates and Always Performs Full-Set Writes
Sam Siewert 18
[Figure: a single 4+1 RAID-5 set with rotating parity, as in the RAID-50 layout.]
Write A1: P(ABCD)new = A1new xor A1 xor P(ABCD)
XOR parity truth table (the parity bit is the XOR of the four data bits):
A1 B1 C1 D1 | P(ABCD)
 0  0  0  0 |   0
 0  0  0  1 |   1
 0  0  1  0 |   1
 0  0  1  1 |   0
 0  1  0  0 |   1
 0  1  0  1 |   0
 0  1  1  0 |   0
…
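The small-write parity update above can be sketched in one line: because XOR cancels, the new parity needs only the old data, the new data, and the old parity, which is what limits the penalty to 2 reads and 2 writes rather than a full-stripe read.

```c
/* Sketch: RAID-5 small-write parity update for one data block --
 * P_new = A_new xor A_old xor P_old. Works bytewise over a block;
 * a single byte stands in for the block here. */
unsigned char rmw_parity(unsigned char a_new, unsigned char a_old,
                         unsigned char p_old)
{
    return (unsigned char)(a_new ^ a_old ^ p_old);
}
```

The result is identical to recomputing parity from scratch over the whole stripe, since the untouched blocks XOR out of both expressions.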
Conclusion
IDF Paper on Traditional RAID vs EC - http://ecee.colorado.edu/~ecen5033/ecen5033/papers/SF11_STOS004_101F.pdf
Deeper Dive Into Erasure Codes (James Plank FAST Presentation)
Lab 3 Discussion
– http://ecee.colorado.edu/~ecen5033/ecen5033/labs/lab3-hints.html
– Using RAM Disk to Explore MDADM
Linux RAID Demos
Driver Discussion
Sam Siewert 19