
March 6, 2012, Sam Siewert

Lecture 7 – ECEN 5653

RT Digital Media Block Devices and Filesystems

Sam Siewert 2

Hardware View of Device Interfaces

Analog I/O
– DAC analog output: servos, motors, heaters, ...
– ADC analog input: photodiodes, thermistors, ...

Digital I/O
– Direct TTL I/O or GPIO
– Digital Serial (I2C, SPI, ... – chip-to-chip)

Bus Interfaces
– Parallel: PCI 2.x, PCI-X, SCSI, etc. (32-bit, 64-bit synchronous parallel transfer)
– Differential Serial: USB, InfiniBand, GigE / 10GE Ethernet, Fibre Channel, SAS/SATA

Sam Siewert 3

Software View of Drivers

Character – Register Control/Config, Status, Data
– Typical of Low-Rate I/O Interfaces (RS-232)
– Linux User-Space Buffer Drivers (Direct I/O) – e.g. SCSI Generic (see the sketch after this list)

Block – FIFOs, Dual-Port RAM and DMA
– Typical of High-Rate I/O Interfaces (Network, Storage)
– Only Interface for 512-Byte LBA/Sector HDDs

Network – Driver Stacks
– OSI 7-Layer Model (Phy, Link, Network, Transport, Session, Presentation, Application)
– TCP/IP/Ethernet/Cat-6e
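The user-space buffer / direct I/O point above can be exercised from a small user-space program. This is only a sketch of the technique, not code from the lecture: the device path /dev/sdb is a placeholder, and O_DIRECT requires the buffer, offset, and length to be aligned to the device's logical block size (512 bytes assumed here).

/* Sketch: read one 512-byte sector with O_DIRECT so the transfer
 * bypasses the page cache and lands in a page-aligned user buffer. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buf;

    /* O_DIRECT needs an aligned buffer; 512 matches the assumed LBA size */
    if (posix_memalign(&buf, 512, 512) != 0) {
        perror("posix_memalign");
        return 1;
    }

    int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);   /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, 512);   /* driver DMAs into the user buffer */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes of LBA 0\n", n);

    close(fd);
    free(buf);
    return 0;
}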

Sam Siewert 4

Linux Char Driver Design

Application Interface
– Application Policy
– Blocking/Non-Blocking
– Multi-thread access
– Abstraction

Device Interface
– SW/HW Interface
– Immediate Buffering
– Interrupt Service Routine

[Diagram: Application(s) <-> App/Device Interface <-> Hardware Device.
Application entry points: open/close, read/write, creat, ioctl; returns: EAGAIN, Block, Data, Status.
ISR does SemGive as it fills the Input Ring-Buffer and drains the Output Ring-Buffer.
Write path: if Output Ring-Buffer full then {SemTake or EAGAIN} else {Process and Return}.
Read path: if Input Ring-Buffer empty then {SemTake or EAGAIN} else {Process and Return}.]
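A minimal sketch of how the diagram's read path might look as a Linux char driver entry point, using a kernel semaphore to stand in for the SemTake/SemGive pairing and a kfifo as the input ring buffer. The structure and names (struct my_dev, rx_ring, rx_sem, my_read) are hypothetical, not from the lecture:

/* Read path: block (SemTake) or return -EAGAIN when the input
 * ring buffer is empty; otherwise process and return. */
#include <linux/fs.h>
#include <linux/kfifo.h>
#include <linux/semaphore.h>
#include <linux/uaccess.h>

struct my_dev {
    struct kfifo rx_ring;       /* input ring buffer filled by the ISR */
    struct semaphore rx_sem;    /* ISR does up() (SemGive) on new data */
};

static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t len, loff_t *ppos)
{
    struct my_dev *dev = filp->private_data;
    unsigned int copied;

    while (kfifo_is_empty(&dev->rx_ring)) {
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;              /* non-blocking policy        */
        if (down_interruptible(&dev->rx_sem))
            return -ERESTARTSYS;         /* SemTake: wait for the ISR  */
    }

    /* Process and return: copy buffered input out to user space */
    if (kfifo_to_user(&dev->rx_ring, buf, len, &copied))
        return -EFAULT;

    return copied;
}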

Sam Siewert 5

Cached Memory and DMA

Cache Coherency – Making Sure That Cached Data and Memory Are in Sync
– Can Become Out of Sync Due to DMA and Multi-Processor Caches
– Push Caches Allow for DMA Into and Out of Cache Directly
– Cache Snooping by HW May Obviate Need for Invalidate

Drivers Must Ensure Cache Coherency (see the sketch after this list)
– Invalidate Memory Locations on DMA Read Completion
– Flush Cache Prior to DMA Write Initiation

I/O Data Cache Line Alignment
– Ensure that I/O Data is Aligned on Cache Line Boundaries
– Other Data That Shares a Cache Line with I/O Data Could Otherwise Be Errantly Invalidated
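On Linux, the invalidate-on-read-completion and flush-before-write rules are usually satisfied through the DMA mapping API rather than by touching cache instructions directly. A minimal sketch of the input direction, with dev, buf, and len as hypothetical placeholders:

/* Map a streaming buffer for device-to-memory DMA; on non-coherent
 * CPUs the map/unmap calls perform the required cache maintenance. */
#include <linux/dma-mapping.h>

static int my_dma_read(struct device *dev, void *buf, size_t len)
{
    dma_addr_t bus;

    /* DMA_FROM_DEVICE: CPU cache lines covering buf are invalidated
     * so the CPU cannot later read stale data over the DMA'd data. */
    bus = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, bus))
        return -ENOMEM;

    /* ... program the device to DMA 'len' bytes to bus address 'bus',
     * then wait for the completion interrupt ... */

    /* Unmap (or dma_sync_single_for_cpu) before the CPU reads buf. */
    dma_unmap_single(dev, bus, len, DMA_FROM_DEVICE);
    return 0;
}

For the output direction, mapping with DMA_TO_DEVICE performs the corresponding flush before the device reads the buffer.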

Sam Siewert 6

Advantages of Abstracted Driver

Portability
– If SW/HW Interface changes, change BH (bottom half)
– If Application Interface changes, change TH (top half)

Testability (Test BH and TH Separately)

Single Point of Access and Maintenance

Enforce Multi-Thread Usage Policies

Separate Buffering and ISR from Usage

Common Application Entry Points

Scheduled I/O (Most Work in Task Context rather than ISR Context) – see the sketch below
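On Linux, scheduled I/O is commonly realized by keeping the ISR minimal and deferring the real work to a workqueue running in task context. A sketch of that split; struct my_dev, my_io_work, and my_isr are hypothetical names:

/* ISR does only the time-critical part and defers processing;
 * the deferred handler runs in task (process) context. */
#include <linux/interrupt.h>
#include <linux/workqueue.h>

struct my_dev {
    struct work_struct io_work;
    /* ... ring buffers, register mappings, etc. ... */
};

static void my_io_work(struct work_struct *work)
{
    struct my_dev *dev = container_of(work, struct my_dev, io_work);

    /* Heavy lifting (draining ring buffers, protocol processing)
     * happens here, where sleeping and scheduling are allowed. */
    (void)dev;
}

static irqreturn_t my_isr(int irq, void *data)
{
    struct my_dev *dev = data;

    /* Minimal ISR-context work: acknowledge the device, then defer. */
    schedule_work(&dev->io_work);
    return IRQ_HANDLED;
}

/* At probe/open time: INIT_WORK(&dev->io_work, my_io_work);
 *                     request_irq(irq, my_isr, 0, "mydev", dev); */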

Sam Siewert 7

Linux Driver Writer Resources

“Linux Device Drivers, 3rd Ed.”, J. Corbet, A. Rubini, G. Kroah-Hartman, 2005, ISBN 0-596-00590-3, O’Reilly (publisher link, E-book link)

“PCI System Architecture”, Tom Shanley and Don Anderson, 4th Edition, 1999, ISBN 0-201-30974-2, MindShare, Inc. (E-book link, publisher link, retailer link, library link)

Current and Detailed Linux Driver Developer References
– Jerry Cooperstein’s Linux Developer Books – http://www.coopj.com/
– Cooperstein’s Solutions – http://www.coopj.com/LDD/
– http://ecee.colorado.edu/~ecen5033/ecen5033/code/ssd/

Sam Siewert 8

Digital Media Filesystems

Three Types of Media Storage
– Direct Attached Storage – e.g. SATA (Serial ATA)
– Network Attached Storage – e.g. NFS
– Storage Area Networks – e.g. SAS (Serial Attached SCSI), Fibre Channel

Flash / RAM-Based SSD Still 10x+ More Costly than Spinning Media
– Predictions for Demise of HDDs and RAID?
– Cost is the Driver

Fast Storage is Either SSD, RAID, or Hybrid

Sam Siewert 9

RAID Operates on LBAs/Sectors (Sometimes Files)

SAN/DAS RAID

NAS – Filesystem on Top of RAID

RAID-10, RAID-50, RAID-60
– Stripe Over Mirror Sets
– Stripe Over RAID-5 XOR Parity Sets
– Stripe Over RAID-6 Reed-Solomon or Double-Parity Encoded Sets
  EVEN/ODD
  Row Diagonal Parity
  Minimum Density Codes (Liberation)
  Reed-Solomon Codes
– Generalized Erasure Codes: Cauchy Reed-Solomon, LDPC (Low-Density Parity Codes), Weaver/Hover

MDS (Maximum Distance Separable)
– For Each Parity Device, Another Level of Fault Tolerance is Provided
– Larger Drives (Multi-Terabyte), Larger Arrays (100s of Drives), and Cost Reduction are Driving RAID-6 and Higher Levels

Sam Siewert 10

RAID-10 – RAID-0 Striping Over RAID-1 Mirrors

[Diagram: logical blocks A1, A2, A3, … A12 striped across three RAID-1 mirror pairs]

  RAID-1 Mirror   RAID-1 Mirror   RAID-1 Mirror
  A1   A1         A2   A2         A3   A3
  A4   A4         A5   A5         A6   A6
  A7   A7         A8   A8         A9   A9
  A10  A10        A11  A11        A12  A12

Sam Siewert 11

RAID5,6 XOR Parity Encoding

MDS Encoding Can Achieve High Storage Efficiency with N+1: N/(N+1) and N+2: N/(N+2)

Sam Siewert 12

[Chart: Storage Efficiency (0–100%) vs. Number of Data Devices (1–20) for 1 XOR (RAID5) or 2 P,Q (RAID6) Encoded Devices]
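The plotted curves follow directly from R = n/(n+m); a small standalone sketch to regenerate the values (nothing here is device-specific):

/* Storage efficiency R = n/(n+m) for RAID5 (m = 1 XOR device)
 * and RAID6 (m = 2 P,Q devices), n = 1..20 as in the chart. */
#include <stdio.h>

int main(void)
{
    for (int n = 1; n <= 20; n++) {
        double raid5 = (double)n / (n + 1);
        double raid6 = (double)n / (n + 2);
        printf("n=%2d  RAID5 %5.1f%%  RAID6 %5.1f%%\n",
               n, 100.0 * raid5, 100.0 * raid6);
    }
    return 0;
}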

RAID-50 – RAID-0 Striping Over RAID-5 Sets

Two 4+1 RAID-5 sets; logical blocks are written in stripe order A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2.

First RAID-5 set (rotating parity; the second set holds A2…T2 in the same layout):
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       H1       P(EFGH)
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1

Sam Siewert 13

RAID-60 (Reed-Solomon Encoding) – RAID-0 Striping Over RAID-6 Sets

Two 4+2 RAID-6 sets of six disks (Disk1–Disk6); logical blocks are written in stripe order A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…,Q2,R2,S2,T2.

[Diagram: each RAID-6 set stores, for every four data blocks (e.g. A1, B1, C1, D1), an XOR parity block P(ABCD) and a Reed-Solomon block Q(ABCD), with the P and Q positions rotated across the six disks; the second set holds A2…T2 in the same layout]

How RAID Relates to Erasure Codes

Erasure Codes Applied to Disk or SSD Devices

Sam Siewert 15

RAID is an Erasure Code

RAID-1 is an MDS EC (James Plank, U. of Tennessee)

Sam Siewert 16

Comparison of ECs

Data Devices = n
Coding Devices = m
Total = m+n

Storage Efficiency: R = n/(n+m)
– RAID1 2-Way: R = 1/(1+1) = 50%, MDS=1, 2x Read Speed-up, 1x Write
– RAID1 3-Way: R = 1/(1+2) = 33%, MDS=2, 3x Read, 1x Write
– RAID10 with 10 Sets: R = 10/(10+10) = 50%, MDS=1, 20x Read, 10x Write
– RAID5 with 3+1 Set: R = 3/(3+1) = 75%, MDS=1, 3x Read (Parity Check?), RMW Penalty, Striding Issues
– RAID6 with 5+2 Set: R = 5/(5+2) = 71%, MDS=2, 5x Read, Reed-Solomon Encode on Write and RMW Penalty
– Beyond RAID6? Cauchy Reed-Solomon Scales, but Encode/Decode Complexity is High; Low-Density Parity Codes are Simpler, but not MDS

(MDS=1 and MDS=2 denote tolerance of one or two device erasures – see the reconstruction sketch below.)
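To make the MDS=1 entries concrete: a single XOR parity device lets any one missing device in the set be rebuilt from the survivors. A minimal sketch for a hypothetical 3+1 set (NDATA and SECTOR are illustrative values):

/* Full-stripe XOR parity encode, then reconstruct one "failed"
 * data device from the parity and the surviving devices. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NDATA  3
#define SECTOR 512

static void xor_parity(uint8_t data[NDATA][SECTOR], uint8_t parity[SECTOR])
{
    memset(parity, 0, SECTOR);                 /* P = D0 ^ D1 ^ ... ^ Dn-1 */
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < SECTOR; i++)
            parity[i] ^= data[d][i];
}

int main(void)
{
    uint8_t data[NDATA][SECTOR], parity[SECTOR], rebuilt[SECTOR];

    for (int d = 0; d < NDATA; d++)
        memset(data[d], 0x11 * (d + 1), SECTOR);   /* fake payloads */

    xor_parity(data, parity);

    /* Pretend device 1 failed: XOR the parity with the survivors. */
    memcpy(rebuilt, parity, SECTOR);
    for (int d = 0; d < NDATA; d++)
        if (d != 1)
            for (int i = 0; i < SECTOR; i++)
                rebuilt[i] ^= data[d][i];

    printf("rebuild %s\n",
           memcmp(rebuilt, data[1], SECTOR) == 0 ? "matches" : "differs");
    return 0;
}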

Sam Siewert 17

Read, Modify, Write Penalty

Any Update that is Less than the Full RAID5 or RAID6 Set Requires:
1. Read Old Data and Old Parity – 2 Reads
2. Compute New Parity (From Old & New Data)
3. Write New Parity and New Data – 2 Writes

Only Way to Remove the Penalty is a Write-Back Cache to Coalesce Updates and Always Perform Full-Set Writes

Sam Siewert 18

RAID-5 Set (rotating parity):
  A1       B1       C1       D1       P(ABCD)
  E1       F1       G1       H1       P(EFGH)
  I1       J1       P(IJKL)  K1       L1
  M1       P(MNOP)  N1       O1       P1
  P(QRST)  Q1       R1       S1       T1

Write A1: P(ABCD)_new = A1_new XOR A1_old XOR P(ABCD)_old

XOR parity table (P(ABCD) = A1 XOR B1 XOR C1 XOR D1, first rows shown):
  A1  B1  C1  D1  P(ABCD)
  0   0   0   0   0
  0   0   0   1   1
  0   0   1   0   1
  0   0   1   1   0
  0   1   0   0   1
  0   1   0   1   0
  0   1   1   0   0
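The parity-update rule above reduces to a byte-wise XOR over the old data, new data, and old parity. A minimal sketch (function and buffer names are hypothetical; a real controller applies this per sector of the RMW cycle):

/* RAID-5 small-write parity update:
 * P_new = D_new XOR D_old XOR P_old, byte by byte over one sector. */
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

static void raid5_update_parity(const uint8_t *old_data,
                                const uint8_t *new_data,
                                const uint8_t *old_parity,
                                uint8_t *new_parity)
{
    /* old_data and old_parity are the two reads of the RMW cycle;
     * new_data and new_parity are the two writes that follow. */
    for (size_t i = 0; i < SECTOR_SIZE; i++)
        new_parity[i] = new_data[i] ^ old_data[i] ^ old_parity[i];
}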

Conclusion

IDF Paper on traditional RAID vs EC - http://ecee.colorado.edu/~ecen5033/ecen5033/papers/SF11_STOS004_101F.pdf

Deeper Dive Into Erasure Codes (James Plank FAST Presentation)

Lab 3 Discussion – http://ecee.colorado.edu/~ecen5033/ecen5033/labs/lab3-hints.html

– Using RAM Disk to Explore MDADM

Linux RAID Demos

Driver Discussion

Sam Siewert 19