lecture 24 disk io and raid

15-447 Computer Architecture Fall 2008 ©

November 12, 2007
Nael Abu-Ghazaleh

[email protected]
www.qatar.cmu.edu/~msakr/15447-f08

Lecture 24: Disk IO and RAID

CS 15-447: Computer Architecture


Interfacing Processor with peripherals

[Figure: processor with L1 instruction cache, L1 data cache, and L2 cache; bus interface; front side bus (aka system bus); I/O bridge; memory bus to main memory; path to I/O]


Another view


Disk Access

• Seek: position head over the proper track (5 to 15 ms avg.)

• Rotate: wait for desired sector (on average half a rotation, 0.5 / RPM minutes). RPM 5,400 to 15,000 currently

• Transfer: get the data (30 to 100 MBytes/sec)
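The three components above add up to the time for one request. A minimal sketch of that arithmetic follows; the function name and the example drive parameters (10 ms seek, 7200 RPM, 50 MB/s, 4 KB request) are assumptions chosen from within the ranges on this slide, not measurements of a real drive.

```python
def average_access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Estimate the average time to service one disk request, in milliseconds."""
    # Average rotational latency is half a revolution: 0.5 / RPM minutes.
    rotation_ms = 0.5 / rpm * 60_000
    # Transfer time for the requested data (1 MB is taken as 1000 KB here).
    transfer_ms = request_kb / (transfer_mb_per_s * 1000) * 1000
    return seek_ms + rotation_ms + transfer_ms


# Hypothetical drive: 10 ms average seek, 7200 RPM, 50 MB/s, 4 KB request.
total = average_access_time_ms(seek_ms=10, rpm=7200, transfer_mb_per_s=50, request_kb=4)
print(f"{total:.2f} ms")  # prints "14.25 ms"
```

For a small request, seek and rotation dominate: the transfer itself contributes well under 1 ms in this example.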

[Figure: disk geometry showing platters, tracks, and sectors]


Manufacturing Advantages of Disk Arrays

Disk Product Families (Low End to High End)

Conventional: 4 disk designs (14", 10", 5.25", 3.5")

Disk Array: 1 disk design (3.5")


RAID: Redundant Array of Inexpensive Disks

• RAID 0: Striping (misnomer: non-redundant)
• RAID 1: Mirroring
• RAID 2: Striping + Error Correction
• RAID 3: Bit striping + Parity Disk
• RAID 4: Block striping + Parity Disk
• RAID 5: Block striping + Distributed Parity
• RAID 6: Multiple parity checks


Non-Redundant Array

• Striped: write sequential blocks across disk array

• High performance

• Poor reliability:
  MTTF_Array = MTTF_Disk / N
  MTTF_Disk = 50,000 hours (about 6 years)
  N = 70 disks
  MTTF_Array ≈ 700 hours (about 1 month)

[Figure: striped array with disks labeled "Odd Blocks" and "Even Blocks"]
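The reliability figure on this slide can be checked with a few lines of arithmetic. This is a sketch under the usual simplifying assumption that disks fail independently at a constant rate; the function name is illustrative.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """MTTF of a non-redundant array: the first disk failure loses data."""
    return disk_mttf_hours / num_disks


mttf = array_mttf_hours(50_000, 70)
print(f"{mttf:.0f} hours, about {mttf / 24:.0f} days")  # prints "714 hours, about 30 days"
```

The exact quotient is about 714 hours, which the slide rounds to 700 hours, or roughly one month.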


Redundant Arrays of Disks

• Files are "striped" across multiple spindles

• Redundancy yields high data availability

• When disks fail, contents are reconstructed from data redundantly stored in the array

• High reliability comes at a cost:
  – Reduced storage capacity
  – Lower performance


RAID 1: Mirroring

• Each disk is fully duplicated onto its "shadow": very high availability

• Bandwidth sacrifice on writes: logical write = two physical writes

• Reads may be optimized

• Most expensive solution: 100% capacity overhead

Used in high I/O rate, high availability environments


RAID 3: bit striping + parity

• A parity bit for every bit in the striped data

• Parity is relatively easy to compute

• How does it perform for small reads/writes?
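Parity here is the XOR of the data held on the other disks, which is also what makes reconstruction possible after a single disk failure. A minimal sketch follows, using one byte per disk as illustrative data; the helper names `parity` and `reconstruct` are assumptions for this example.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])


# Three data disks, one byte each (illustrative values).
data = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])]
p = parity(data)

# Pretend the second disk failed and rebuild its contents.
rebuilt = reconstruct([data[0], data[2]], p)
print(rebuilt == data[1])  # prints "True"
```

Because XOR is its own inverse, XOR-ing the surviving data with the parity cancels everything except the missing block.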


Redundant Arrays of Disks RAID 3: Parity Disk

[Figure: logical record 10010011 11001101 10010011 ... shown as striped physical records (10010011, 11001101, 10010011, 00110000) alongside parity disk P]

• Parity computed across recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – Wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time

• Arms logically synchronized, spindles rotationally synchronized
  – Logically a single high capacity, high transfer rate disk

Targeted for high bandwidth applications: Scientific, Image Processing


RAID 4 (Block interleaved parity)


Redundant Arrays of Disks RAID 5+: High I/O Rate Parity

A logical write becomes four physical I/Os

Independent writes possible because of interleaved parity

Reed-Solomon Codes ("Q") for protection during reconstruction

D0 D1 D2 D3 P

D4 D5 D6 P D7

D8 D9 P D10 D11

D12 P D13 D14 D15

P D16 D17 D18 D19

D20 D21 D22 D23 P

. . .

[Figure labels: Disk Columns; Increasing Logical Disk Addresses; Stripe; Stripe Unit]

Targeted for mixed applications
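The layout above (D0 D1 D2 D3 P, then D4 D5 D6 P D7, and so on) and the four-I/O small write can be sketched as follows. The mapping rule is inferred from the rows shown on this slide, and the function names and the use of small integers as stripe units are assumptions for illustration; real controllers may rotate parity differently.

```python
NUM_DISKS = 5  # matches the five disk columns in the layout above


def locate(block, num_disks=NUM_DISKS):
    """Map a logical data block number to (stripe, data_disk, parity_disk)."""
    stripe, index = divmod(block, num_disks - 1)
    # Parity starts in the last column and moves one column left per stripe.
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    # Data units fill the non-parity columns left to right.
    data_disk = index if index < parity_disk else index + 1
    return stripe, data_disk, parity_disk


def small_write(disks, block, new_data):
    """Update one stripe unit; returns the number of physical I/Os used."""
    stripe, d, p = locate(block, len(disks))
    old_data = disks[d][stripe]        # I/O 1: read old data
    old_parity = disks[p][stripe]      # I/O 2: read old parity
    disks[d][stripe] = new_data        # I/O 3: write new data
    disks[p][stripe] = old_parity ^ old_data ^ new_data  # I/O 4: write new parity
    return 4


# Six stripes of zeros: parity is trivially consistent to start with.
disks = [[0] * 6 for _ in range(NUM_DISKS)]
ios = small_write(disks, 5, 0b0110) + small_write(disks, 8, 0b1011)

print(locate(7))   # prints "(1, 4, 3)": D7 is in stripe 1, on disk 4, parity on disk 3
print(locate(5)[1:], locate(8)[1:])  # prints "(1, 3) (0, 2)"
```

The last line shows why writes to D5 and D8 can proceed independently: they touch disjoint pairs of disks, which would not be possible if all parity lived on one disk.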


Nested RAID levels

• RAID 01 and 10 combine mirroring and striping
  – Combine high performance (striping) and reliability (mirroring)
  – Get reliability without having to compute parities: higher performance and less complex controller

• RAID 05 and 50 (also called 53)


Operating System can help (1) Reducing access time

• Disk defragmentation: why does that work?

• Disk scheduling: operating system can reorder requests
  – How does it work? Reduce seek time

• Example: Mean seek distance first, Elevator algorithm, Typewriter algorithm
  – Let's do an example

• Log structured file systems
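To make the effect of reordering concrete, here is a small sketch comparing arrival-order service with an elevator-style sweep. The request queue and starting track are made-up values, and this variant reverses direction at the last pending request rather than at the edge of the disk, which is an assumption of the sketch.

```python
def total_distance(start, order):
    """Total head movement, in tracks, when requests are served in this order."""
    total, pos = 0, start
    for track in order:
        total += abs(track - pos)
        pos = track
    return total


def elevator_order(start, requests):
    """Serve requests at or above the head while sweeping up, then sweep down."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


# Hypothetical queue of track numbers, head initially at track 53.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53
order = elevator_order(start, requests)

print(order)                            # prints "[65, 67, 98, 122, 124, 183, 37, 14]"
print(total_distance(start, requests))  # prints "640"
print(total_distance(start, order))     # prints "299"
```

In this example the sweep cuts head movement by more than half, because the head no longer zigzags across the disk between distant tracks.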


Log structured file systems

• Idea: most reads to disk are serviced from cache – locality!

• But what about writes? They have to go to disk: if the system crashes before they are written, the file system is compromised

• How can we make updates perform better?
  – Save them in a log (sequentially) instead of their original location; why does that help?
  – Tricky to manage
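The idea of writing updates sequentially can be sketched with a toy in-memory model. The class and its fields are hypothetical and ignore everything a real log-structured file system has to handle (segment cleaning, checkpoints, on-disk index); the point is only that writes to scattered blocks all land at the tail of the log.

```python
class LogStructuredStore:
    """Toy model: every update is appended to one sequential log."""

    def __init__(self):
        self.log = []    # stands in for the sequential on-disk log
        self.index = {}  # block number -> log position of its latest copy

    def write(self, block, data):
        self.index[block] = len(self.log)  # the newest copy wins
        self.log.append((block, data))     # always written at the tail

    def read(self, block):
        return self.log[self.index[block]][1]


store = LogStructuredStore()
# Updates to scattered blocks 907, 12, 455, 12 occupy log positions 0, 1, 2, 3.
for block, data in [(907, b"a"), (12, b"b"), (455, b"c"), (12, b"d")]:
    store.write(block, data)

print(store.read(12))  # prints "b'd'"
```

Position 1 now holds a stale copy of block 12. Finding and reclaiming such dead space is part of what makes the log tricky to manage.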


Operating System can help (2) Reliability

• RAIDs protect against disk failures, not CPU failures or software bugs
  – If the CPU writes corrupt data to all redundant disks, what can we do?

• Backups

• Reliability in the operating system


How are files allocated on disk?

• Index block: holds pointers to the other blocks in the file

• Alternatives: linked allocation

• Data and meta data both stored on disk

• What do we do for bigger files?
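One common answer to the last question is to let index blocks point to further index blocks. The sketch below computes how far that reaches; the 4 KB block size and 4-byte pointer size are assumed values for illustration, not taken from the lecture.

```python
BLOCK_SIZE = 4096   # bytes per disk block (assumed)
POINTER_SIZE = 4    # bytes per block pointer (assumed)


def max_file_bytes(levels):
    """Largest file reachable through `levels` levels of index blocks."""
    pointers_per_block = BLOCK_SIZE // POINTER_SIZE
    return pointers_per_block ** levels * BLOCK_SIZE


for levels in (1, 2, 3):
    print(levels, max_file_bytes(levels))
# prints:
# 1 4194304
# 2 4294967296
# 3 4398046511104
```

With these assumptions a single index block covers 4 MB, while each extra level of indirection multiplies the reachable size by 1024.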


Unix Inodes


Disk reliability

• Any update to disk changes both data and metadata
  – Requires several writes

• Operating system may reorder them as we saw

• What happens if there is a crash?
  – Let's look at examples

• Solution: journaling file system
  – Update journal before updating file system
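The journal-first rule can be illustrated with a toy model in which the "disk" is a dictionary and the journal is a list. All names here are hypothetical, and a real journaling file system also has to deal with ordering, checksums, and journal truncation, none of which is modeled.

```python
def commit(journal, updates):
    """Record the intended updates in the journal before touching the file system."""
    journal.append({"updates": dict(updates), "committed": True})


def recover(journal, disk):
    """After a crash, replay every committed entry; replaying twice is harmless."""
    for entry in journal:
        if entry["committed"]:
            disk.update(entry["updates"])
    return disk


disk = {"data_block": "old", "inode": "old"}
journal = []

# Step 1: journal the whole update (data and metadata) first.
commit(journal, {"data_block": "new", "inode": "new"})
# Step 2: start updating in place, then crash after the first of two writes.
disk["data_block"] = "new"
# An entry that never reached its commit mark is ignored during recovery.
journal.append({"updates": {"inode": "half-written"}, "committed": False})

print(recover(journal, disk))  # prints "{'data_block': 'new', 'inode': 'new'}"
```

Without the journal, the crash would leave new data paired with old metadata. With it, recovery brings both to the committed state.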


Flash Memory

• Emerging technology for non-volatile storage
  – Competitor to hard disks, especially for embedded market
  – Can be used as cache for the disk (much larger than RAM disks for the same price, and persistent)

• Floating gate transistors: semiconductor technology (like microprocessors and memory)
  – We know how to build them big (or small!) and cheap
  – Faster, lower power than disk drives
  – ...but still more expensive, and has some limitations

• Two types of flash memory: NAND and NOR


NOR Flash

• NOR accessed like regular memory and has faster read time
  – Used for executables/firmware that don't need to change often (code for PDAs, cellphones, etc.)
  – Can be executed in place

• Bad write/erase performance (2 seconds to erase a block!)

• Bad wear properties (100,000 writes average lifetime)


NAND Flash

• Accessed like a block device (like a disk drive)
  – Higher density, lower cost

• Faster write/erase time; longer write life expectancy

• Well suited for cameras, mp3 players, USB drives...

• Less reliable than NOR (requires error correction codes)


Different properties from Disks

• Flash memory has quite different properties from disks
  – Emphasis on seek time gone

• Needs to erase a segment before writing (small writes are expensive!)
  – Slow... (especially NOR erase/write and NAND random access reads)
  – Must be done in large segments (10s of KBytes)
  – Can only be rewritten a limited number of times
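Why small writes are expensive can be seen in a sketch of a naive in-place update. The 64 KB segment size is an assumed value within the "10s of KBytes" range above, and the function is a simplified model that ignores remapping tricks a real flash translation layer would use.

```python
SEGMENT_SIZE = 64 * 1024  # assumed erase segment size in bytes


def update_in_place(segment, offset, new_bytes):
    """Rewrite part of a segment: copy out, erase the whole segment, program it back."""
    assert len(segment) == SEGMENT_SIZE
    copy = bytearray(segment)                          # read the whole segment
    copy[offset:offset + len(new_bytes)] = new_bytes   # patch the copy in RAM
    erased = bytearray(b"\xff" * SEGMENT_SIZE)         # erased flash reads as all 1 bits
    erased[:] = copy                                   # program the full segment again
    amplification = SEGMENT_SIZE // len(new_bytes)     # bytes programmed per byte changed
    return erased, amplification


old_segment = bytes(SEGMENT_SIZE)
new_segment, amplification = update_in_place(old_segment, 1024, b"x" * 512)
print(amplification)  # prints "128"
```

Changing 512 bytes forces 64 KB to be erased and reprogrammed, and every such cycle also uses up part of the segment's limited rewrite budget.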


Summary of Flash circa 2006
