Lecture 24: Disk I/O and RAID
CS 15-447: Computer Architecture
November 12, 2007
Nael Abu-Ghazaleh
[email protected]
http://www.qatar.cmu.edu/~msakr/15447-f08
15-447 Computer Architecture Fall 2008 ©
Interfacing Processor with peripherals
[Figure: block diagram. Labels: Processor, L1 cache (instrs.), L1 cache (data), L2 cache, bus interface, front side bus (aka system bus), I/O bridge, memory bus, main memory, to I/O]
Another view
Disk Access
• Seek: position head over the proper track (5 to 15 ms avg.)
• Rotate: wait for desired sector (on average half a rotation: 0.5 / RPM minutes). RPM currently 5400 to 15,000
• Transfer: get the data (30-100 Mbytes/sec)
[Figure: disk geometry, showing platters, tracks, and sectors]
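The three components above add up to the average time for one request. The following is a minimal sketch of that arithmetic, not a model of any particular drive. The function names are hypothetical, decimal units are assumed (1 MB = 1000 KB), and the sample numbers are simply picked from the ranges on the slide.

```python
def rotational_latency_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(request_kb, mb_per_s):
    """Time to stream request_kb kilobytes at mb_per_s (1 MB = 1000 KB assumed)."""
    return request_kb / (mb_per_s * 1000) * 1000


def access_time_ms(seek_ms, rpm, request_kb, mb_per_s):
    """Seek + rotate + transfer for a single request."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(request_kb, mb_per_s))


# Example: 10 ms seek, 15,000 RPM, one 4 KB request at 50 MB/s.
print(f"rotate:   {rotational_latency_ms(15_000):.2f} ms")
print(f"transfer: {transfer_time_ms(4, 50):.2f} ms")
print(f"total:    {access_time_ms(10, 15_000, 4, 50):.2f} ms")
```

For small requests the seek and rotation terms dominate; the transfer itself is a small fraction of a millisecond.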
Manufacturing Advantages of Disk Arrays
Disk Product Families
• Conventional: 4 disk designs (14”, 10”, 5.25”, 3.5”), covering the low end through the high end
• Disk Array: 1 disk design (3.5”)
RAID: Redundant Array of Inexpensive Disks
• RAID 0: Striping (misnomer: non-redundant)
• RAID 1: Mirroring
• RAID 2: Striping + Error Correction
• RAID 3: Bit striping + Parity Disk
• RAID 4: Block striping + Parity Disk
• RAID 5: Block striping + Distributed Parity
• RAID 6: Multiple parity checks
Non-Redundant Array
• Striped: write sequential blocks across disk array
• High performance
• Poor reliability:
  MTTF_Array = MTTF_Disk / N
  MTTF_Disk = 50,000 hours (about 6 years)
  N = 70 disks
  MTTF_Array ≈ 700 hours (about 1 month)
[Figure labels: Odd Blocks, Even Blocks]
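The reliability estimate above can be checked numerically. This is a small sketch of the MTTF_Disk / N rule, which rests on the usual assumption that disks fail independently at a constant rate. The function name is hypothetical.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """MTTF of a non-redundant array: any single disk failure loses data."""
    return disk_mttf_hours / n_disks


mttf = array_mttf_hours(50_000, 70)
print(f"{mttf:.0f} hours, about {mttf / 24:.0f} days")
```

The exact quotient is a little over 714 hours, which the slide rounds to 700.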
Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
• When disks fail, contents are reconstructed from data redundantly stored in the array
• High reliability comes at a cost:
  – Reduced storage capacity
  – Lower performance
RAID 1: Mirroring
• Each disk is fully duplicated onto its “shadow”: very high availability
• Bandwidth sacrifice on writes: logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
Used in high I/O rate, high availability environments
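The mirroring behavior described above can be sketched with a toy in-memory model. `MirroredPair` and its methods are hypothetical names for illustration, and alternating reads between the two copies is just one possible read optimization, not necessarily what a real controller does.

```python
class MirroredPair:
    """Toy RAID 1: two in-memory 'disks' that hold identical blocks."""

    def __init__(self, n_blocks):
        self.disks = [[None] * n_blocks, [None] * n_blocks]
        self.physical_writes = 0
        self._next_read = 0

    def write(self, block, data):
        # One logical write becomes two physical writes.
        for disk in self.disks:
            disk[block] = data
            self.physical_writes += 1

    def read(self, block):
        # Either copy can serve a read; alternate to spread the load.
        disk = self.disks[self._next_read]
        self._next_read = 1 - self._next_read
        return disk[block]

    def rebuild(self, failed):
        # Replace the failed disk with a full copy of the survivor.
        self.disks[failed] = list(self.disks[1 - failed])


pair = MirroredPair(n_blocks=4)
pair.write(2, "payload")
print(pair.physical_writes)      # two physical writes for one logical write
pair.disks[0] = [None] * 4       # simulate losing disk 0
pair.rebuild(failed=0)
print(pair.read(2), pair.read(2))
```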
RAID 3: bit striping + parity
• A parity bit for every bit in the striped data
• Parity is relatively easy to compute
• How does it perform for small reads/writes?
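Parity in this sense is a bitwise XOR across the data disks, which is why it is cheap to compute and why any single lost disk can be rebuilt from the rest. The sketch below works on byte-sized integers for readability rather than modeling true bit-level striping; the function names and sample values are illustrative.

```python
from functools import reduce


def parity(units):
    """XOR of all units; with even parity, data XOR parity is always zero."""
    return reduce(lambda a, b: a ^ b, units, 0)


def reconstruct(surviving_units, parity_unit):
    """Recover the one missing unit from the survivors and the parity."""
    return parity(surviving_units) ^ parity_unit


data = [0b10010011, 0b11001101, 0b10010011]   # one unit per data disk
p = parity(data)

# Pretend the middle disk failed and rebuild its contents.
rebuilt = reconstruct([data[0], data[2]], p)
print(f"parity  = {p:08b}")
print(f"rebuilt = {rebuilt:08b}")
```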
Redundant Arrays of Disks RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 . . .) is split into striped physical records across the disks, with a parity disk P. Bit patterns shown in the figure: 10010011, 11001101, 10010011, 00110000]
• Parity computed across recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – Wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized
  – Logically a single high capacity, high transfer rate disk
Targeted for high bandwidth applications: Scientific, Image Processing
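The capacity trade-off can be made concrete with one line of arithmetic. This sketch assumes that the 33% figure refers to one parity disk protecting three data disks, with the cost expressed relative to data capacity; the slide does not spell out the configuration, and the function name is hypothetical.

```python
def parity_cost(data_disks):
    """Parity capacity as a fraction of data capacity (one parity disk per group)."""
    return 1 / data_disks


for n in (3, 7, 15):
    print(f"{n} data disks + 1 parity disk: {parity_cost(n):.0%} capacity cost")
```

Wider groups lower the cost, but more disks must survive and more data must be read to reconstruct a failed one.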
RAID 4 (Block interleaved parity)
Redundant Arrays of Disks RAID 5+: High I/O Rate Parity
• A logical write becomes four physical I/Os
• Independent writes possible because of interleaved parity
• Reed-Solomon Codes ("Q") for protection during reconstruction
D0 D1 D2 D3 P
D4 D5 D6 P D7
D8 D9 P D10 D11
D12 P D13 D14 D15
P D16 D17 D18 D19
D20 D21 D22 D23 P
. . .
[Figure labels: Disk Columns; Increasing Logical Disk Addresses; Stripe; Stripe Unit]
Targeted for mixed applications
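The rotating parity placement in the table above, and the four-I/O small write, can be sketched as follows. `raid5_row` reproduces the layout drawn on the slide (parity starts on the last disk and moves one disk to the left per stripe); real implementations offer several layouts, so this is one variant rather than the definitive one. Both function names are hypothetical.

```python
def raid5_row(stripe, n_disks):
    """Return one stripe of the layout as labels: data units and 'P'."""
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    next_unit = stripe * (n_disks - 1)   # each stripe holds n_disks - 1 data units
    row = []
    for disk in range(n_disks):
        if disk == parity_disk:
            row.append("P")
        else:
            row.append(f"D{next_unit}")
            next_unit += 1
    return row


def small_write(old_data, old_parity, new_data):
    """Update one unit: read old data and old parity, write new data and new parity."""
    new_parity = old_parity ^ old_data ^ new_data
    return new_data, new_parity


for stripe in range(6):
    print(" ".join(f"{label:>3}" for label in raid5_row(stripe, 5)))
```

Because `small_write` only touches the target disk and that stripe's parity disk, writes to stripes whose parity lives on different disks can proceed independently.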
Nested RAID levels
• RAID 01 and 10 combine mirroring and striping
  – Combine high performance (striping) and reliability (mirroring)
  – Get reliability without having to compute parities: higher performance and less complex controller
• RAID 05 and 50 (also called 53)
Operating System can help (1): Reducing access time
• Disk defragmentation: why does that work?
• Disk scheduling: operating system can reorder requests
  – How does it work? Reduce seek time
• Example: Mean seek distance first, Elevator algorithm, Typewriter algorithm
  – Let's do an example
• Log structured file systems
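One way to work through the scheduling example is to compare total head movement for the same queue under different orderings. The sketch below is an assumption-laden illustration: the first policy is implemented as greedy nearest-request-first, the elevator sweeps upward and then reverses, and the typewriter sweeps upward and then returns to the lowest pending track. The request list and starting track are made-up sample values, and the function names are hypothetical.

```python
def total_seek(head, order):
    """Total number of tracks the head crosses when serving `order`."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance


def nearest_first(head, requests):
    """Greedy: always serve the pending request closest to the head."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def elevator(head, requests):
    """Sweep upward from the head, then reverse and sweep downward."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def typewriter(head, requests):
    """Sweep upward, jump back to the lowest request, sweep upward again."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped


requests = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53
for name, order in [("arrival order", list(requests)),
                    ("nearest first", nearest_first(head, requests)),
                    ("elevator", elevator(head, requests)),
                    ("typewriter", typewriter(head, requests))]:
    print(f"{name:14s} {total_seek(head, order):4d} tracks  {order}")
```

With this queue the totals are 640, 236, 299, and 322 tracks respectively. Nearest-first moves the head least here, but it can starve requests far from the head, which is what the sweep-based policies avoid.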
Log structured file systems
• Idea: most reads to disk are serviced from cache – locality!
• But what about writes? They have to go to disk; if the system crashes, the file system is compromised
• How can we make updates perform better:
  – Save them in a log (sequentially) instead of their original location; why does that help?
  – Tricky to manage
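The log idea can be sketched with a toy store in which every write is an append and an in-memory index remembers where the newest copy of each block lives. `LogStore` and its methods are hypothetical names; a real log-structured file system also has to persist the index and clean out stale records, which is part of what makes it tricky to manage.

```python
class LogStore:
    """Toy log-structured store: writes append, reads go through an index."""

    def __init__(self):
        self.log = []      # (block_id, data) records in arrival order
        self.index = {}    # block_id -> position of the newest record

    def write(self, block_id, data):
        # Always append at the tail: sequential on disk, so no seek per write.
        self.index[block_id] = len(self.log)
        self.log.append((block_id, data))

    def read(self, block_id):
        return self.log[self.index[block_id]][1]

    def live_fraction(self):
        # Overwritten blocks leave stale records behind for a cleaner to reclaim.
        return len(self.index) / len(self.log) if self.log else 1.0


store = LogStore()
store.write(7, "v1")
store.write(3, "other")
store.write(7, "v2")          # the update is appended, not written in place
print(store.read(7))
print(f"live records: {store.live_fraction():.0%}")
```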
Operating System can help (2): Reliability
• RAIDs tolerate disk failures, not CPU failures or software bugs
  – If the CPU writes corrupt data to all redundant disks, what can we do?
• Backups
• Reliability in the operating system
How are files allocated on disk?
• Index block: has pointers to the other blocks in the file
• Alternatives: linked allocation
• Data and metadata both stored on disk
• What do we do for bigger files?
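The difference between the two allocation schemes shows up when reading the k-th block of a file. In this sketch a "disk" is just a dictionary from block number to contents, the block numbers are arbitrary, and both function names are hypothetical; each returns the data together with the number of disk reads it needed.

```python
def read_indexed(disk, index_block, k):
    """Indexed allocation: read the index block, then the data block."""
    pointers = disk[index_block]          # read 1: the index block
    return disk[pointers[k]], 2           # read 2: the k-th data block


def read_linked(disk, first_block, k):
    """Linked allocation: each block stores (data, next_block)."""
    block, reads = first_block, 0
    for _ in range(k):                    # walk the chain up to block k
        _, block = disk[block]
        reads += 1
    data, _ = disk[block]
    return data, reads + 1


# The same three-block file, laid out both ways.
indexed_disk = {10: [21, 35, 8], 21: "A", 35: "B", 8: "C"}
linked_disk = {40: ("A", 52), 52: ("B", 44), 44: ("C", None)}

print(read_indexed(indexed_disk, 10, 2))
print(read_linked(linked_disk, 40, 2))
```

Indexed allocation gives constant-cost random access, while linked allocation pays one read per block skipped.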
Unix Inodes
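Unix inodes extend the index-block idea with indirect blocks so that large files remain addressable. The sketch below estimates the resulting maximum file size under assumptions the slide does not state: 12 direct pointers plus one single, one double, and one triple indirect pointer, 4 KB blocks, and 4-byte block pointers. Actual file systems differ in all of these parameters.

```python
def max_file_blocks(direct=12, block_size=4096, pointer_size=4):
    """Data blocks reachable from one inode with 1x, 2x, and 3x indirection."""
    per_block = block_size // pointer_size     # pointers that fit in one block
    return direct + per_block + per_block ** 2 + per_block ** 3


blocks = max_file_blocks()
print(f"{blocks} blocks, about {blocks * 4096 / 2**40:.1f} TiB")
```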
Disk reliability
• Any update to disk changes both data and metadata
  – Requires several writes
• Operating system may reorder them, as we saw
• What happens if there is a crash?
  – Let's look at examples
• Solution: journaling file system
  – Update journal before updating filesystem
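The journaling rule can be illustrated with a toy model in which every multi-block update is first appended to a journal and only applied to the file system once a commit record is present. `JournaledDisk`, its methods, and the `crash_before_commit` flag are hypothetical and exist only to simulate a crash at the worst moment.

```python
class JournaledDisk:
    """Toy write-ahead journal: updates are logged, then applied on recovery."""

    def __init__(self):
        self.blocks = {}     # the file system proper
        self.journal = []    # append-only records

    def log_transaction(self, updates, crash_before_commit=False):
        for block, value in updates.items():
            self.journal.append(("update", block, value))
        if not crash_before_commit:
            self.journal.append(("commit",))

    def recover(self):
        # Replay committed transactions; drop any trailing uncommitted updates.
        pending = []
        for record in self.journal:
            if record[0] == "update":
                pending.append(record[1:])
            else:
                for block, value in pending:
                    self.blocks[block] = value
                pending = []
        self.journal = []


disk = JournaledDisk()
disk.log_transaction({"inode 5": "size=2", "data 9": "hello"})
disk.log_transaction({"inode 5": "size=3"}, crash_before_commit=True)
disk.recover()
print(disk.blocks)   # only the committed transaction is visible
```

Data and metadata from one transaction become visible together or not at all, so a crash cannot leave an inode pointing at blocks that were never written.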
Flash Memory
• Emerging technology for non-volatile storage: competitor to hard disks, especially for embedded market
  – Can be used as cache for the disk (much larger than RAM disks for the same price, and persistent)
• Floating gate transistors: semiconductor technology (like microprocessors and memory), so we know how to build them big (or small!) and cheap
  – Faster, lower power than disk drives
  – ...but still more expensive, and has some limitations
• Two types of flash memory: NAND and NOR
NOR Flash
• NOR accessed like regular memory and has faster read time
  – Used for executables/firmware that don't need to change often (PDA and cellphone code, etc.)
  – Can be executed in place
• Bad write/erase performance (2 seconds to erase a block!)
• Bad wear properties (100,000 writes average lifetime)
NAND Flash
• Accessed like a block device (like a disk drive)
  – Higher density, lower cost
• Faster write/erase time; longer write life expectancy
• Well suited for cameras, mp3 players, USB drives...
• Less reliable than NOR (requires error correction codes)
Different properties from Disks
• Flash memory has quite different properties from disks
  – Emphasis on seek time gone
• Needs to erase a segment before writing (small writes are expensive!)
  – Slow... (especially NOR erase/write and NAND random access reads)
  – Must be done in large segments (10s of KBytes)
  – Can only be rewritten a limited number of times
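The erase-before-write rule follows from how flash cells behave: erasing sets every bit in a segment to 1, and programming can only change bits from 1 to 0. The toy model below captures just that rule and counts erases, since the erase count is what limits lifetime. `FlashSegment` and its methods are hypothetical, and the tiny segment size is chosen only to keep the example short.

```python
class FlashSegment:
    """Toy flash segment: erase sets all bits to 1, program only clears bits."""

    def __init__(self, size):
        self.cells = [0xFF] * size
        self.erase_count = 0

    def erase(self):
        self.cells = [0xFF] * len(self.cells)
        self.erase_count += 1

    def program(self, offset, value):
        # Any bit that is 1 in value but already 0 in the cell would need an erase.
        if value & ~self.cells[offset] & 0xFF:
            raise ValueError("needs erase: cannot set a 0 bit back to 1")
        self.cells[offset] = value

    def rewrite(self, offset, value):
        # A small overwrite costs a whole-segment erase plus reprogramming everything.
        contents = list(self.cells)
        contents[offset] = value
        self.erase()
        for i, byte in enumerate(contents):
            self.program(i, byte)


seg = FlashSegment(size=4)
seg.program(0, 0x0F)          # fine: only clears bits
try:
    seg.program(0, 0xF0)      # would have to turn 0 bits back into 1s
except ValueError as err:
    print("rejected:", err)
seg.rewrite(0, 0xF0)
print([f"{c:02X}" for c in seg.cells], "erases:", seg.erase_count)
```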
Summary of Flash circa 2006