Lecture 18
Mass Storage Devices: RAID, Data Library, MSS
Review: Improving Bandwidth of Secondary Storage
• Processor performance growth: phenomenal
• I/O?
“I/O certainly has been lagging in the last decade”
Seymour Cray, Public Lecture (1976)
“Also, I/O needs a lot of work”
David Kuck, Keynote Address (1988)
Network Attached Storage
High Performance Storage Service on a High Speed Network
Decreasing Disk Diameters
14" » 10" » 8" » 5.25" » 3.5" » 2.5" » 1.8" » 1.3" » . . .
high bandwidth disk systems based on arrays of disks

Increasing Network Bandwidth
3 Mb/s » 10 Mb/s » 50 Mb/s » 100 Mb/s » 1 Gb/s » 10 Gb/s
networks capable of sustaining high bandwidth transfers

Network provides well defined physical and logical interfaces: separate CPU and storage system!

Network File Services: OS structures supporting remote file access
RAID
Manufacturing Advantages of Disk Arrays
Disk Product Families
• Conventional: 4 disk designs (14", 10", 5.25", 3.5") spanning low end to high end
• Disk Array: 1 disk design (3.5")
Replace Small Number of Large Disks with Large Number of Small Disks
              IBM 3390 (K)   IBM 3.5" 0061   x70 (70-disk array)
Data Capacity 20 GBytes      320 MBytes      23 GBytes
Volume        97 cu. ft.     0.1 cu. ft.     11 cu. ft.
Power         3 KW           11 W            1 KW
Data Rate     15 MB/s        1.5 MB/s        120 MB/s
I/O Rate      600 I/Os/s     55 I/Os/s       3900 I/Os/s
MTTF          250 KHrs       50 KHrs         ??? Hrs
Cost          $250K          $2K             $150K

Disk arrays have potential for large data and I/O rates, with high MB per cu. ft. and high MB per KW, but awful reliability.
Redundant Arrays of Disks
• Files are "striped" across multiple spindles to gain throughput
• Increasing the number of disks reduces reliability
• Redundancy yields high data availability
Disks will fail; contents are reconstructed from data redundantly stored in the array
– Capacity penalty to store it
– Bandwidth penalty to update it
Techniques:
Mirroring/Shadowing (high capacity cost)
Horizontal Hamming Codes (overkill)
Parity & Reed-Solomon Codes
Failure Prediction (no capacity overhead!)
VaxSimPlus - technique is controversial
Array Reliability
• Reliability of N disks = Reliability of 1 disk ÷ N
50,000 hours ÷ 70 disks ≈ 700 hours
Disk system MTTF drops from 6 years to 1 month!
• Arrays without redundancy too unreliable to be useful!
Hot spares support reconstruction in parallel with access: very high media availability can be achieved
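The failure arithmetic above is easy to check; a minimal sketch, assuming independent disk failures and the simple MTTF/N model from this slide:

```python
# Array reliability sketch: with no redundancy, any single disk failure
# loses data, so array MTTF is roughly disk MTTF divided by disk count.

def array_mttf_hours(disk_mttf_hours, n_disks):
    return disk_mttf_hours / n_disks

mttf = array_mttf_hours(50_000, 70)
print(f"{mttf:.0f} hours, about {mttf / (24 * 30):.1f} months")
# 50,000 / 70 = ~714 hours: roughly one month, as the slide says.
```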
Redundant Arrays of Disks (RAID)
• Disk Mirroring/Shadowing: each disk is fully duplicated onto its "shadow"; logical write = two physical writes; 100% capacity overhead
• Parity Data Bandwidth Array: parity computed horizontally, for recovery rather than fault detection; logically a single high data bandwidth disk
• High I/O Rate Parity Array: interleaved parity blocks; independent reads and writes; logical write = 2 reads + 2 writes; parity + Reed-Solomon codes
[Figure: a mirrored pair holding identical bit patterns, and a parity disk holding the XOR of the data disks' bit patterns]
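A minimal sketch of horizontal parity, using hypothetical one-byte "disks" (a real array XORs whole sectors):

```python
# Parity is the XOR of the data blocks; any single lost block is rebuilt
# by XOR-ing the survivors with the parity.
from functools import reduce
from operator import xor

data = [0b10010011, 0b10111001, 0b10110011]   # hypothetical disk contents
parity = reduce(xor, data)                    # stored on the parity disk

lost = 1                                      # disk 1 fails
survivors = [d for i, d in enumerate(data) if i != lost]
rebuilt = reduce(xor, survivors, parity)
assert rebuilt == data[lost]
print(f"parity = {parity:08b}, rebuilt = {rebuilt:08b}")
```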
Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
[Figure: a stripe D0 D1 D2 D3 P, in which D0 is overwritten with new data D0']
1. Read old data D0
2. Read old parity P
3. Write new data D0'
4. Write new parity P' = (D0 XOR D0') XOR P
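The four I/Os map directly to code; a sketch, where the dicts and the read_block/write_block helpers are hypothetical stand-ins for physical disk operations:

```python
# RAID-5 small write: 1 logical write = 2 physical reads + 2 physical writes.

def read_block(disk, addr):          # hypothetical helper: disk is a dict
    return disk[addr]

def write_block(disk, addr, data):   # hypothetical helper
    disk[addr] = data

def small_write(data_disk, parity_disk, addr, new_data):
    old_data = read_block(data_disk, addr)        # 1. read old data
    old_parity = read_block(parity_disk, addr)    # 2. read old parity
    # Fold the change into the old parity: P' = (D0 xor D0') xor P
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(old_parity, old_data, new_data))
    write_block(data_disk, addr, new_data)        # 3. write new data
    write_block(parity_disk, addr, new_parity)    # 4. write new parity
```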
Redundant Arrays of Disks: RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its "shadow": very high availability can be achieved
• Bandwidth sacrifice on write: logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
Targeted for high I/O rate, high availability environments
[Figure: mirrored disk pairs, each pair forming a recovery group]
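A minimal sketch of the mirroring rules; the queue-depth read heuristic is a hypothetical stand-in for a real controller's load/seek optimization:

```python
# RAID-1 sketch: every logical write goes to both copies; a read is
# served by whichever copy currently looks less busy.

class MirroredPair:
    def __init__(self, primary, shadow):
        self.disks = [primary, shadow]   # one recovery group

    def write(self, addr, data):
        for disk in self.disks:          # logical write = 2 physical writes
            disk.write(addr, data)

    def read(self, addr):                # optimized read: pick idler copy
        disk = min(self.disks, key=lambda d: d.queue_depth)
        return disk.read(addr)
```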
Redundant Arrays of Disks: RAID 3: Parity Disk
[Figure: a logical record striped into physical records across the data disks, with a parity disk P holding the parity of the stripe]
• Parity computed across recovery group to protect against hard disk failures
– 33% capacity cost for parity in this configuration
– Wider arrays reduce capacity costs, but decrease expected availability and increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized: logically a single high capacity, high transfer rate disk
Targeted for high bandwidth applications: Scientific, Image Processing
Redundant Arrays of Disks: RAID 5+: High I/O Rate Parity
• Independent accesses occur in parallel
• A logical write is 4 physical I/Os: 2 reads and 2 writes
• Independent writes, 1 data and 1 parity, are possible because of interleaved parity
• Reed-Solomon codes ("Q") for protection during reconstruction
Targeted for mixed applications
[Figure: RAID 5 layout. Each column is a disk; each row is a stripe made of stripe units; parity rotates across the disks]
D0   D1   D2   D3   P0
D4   D5   D6   P1   D7
D8   D9   P2   D10  D11
D12  P3   D13  D14  D15
P4   D16  D17  D18  D19
D20  D21  D22  D23  P5
. . .
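The rotation in the figure can be captured as a small address computation; a sketch for this 5-disk layout (real controllers parameterize disk count, stripe unit size, and rotation direction):

```python
# Map a logical data block to (disk column, stripe) under the rotated
# parity placement above: P0 on disk 4, P1 on disk 3, and so on.

NDISKS = 5                                      # 4 data + 1 parity per stripe

def locate(block):
    stripe = block // (NDISKS - 1)              # 4 data units per stripe
    parity_disk = (NDISKS - 1) - stripe % NDISKS
    col = block % (NDISKS - 1)
    if col >= parity_disk:                      # skip over the parity column
        col += 1
    return col, stripe

# D7 lands on disk 4 in stripe 1, just right of P1 -- as in the figure.
assert locate(0) == (0, 0) and locate(7) == (4, 1) and locate(10) == (3, 2)
```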
Subsystem Organization
[Figure: host - host adapter - array controller - single board disk controllers - drives]
• Host adapter: manages interface to host, DMA
• Array controller: control, buffering, parity logic
• Single board disk controllers: physical device control, often piggy-backed in small format devices
• Striping software off-loaded from host to array controller
– No application modifications
– No reduction of host performance
System Availability: Orthogonal RAIDs
• Redundant support components: fans, power supplies, controller, cables
• End-to-end data integrity: internal parity protected data paths
[Figure: an array controller fanning out to multiple string controllers, with data recovery groups laid out orthogonally to the strings]
Data Recovery Group: unit of data redundancy
System-Level Availability
Goal: no single points of failure
• Fully dual redundant: I/O controllers and array controllers are duplicated, and recovery groups span both paths
• With duplicated paths, higher performance can be obtained when there are no failures
[Figure: host connected through dual I/O controllers and dual array controllers to the disks, with recovery groups spanning the paths]
Magnetic Tapes
Memory Hierarchies
General Purpose Computing Environment
[Figure: memory hierarchy of File Cache, Hard Disk, and Tapes; access time and capacity grow toward the bottom while cost per bit falls]
Memory Hierarchies
General Purpose Computing Environment

Memory Hierarchy 1980: File Cache, Hard Disk, Tapes

Memory Hierarchy 1995:
• On-Line (low $/actuator): File Cache, SSD, High I/O Rate Disks, High Data Rate Disks, Disk Arrays
• Near-Line: Optical JukeBox, Automated Tape Libraries
• Off-Line Storage (low $/MB): Remote Archive
Storage Trends: Distributed Storage
Storage Hierarchy 1980: File Cache, Magnetic Disk, Magnetic Tape

Storage Hierarchy 1990: Client Workstation (File Cache, Local Magnetic Disk) connected over a Local Area Network to a File Server (Server Cache, "Remote" Magnetic Disk, Magnetic Tape)

Moving down the hierarchy: declining $/MByte, increasing access time, increasing capacity
Storage Trends: Wide-Area Storage
1995 Typical Storage Hierarchy
• Conventional disks replaced by disk arrays
• Near-line storage emerges between disk and tape
[Figure: client caches reach a server cache over a Local Area Network, and the Internet over a Wide Area Network; on-line storage is a disk array, near-line storage an optical disk jukebox and a magnetic or optical tape library, off-line storage shelved magnetic or optical tape]
What's All This About Tape?
Tape is used for:
• Backup Storage for Hard Disk Data – Written once, very infrequently (hopefully never!) read
• Software Distribution– Written once, read once
• Data Interchange– Written once, read once
• File Retrieval
– Written/rewritten, files occasionally read
– Near-line archive
– Electronic image management: a relatively new application for tape
Alternative Data Storage Technologies
Technology            Cap (MB)    BPI     TPI   BPI*TPI   Data Xfer   Access Time
                                                (Million) (KByte/s)
Conventional Tape:
  Reel-to-Reel (.5")       140    6250      18      0.11        549   minutes
  Cartridge (.25")         150   12000     104      1.25         92   minutes
Helical Scan Tape:
  VHS (.5")               2500   17435     650     11.33        120   minutes
  Video (8mm)*            2300   43200     819     35.28        246   minutes
  DAT (4mm)**             1300   61000    1870    114.07        183   20 seconds
Disk:
  Hard Disk (5.25")        760   30552    1667     50.94       1373   20 ms
  Floppy Disk (3.5")         2   17434     135      2.35         92   1 second
  CD ROM (3.5")            540   27600   15875    438.15        183   1 second

* Second Generation 8mm: 5000 MB, 500 KB/s
** Second Generation 4mm: 10,000 MB
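The BPI*TPI column is just linear density times track density; a quick check of a few rows:

```python
# Recompute the areal density column (BPI x TPI, in millions) from the
# linear (BPI) and track (TPI) densities in the table above.
rows = {"DAT (4mm)": (61_000, 1_870),
        "Hard Disk (5.25in)": (30_552, 1_667),
        "CD ROM (3.5in)": (27_600, 15_875)}
for name, (bpi, tpi) in rows.items():
    print(f"{name:18s} {bpi * tpi / 1e6:7.2f} million")
# -> 114.07, 50.93, 438.15: matching the table within rounding.
```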
R-DAT Technology
Two Competing Standards

DDS (HP, Sony)
* 22 frames/group
* 1870 tpi
* Optimized for serial writes

DataDAT (Hitachi, Matsushita, Sharp)
* Two modes: streaming (like DDS) and update in place
* Update in place sacrifices transfer rate and capacity
* Spare data groups, inter-group gaps, preformatted tapes
R-DAT Technology
Advantages:
* Small formfactor, easy handling/loading
* 200X speed search on index fields (40 sec. max, 20 sec. avg.)
* 1000X physical positioning (8 sec. max, 4 sec. avg.)
* Inexpensive media ($10/GByte)
* Volumetric efficiency: 1 GB in 2.5 cu. in.; 1 TB in 1 cu. ft.

Disadvantages:
* Two incompatible standards (DDS, DataDAT)
* Slow transfer rate
* Lower capacity vs. 8mm tape
* Small bit size (13 x 0.4 sq. micron) affects archive stability
R-DAT Technical Challenges
Tape Capacity
* Data compression is key

Tape Bandwidth
* Data compression
* Striped tape
MSS Tape: No Perfect Tape Drive?
• Pick the best 2 out of 3: cost, size, speed
• Expensive: fast & big
• Cheap: slow & big
[Figure: trade-off triangle with Cost, Capacity, and Speed at the corners]
Data Compression Issues

Peripheral Manufacturer Approach:
[Figure: host SCSI HBA to embedded controller to transport; compression done in the embedded controller. Claimed ratios of 20:1; 2-3:1 is more typical]

System Approach: compression done in the host
[Figure: video, audio, image, and text compression performed in the host as data-specific compression with hints from the host, ahead of the SCSI HBA, embedded controller, and transport]
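A sketch of the system approach, with zlib standing in for the data-specific codecs the slide names (a real system would use dedicated video/audio/image compressors, which far outperform a generic byte-stream compressor in the drive):

```python
# System approach sketch: the host picks a compressor from a data-type
# hint instead of the drive applying one generic algorithm to everything.
# zlib is a placeholder; the hint names are illustrative.
import zlib

def compress_for_tape(data: bytes, hint: str) -> bytes:
    if hint == "text":
        return zlib.compress(data, level=9)   # generic codecs do well on text
    if hint in ("video", "audio", "image"):
        return zlib.compress(data, level=1)   # placeholder for a real codec
    return data                               # unknown data: store as-is

sample = b"to be or not to be " * 200
print(len(sample), "->", len(compress_for_tape(sample, "text")))
```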
Striped Tape
[Figure: data to/from the host passes through speed matching buffers and is striped across four embedded controller + transport pairs, each running at 180 KB/s]
Challenges:
* Difficult to logically synchronize tape drives
* Unpredictable write times: read-after-write verify, error correction schemes, N-group writing, etc.
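A sketch of the striping path: the host stream is split round-robin into per-drive speed-matching buffers (the 180 KB/s rate is from the figure; the stripe unit size is illustrative):

```python
# Striped tape sketch: split the host stream round-robin across four
# speed-matching buffers, one per 180 KB/s transport.
from collections import deque

N_DRIVES = 4
UNIT = 32 * 1024                      # illustrative stripe unit size

def stripe(stream):
    buffers = [deque() for _ in range(N_DRIVES)]
    for n, i in enumerate(range(0, len(stream), UNIT)):
        buffers[n % N_DRIVES].append(stream[i:i + UNIT])
    return buffers

bufs = stripe(bytes(1024 * 1024))     # 1 MB -> 8 stripe units per drive
print([len(b) for b in bufs], "units; aggregate", 4 * 180, "KB/s")
```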
Automated Media Handling
[Figure: tape carousels. A gravity feed design with a 3.5" formfactor tape reader, and a 19" carousel with a 4mm tape reader]
Automated Media Handling
[Figure: front and side views of stacked tape readers and tape cassettes. A tape pack is the unit of archive]
MSS: Automated Tape Library
EXB-120:
• 116 x 5 GB 8mm tapes = 0.6 TBytes (1991)
• 4 tape readers in 1991; 8 half-height readers now
• 4 x 0.5 MByte/s = 2 MBytes/s
• $40,000 O.E.M. price
• Predict 1995: 3 TBytes; 2000: 9 TBytes
[Figure: EXB-120 cabinet, 3 feet by 5 feet, with cartridge holders, tape readers, and an entry/exit port]
Open Research Issues
• Hardware/software attack on very large storage systems
» File system extensions to handle terabyte sized file systems
» Storage controllers able to meet bandwidth and capacity demands
• Compression/decompression between secondary and tertiary storage
» Hardware assist for on-the-fly compression
» Application hints for data specific compression
» More effective compression over large buffered data
» DB indices over compressed data
• Striped tape: is a large buffer enough?
• Applications: where are the terabytes going to come from?
» Image storage systems
» Personal Communications Network multimedia file server
MSS: Applications of Technology: Robo-Line Library
Books/Bancroft x Pages/book x Bytes/page = Bancroft capacity
372,910 x 400 x 4,000 = 0.54 TB

Full-text Bancroft near-line = 0.5 TB; page images ≈ 20 TB

Predict: "RLB" (Robo-Line Bancroft) = $250,000

Bancroft costs:
• Catalogue a book: $20/book
• Reshelve a book: $1/book
• New books purchased per year that are never checked out: 20%
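The capacity estimate checks out if the 0.54 is read as binary terabytes; a quick verification:

```python
# Back-of-envelope check of the Robo-Line Bancroft estimate above.
books, pages_per_book, bytes_per_page = 372_910, 400, 4_000
total = books * pages_per_book * bytes_per_page
print(f"{total / 1e12:.2f} TB decimal, {total / 2**40:.2f} TB binary")
# -> 0.60 TB decimal, 0.54 TB binary
```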
MSS: Summary
[Figure: access time in ms (0.0001 to 100,000) versus $/MB ($0.00 to $100.00) for DRAM, Magnetic Disk, and Robo-Line Tape; Access Gap #1 separates DRAM from magnetic disk, and Access Gap #2 separates magnetic disk from Robo-Line tape]