


RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing

Kai Hwang, Hai Jin, and Roy Ho


Outline

Introduction

RAID

Orthogonal Striping and Mirroring

Trojans Cluster Experiments

Cooperative disk drivers

Benchmark Experiments

Striped Checkpointing on RAID-x


Introduction

RAID-x: redundant array of inexpensive disks at level x

Provides high-bandwidth distributed I/O processing on a serverless cluster, where server functions are distributed among client hosts

Based on orthogonal striping and mirroring (OSM)

Cooperative disk drivers (CDDs) are used to implement the OSM at the kernel level

Maintains data consistency without using NFS or Unix system calls


Distributed RAID-x

Must have these capabilities:

1. A single I/O space (SIOS) for all disks in the cluster.

2. High scalability, availability, and compatibility with current cluster architectures and applications.

3. Local and remote disk I/O operations performed with comparable latency.

Implies:

Total transparency to users.

Users can utilize all disks without knowing the physical locations of the data blocks.


Orthogonal Striping and Mirroring

OSM provides:

Improvement in parallel I/O bandwidth

Hides disk mirroring overhead

Enhances scalability and reliability of cluster computing applications

Eliminates the small-write problem that affects RAID-5

Has the advantages of both RAID-1 and chained declustering


Advantages of RAID-1 & Chained Declustering

RAID-1: mirroring and duplexing

100% redundancy of data = no rebuild, just a copy

Twice the read transaction rate of a single disk

Chained declustering: load balancing


Architecture of RAID-x vs. chained declustering RAID

Bj: original data blocks.

Mj: mirrored blocks.

4 disks, with mirroring groups involving 3 consecutive disk blocks.

Example: M0, M1, M2 are the mirroring blocks for the B0, B1, B2 data blocks.

Different mirroring groups are shown in different shadings.


Architecture of RAID-x

Orthogonal mapping: no data block and its image are mapped to the same disk.

Data blocks are striped across all disks on the top half of the disk array, as in RAID-0.

This means that, for large writes, blocks can be written in parallel to all disks in the stripe.

Image blocks are “clustered” vertically on the same disk.

The clustered images in a mirroring group are updated simultaneously in the background, resulting in lower latency and higher bandwidth in RAID-x.
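
The orthogonal mapping can be sketched in Python. This is only a sketch: the image-placement formula is an assumption chosen to be consistent with what these slides state (data striped over n disks, mirroring groups of n-1 consecutive blocks, each group's images clustered on one disk, no block on the same disk as its image). The function names are hypothetical, and the authors' exact layout may differ.

```python
def data_disk(i, n):
    """Disk holding data block B_i under RAID-0 style striping over n disks."""
    return i % n


def image_disk(i, n):
    """Disk holding the image of B_i (assumed layout, not taken from the slides).

    Blocks are grouped n-1 at a time. The n-1 data blocks of a group occupy
    n-1 distinct disks, and the group's images are clustered on the one
    remaining disk, so a block never shares a disk with its own image.
    """
    group = i // (n - 1)
    return (n - 1 - group) % n


if __name__ == "__main__":
    n = 4  # 4 disks, mirroring groups of 3 blocks
    for i in range(12):
        print(f"B{i}: data on D{data_disk(i, n)}, image on D{image_disk(i, n)}")
```

With n = 4 this sketch puts the images of B0, B1, B2 together on D3 and the image of B3 on D2, so the images of one stripe group land on two disks.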


Performance of 4 RAID Architectures

Write operations are improved.

RAID-x has the same bandwidth potential as RAID-0 and chained declustering.

Improvements from declustering are mainly in parallel writes. For large array sizes, the improvement approaches a factor of 2.

Tolerates single-disk failures, as RAID-5 does.


Trojans Cluster

16 Pentium II/400 MHz processors

Red Hat Linux 6.0

PC engines (nodes) are connected by 100 Mbps Fast Ethernet

Each node has a 10 GB disk attached

16 disks = 160 GB single I/O space


Distributed RAID-x Architecture

3 disks for each node, 4 nodes

The blocks of a stripe group (B0, B1, B2, B3) are accessed in parallel.

Consecutive stripe groups (B0, B1, B2, B3), (B4, B5, B6, B7), (B8, B9, B10, B11) are accessed in pipeline fashion because they are retrieved from disk groups attached to the same SCSI buses.

Figure: 4x3 RAID-x architecture with orthogonal striping and mirroring

Legend: P: processor; M: memory; CDD: cooperative disk driver; Dj: the jth disk; Bi: the ith data block; Bi’: the ith mirrored image, in a shaded box


Distributed RAID-x Arch. cont.

n-by-k RAID-x

Stripe group: n disk blocks on n disks

Mirroring group: n-1 blocks on 1 disk

Images of all data blocks in a stripe group are saved to two disks

The block addressing scheme stripes across all nk disks sequentially and repeatedly

n = degree of parallelism

k = depth of pipelining
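
The n-by-k block addressing can be illustrated with a short Python sketch. It assumes that disks are numbered D0 to D(nk-1) and that disk Dj is attached to node j mod n, which agrees with the statement that B0, B4, and B8 come from disks on the same bus in the 4x3 example. The function name and the returned fields are hypothetical.

```python
def locate_block(i, n, k):
    """Map data block B_i to (disk, node, stage, offset) in an n-by-k array.

    disk   : index of the disk among all n*k disks (striping is sequential
             over all disks and wraps around)
    node   : node the disk is attached to (assumed: disk j belongs to node j mod n)
    stage  : which of the node's k disks is used, i.e. the pipeline stage
    offset : block row on that disk
    """
    total = n * k
    disk = i % total
    node = disk % n
    stage = disk // n
    offset = i // total
    return disk, node, stage, offset


if __name__ == "__main__":
    for i in (0, 4, 8, 12):
        print(i, locate_block(i, n=4, k=3))
```

In the 4x3 case, B0, B4, and B8 map to the same node at stages 0, 1, and 2, and B12 wraps around to the first disk at the next offset.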



Single I/O space in a Distributed RAID

A global virtual disk with a SIOS, formed by cooperative disks

Crucial to building a scalable cluster of computers

If the disks are not shared, I/O must be handled by time-consuming system calls through a centralized file server (e.g., NFS)

Enabled by CDDs at the Linux kernel level


Cooperative Disk Driver (CDD) Architecture

Internal design of the CDD architecture

Establishes the SIOS as a single global virtual disk

Storage manager: receives and processes the I/O requests

Client module: redirects local I/O requests to remote disk managers


CDD Architecture cont.

Data consistency module: maintains data consistency at the driver level when distributed disks update cached copies of the same data block

A CDD can run in 3 different states:

Storage manager: coordinates use of local disk storage by remote nodes

Client: accesses remote disks through remote disk managers

Both
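
The storage-manager and client roles can be pictured with a toy user-space model in Python. This is not kernel code and not the authors' driver: the class, its methods, and the placement rule (block i on node i mod n) are all hypothetical, and mirroring and the consistency module are left out for brevity.

```python
class ToyCDD:
    """Toy model of one node's cooperative disk driver (hypothetical names)."""

    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.local_disk = {}  # block number -> data stored on this node's disk
        self.peers = {}       # node id -> ToyCDD of that node, set by connect()

    def owner_of(self, block):
        # Assumed placement rule: block i lives on node i mod n.
        return block % self.num_nodes

    # Storage-manager role: serve requests for blocks on the local disk.
    def serve_write(self, block, data):
        self.local_disk[block] = data

    def serve_read(self, block):
        return self.local_disk[block]

    # Client role: redirect requests for remote blocks to the owning node.
    def _target(self, block):
        owner = self.owner_of(block)
        return self if owner == self.node_id else self.peers[owner]

    def write(self, block, data):
        self._target(block).serve_write(block, data)

    def read(self, block):
        return self._target(block).serve_read(block)


def connect(num_nodes):
    """Build num_nodes toy drivers that know about each other."""
    nodes = [ToyCDD(i, num_nodes) for i in range(num_nodes)]
    for node in nodes:
        node.peers = {p.node_id: p for p in nodes if p is not node}
    return nodes


if __name__ == "__main__":
    nodes = connect(4)
    nodes[0].write(5, b"checkpoint")  # block 5 is stored on node 1
    print(nodes[3].read(5))           # any node reads it the same way
```

Every node runs the same driver and acts as both client and storage manager, which is the "both" state listed above; callers never name the node that holds a block.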


CDD

Allows serverless clusters

Offers remote disk access directly at kernel level


I/O Bandwidth vs. request number

Performance of 4 I/O subsystem architectures

For large read/write:

A 20 MB file is striped across all disks in the array

Focuses on the parallel I/O capacity of the disk array

Files are uncached

Each client reads only its private file

All reads are performed simultaneously


I/O Bandwidth vs. request number

Performance of 4 I/O subsystem architectures

For small read/write:

32 KB of data

One block of a stripe group

Results for small accesses are very close to the results for large accesses


Achievable I/O Bandwidth and Improvement Factor

Improvement factor of 16 clients over 1 client on the USC Trojans cluster

RAID-x demonstrates the highest improvement factor among the three RAID architectures

Almost 3x increase on RAID-x from 1 to 16 clients
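
The improvement factor is read here as the ratio of the aggregate I/O bandwidth measured with 16 clients to the bandwidth measured with 1 client. That reading is an assumption, since the slide does not spell out the formula. The function name is hypothetical and no measured values are reproduced.

```python
def improvement_factor(bandwidth_one_client, bandwidth_many_clients):
    """Ratio of aggregate bandwidth with many clients to that with one client.

    Both arguments must use the same unit (for example MB/s).
    """
    if bandwidth_one_client <= 0:
        raise ValueError("single-client bandwidth must be positive")
    return bandwidth_many_clients / bandwidth_one_client
```

Under this reading, the "almost 3x" figure means the 16-client aggregate bandwidth on RAID-x is nearly three times its single-client bandwidth.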


Elapsed Time in Executing the Andrew Benchmark

Measured on 4 I/O subsystems as the number of client requests increases up to 32.

Figure panels: NFS results; RAID-x results


Striped Checkpointing on the RAID-x

Distribute data blocks and their mirrored images orthogonally

Striped staggering in coordinated checkpointing on the RAID-x disk array

Successive stripes are accessed in a staggered manner from different stripes on successive 4-disk groups

Staggering implies pipelined access of the disk array
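
One way to picture the staggering is a schedule in which successive checkpoint stripes go to successive 4-disk groups, so the groups behave like pipeline stages. The assignment rule (stripe s goes to group s mod k), the disk numbering, and the function name are assumptions for illustration; the slides do not give the exact schedule.

```python
def staggered_schedule(num_stripes, n=4, k=3):
    """Assign each checkpoint stripe to one n-disk group, round-robin.

    Returns a list of (stripe, group, disks) tuples, where disks lists the
    n disk indices written in parallel for that stripe. Consecutive stripes
    use different groups, so their accesses can overlap in a pipeline.
    """
    schedule = []
    for stripe in range(num_stripes):
        group = stripe % k
        disks = [group * n + j for j in range(n)]
        schedule.append((stripe, group, disks))
    return schedule


if __name__ == "__main__":
    for stripe, group, disks in staggered_schedule(5):
        print(f"stripe {stripe}: group {group}, disks {disks}")
```

With the defaults, stripes 0, 1, and 2 use three different disk groups and stripe 3 returns to the first group.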


Striped Checkpointing on the RAID-x

Using OSM, each striped checkpointing file has its mirrored image on its local disk.

For each node, transient failures can be recovered from its mirrored image on the local disk.

Permanent failures can be recovered from striped checkpointing.


Conclusions

RAID-x shows strength in building distributed, high-bandwidth I/O storage for serverless PC or workstation clusters.

The OSM architecture exploits full stripe bandwidth.

Reliability: clustered mirroring on local disks and orthogonal striping across distributed disks. Matches RAID-5 (recovery from single-disk failures).

I/O performance is better than RAID-1 and RAID-5.

Highly scalable with distributed control.


Questions

What does Orthogonal striping and mirroring (OSM) provide?

What is RAID-x?

Explain the Cooperative Disk Driver (CDD).

Compare RAID-x to Petal.