
Page 1: FFS, LFS, and RAID

FFS, LFS, and RAID

Andy Wang
COP 5611

Advanced Operating Systems

Page 2: FFS, LFS, and RAID

UNIX Fast File System

Designed to improve performance of UNIX file I/O

Two major areas of performance improvement:
Bigger block sizes
Better on-disk layout for files

Page 3: FFS, LFS, and RAID

Block Size Improvement

A 4x block size quadrupled the amount of data retrieved per disk fetch

But larger blocks could lead to fragmentation problems
So fragments were introduced

Small files stored in fragments
Fragments are addressable
But not independently fetchable
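A minimal sketch of the fragment idea, with sizes (4 KB blocks, 1 KB fragments) assumed only for illustration: a small file is charged whole fragments rather than a whole block.

```c
/* Illustrative only: block and fragment sizes are assumptions, not FFS's
 * actual on-disk parameters. A small file occupies fragments, not a block. */
#include <stdio.h>

#define BLOCK_SIZE    4096
#define FRAGMENT_SIZE 1024

int main(void) {
    long file_size  = 1800;                                 /* a small file */
    long fragments  = (file_size + FRAGMENT_SIZE - 1) / FRAGMENT_SIZE;
    long frag_bytes = fragments * FRAGMENT_SIZE;

    printf("%ld bytes -> %ld fragments (%ld bytes) instead of one %d-byte block\n",
           file_size, fragments, frag_bytes, BLOCK_SIZE);   /* 2 fragments */
    return 0;
}
```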

Page 4: FFS, LFS, and RAID

Disk Layout Improvements

Aimed toward avoiding disk seeks
Bad if finding related files takes many seeks
Very bad if finding all the blocks of a single file requires seeks
Spatial locality: keep related things close together on disk

Page 5: FFS, LFS, and RAID

Cylinder Groups

A cylinder group: a set of consecutive disk cylinders in the FFS

Files in the same directory stored in the same cylinder group

Within a cylinder group, tries to keep things contiguous

But must not let a cylinder group fill up

Page 6: FFS, LFS, and RAID

Locations for New Directories

Put new directory in relatively empty cylinder group

What is “empty”?
Many free i_nodes
Few directories already there
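As a rough sketch of this placement policy (the struct, fields, and weighting are assumptions for illustration, not actual FFS code):

```c
/* A minimal sketch: put a new directory in a relatively empty cylinder
 * group, i.e. one with many free i_nodes and few directories. */
#include <stddef.h>

struct cg_summary {
    int free_inodes;   /* free i_nodes in this cylinder group */
    int ndirs;         /* directories already in this group   */
};

/* Return the index of the "emptiest" cylinder group. */
int pick_cg_for_new_dir(const struct cg_summary *cg, size_t ncg) {
    size_t best = 0;
    long best_score = -1;
    for (size_t i = 0; i < ncg; i++) {
        /* Favor many free i_nodes, penalize existing directories.
         * The weight 16 is an arbitrary illustrative choice. */
        long score = (long)cg[i].free_inodes - 16L * cg[i].ndirs;
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return (int)best;
}
```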

Page 7: FFS, LFS, and RAID

The Importance of Free Space

FFS must not run too close to capacity
No room for new files
Layout policies ineffective when too few free blocks
Typically, FFS needs 10% of the total blocks free to perform well

Page 8: FFS, LFS, and RAID

Performance of FFS

4x to 15x the bandwidth of old UNIX file system

Depending on size of disk blocks

Performance of the original file system was limited by CPU speed
Due to memory-to-memory buffer copies

Page 9: FFS, LFS, and RAID

FFS Not the Ultimate Solution

Based on technology of the early 80s
And file usage patterns of those times
In modern systems, FFS achieves only ~5% of raw disk bandwidth

Page 10: FFS, LFS, and RAID

The Log-Structured File System

Large caches can catch almost all reads
But most writes have to go to disk
So FS performance can be limited by writes
So, produce a FS that writes quickly
Like an append-only log

Page 11: FFS, LFS, and RAID

Basic LFS Architecture

Buffer writes, send them sequentially to disk
Data blocks
Attributes
Directories
And almost everything else

Converts small sync writes to large async writes
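A minimal sketch of this buffering idea, with segment and block sizes assumed for illustration (not the Sprite LFS implementation):

```c
/* Buffer blocks in memory and write them to disk as one large,
 * sequential, segment-sized transfer at the head of the log. */
#include <stdio.h>
#include <string.h>

#define SEG_SIZE (512 * 1024)   /* assumed segment size */
#define BLK_SIZE 4096           /* assumed block size   */

static char segment_buf[SEG_SIZE];
static size_t seg_used;

/* Stand-in for one large sequential write at the current head of the log. */
static void write_segment_to_log(const void *buf, size_t len) {
    (void)buf;
    printf("flush %zu bytes to the log head as one sequential write\n", len);
}

/* Append one block to the current segment; flush when the segment fills. */
void lfs_append_block(const void *block) {
    if (seg_used + BLK_SIZE > SEG_SIZE) {
        write_segment_to_log(segment_buf, seg_used);  /* large async write */
        seg_used = 0;
    }
    memcpy(segment_buf + seg_used, block, BLK_SIZE);
    seg_used += BLK_SIZE;
}
```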

Page 12: FFS, LFS, and RAID

A Simple Log Disk Structure

(Diagram: the log, written left to right)

File A Block 7 | File Z Block 1 | File M Block 202 | File A Block 3 | File F Block 1 | File A Block 7 | File L Block 26 | File L Block 25 | ← Head of Log

Page 13: FFS, LFS, and RAID

Key Issues in Log-Based Architecture

1. Retrieving information from the log

No matter how well you cache, sooner or later you have to read

2. Managing free space on the disk

You need contiguous space to write - in the long run, how do you get more?

Page 14: FFS, LFS, and RAID

Finding Data in the Log

Give me block 25 of file L
Or, give me block 1 of file F

(Diagram: the same log as on the previous slide; without an index, finding these blocks would require scanning it)

Page 15: FFS, LFS, and RAID

Retrieving Information From the Log

Must avoid sequential scans of disk to read files

Solution: store index structures in the log
The index is essentially the most recent version of the i_node

Page 16: FFS, LFS, and RAID

Finding Data in the Log

How do you find all blocks of file Foo?

(Diagram: the log contains Foo Block 1, Foo Block 2, Foo Block 3, and a stale copy of Foo Block 1)

Page 17: FFS, LFS, and RAID

Finding Data in the Log with an I_node

(Diagram: the same log, with an i_node whose pointers locate the current copies of Foo Blocks 1, 2, and 3 and skip the stale copy of Block 1)

Page 18: FFS, LFS, and RAID

How Do You Find a File’s I_node?

You could search sequentially
LFS optimizes by writing i_node maps to the log
The i_node map points to the most recent version of each i_node
A file system's i_nodes cover multiple blocks of i_node map

Page 19: FFS, LFS, and RAID

How Do You Find the Inode?

The Inode Map

Page 20: FFS, LFS, and RAID

How Do You Find Inode Maps?

Use a fixed region on disk that always points to the most recent i_node map blocks

But cache i_node maps in main memory
Small enough that few disk accesses are required to find i_node maps
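A minimal sketch of the resulting lookup chain, with invented structures and helpers (not Sprite LFS's actual on-disk layout): fixed checkpoint region, then i_node map, then i_node, then data block.

```c
#include <stdint.h>

struct inode {                 /* simplified on-log i_node */
    uint64_t block_addr[12];   /* pointers to data blocks in the log */
};

struct imap {                  /* cached i_node map */
    uint64_t inode_addr[1024]; /* log address of the latest copy of each i_node */
};

struct checkpoint {            /* fixed region on disk */
    struct imap *current_imap; /* points at the most recent i_node map */
};

/* Assumed helpers that read a given log address from disk (or cache). */
struct inode *read_inode_at(uint64_t log_addr);
void         *read_block_at(uint64_t log_addr);

void *lfs_read_block(struct checkpoint *cp, int inum, int blkno) {
    uint64_t iaddr = cp->current_imap->inode_addr[inum]; /* find the i_node  */
    struct inode *ip = read_inode_at(iaddr);             /* read it from log */
    return read_block_at(ip->block_addr[blkno]);         /* then the block   */
}
```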

Page 21: FFS, LFS, and RAID

Finding I_node Maps

(Diagram: new i_node map blocks at the head of the log, with an old i_node map block left earlier in the log)

Page 22: FFS, LFS, and RAID

Reclaiming Space in the Log

Eventually, the log reaches the end of the disk partition

So LFS must reuse disk space, such as blocks holding superseded data

Space can be reclaimed in background or when needed

Goal is to maintain large free extents on disk

Page 23: FFS, LFS, and RAID

Example of Need for Reuse

Head of log

New data to be logged

Page 24: FFS, LFS, and RAID

Major Alternatives for Reusing Log

Threading
+ Fast

- Fragmentation

- Slower reads

Head of log

New data to be logged

Page 25: FFS, LFS, and RAID

Major Alternatives for Reusing Log

Copying
+ Simple

+ Avoids fragmentation

- Expensive

New data to be logged

Page 26: FFS, LFS, and RAID

LFS Space Reclamation Strategy

Combination of copying and threading
Copy to free large fixed-size segments
Thread free segments together
Try to collect long-lived data permanently into segments

Page 27: FFS, LFS, and RAID

A Threaded, Segmented Log

Head of log

Page 28: FFS, LFS, and RAID

Cleaning a Segment

1. Read several segments into memory

2. Identify the live blocks

3. Write live data back (hopefully) into a smaller number of segments
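A minimal sketch of these three steps; the types, batch size, and helper functions are assumptions for illustration rather than the real cleaner's interface:

```c
#define CLEAN_BATCH 16                /* clean a few tens of segments at once */

struct block   { int file_id; int blkno; };            /* identity of a block */
struct segment { int nblocks; struct block *blocks; }; /* plus summary info   */

/* Assumed helpers */
int  read_segments(struct segment *segs, int max);
int  is_live(const struct block *b);    /* crosscheck with the owning i_node */
void append_to_clean_segment(const struct block *b);
void mark_segment_free(struct segment *s);

void clean_some_segments(void) {
    static struct segment segs[CLEAN_BATCH];
    int n = read_segments(segs, CLEAN_BATCH);          /* 1. read into memory */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < segs[i].nblocks; j++)
            if (is_live(&segs[i].blocks[j]))           /* 2. identify live    */
                append_to_clean_segment(&segs[i].blocks[j]); /* 3. write back */
        mark_segment_free(&segs[i]);   /* source segment is now free to reuse */
    }
}
```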

Page 29: FFS, LFS, and RAID

Identifying Live Blocks

Hard to track down live blocks of all files
Instead, each segment maintains a segment summary block
Identifying what is in each block

Crosscheck blocks with owning i_node’s block pointers

Written at end of log write, for low overhead

Page 30: FFS, LFS, and RAID

Segment Cleaning Policies

What are some important questions?
When do you clean segments?
How many segments to clean?
Which segments to clean?
How to group blocks in their new segments?

Page 31: FFS, LFS, and RAID

When to Clean

Periodically
Continuously
During off-hours
When disk is nearly full
On demand
LFS uses a threshold system

Page 32: FFS, LFS, and RAID

How Many Segments to Clean

The more segments cleaned at once, the better the reorganization of the disk
But the higher the cost of cleaning

LFS cleans a few tens of segments at a time
Until the disk drops below a threshold value

Empirically, LFS not very sensitive to this factor

Page 33: FFS, LFS, and RAID

Which Segments to Clean?

Cleaning segments with lots of dead data gives great benefit

Some segments are hot, some segments are cold

But “cold” free space is more valuable than “hot” free space

Since cold blocks tend to stay cold

Page 34: FFS, LFS, and RAID

Cost-Benefit Analysis

u = utilization (fraction of the segment still live)
A = age of the data in the segment
Benefit to cost = (1 - u) * A / (1 + u)
Cleaning costs 1 + u (read the whole segment, rewrite the live fraction u); the benefit is the freed space 1 - u, weighted by how long it is likely to stay free
Clean cold segments when they have some free space, hot segments only when they have a lot of free space
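A small worked example of this policy (the utilizations and ages are made up): a cold segment with modest free space can score higher than a hot segment with more free space.

```c
/* Cost of cleaning a segment: read it (1) plus rewrite its live fraction (u).
 * Benefit: the space freed (1 - u), weighted by the age of its data. */
#include <stdio.h>

static double benefit_to_cost(double u, double age) {
    return (1.0 - u) * age / (1.0 + u);
}

int main(void) {
    /* A cold segment with only some free space ... */
    printf("cold, u=0.80, age=1000: %.1f\n", benefit_to_cost(0.80, 1000));
    /* ... can still beat a hot segment with much more free space. */
    printf("hot,  u=0.50, age=100:  %.1f\n", benefit_to_cost(0.50, 100));
    return 0;
}
```

Here the cold segment scores about 111 versus about 33 for the hot one, matching the rule above.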

Page 35: FFS, LFS, and RAID

What to Put Where?

Given a set of live blocks and some cleaned segments, which block goes where?
Order blocks by age
Write them to segments, oldest first

Goal is very cold, highly utilized segments

Page 36: FFS, LFS, and RAID

Goal of LFS Cleaning

(Diagram: two plots of number of segments versus segment fullness, from empty to 100% full; the goal is a bimodal distribution in which most segments are either nearly empty or nearly full)

Page 37: FFS, LFS, and RAID

Performance of LFS

On modified Andrew benchmark, 20% faster than FFS

LFS can create and delete 8 times as many files per second as FFS

LFS can read 1 ½ times as many small files

LFS slower than FFS at sequential reads of randomly written files

Page 38: FFS, LFS, and RAID

Logical Locality vs. Temporal Locality

Logical locality (spatial locality): Normal file systems keep a file’s data blocks close together

Temporal locality: LFS keeps data written at the same time close together

When temporal locality = logical locality
The systems perform the same

Page 39: FFS, LFS, and RAID

Major Innovations of LFS

Abstraction: everything is a log
Temporal locality
Use of caching to shape disk access patterns
Cache most reads
Optimized writes

Separating full and empty segments

Page 40: FFS, LFS, and RAID

Where Did LFS Look For Performance Improvements?

Minimized disk access
Only write when segments filled up

Increased size of data transfers
Write whole segments at a time

Improving locality
Assuming temporal locality, a file's blocks are all adjacent on disk
And temporally related files are nearby

Page 41: FFS, LFS, and RAID

Parallel Disk Access and RAID

One disk can only deliver data at its maximum rate

So to get more data faster, get it from multiple disks simultaneously

Saving on rotational latency and seek time

Page 42: FFS, LFS, and RAID

Utilizing Disk Access Parallelism

Some parallelism available just from having several disks

But not much
Instead of satisfying each access from one disk, use multiple disks for each access

Store part of each data block on several disks

Page 43: FFS, LFS, and RAID

Disk Parallelism Example

(Diagram: open/read/write requests pass through the file system and are served by several disks in parallel)

Page 44: FFS, LFS, and RAID

Data Striping

Transparently distributing data over multiple disks

Benefits:
Increases disk parallelism
Faster response for big requests

Major parameters:
Number of disks
Size of the data interleaf (the striping unit)
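A minimal sketch of the striping arithmetic, with the disk count and interleaf size assumed for illustration:

```c
/* Map a logical block number onto a disk and an offset within that disk. */
#include <stdio.h>

#define NUM_DISKS   4    /* number of disks in the array (assumed)     */
#define STRIPE_UNIT 1    /* blocks per disk per stripe (the interleaf) */

static void map_block(long lbn, int *disk, long *disk_block) {
    long stripe = lbn / (NUM_DISKS * STRIPE_UNIT);
    long within = lbn % (NUM_DISKS * STRIPE_UNIT);
    *disk       = (int)(within / STRIPE_UNIT);
    *disk_block = stripe * STRIPE_UNIT + within % STRIPE_UNIT;
}

int main(void) {
    for (long lbn = 0; lbn < 8; lbn++) {
        int d; long b;
        map_block(lbn, &d, &b);
        printf("logical block %ld -> disk %d, block %ld\n", lbn, d, b);
    }
    return 0;
}
```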

Page 45: FFS, LFS, and RAID

Fine vs. Coarse Grained Data Interleaving

Fine grained data interleaving
+ High data rate for all requests
But only one request per disk array
Lots of time spent positioning

Coarse grained data interleaving
+ Large requests access many disks
+ Many small requests handled at once
Small I/O requests access few disks

Page 46: FFS, LFS, and RAID

Reliability of Disk Arrays

Without disk arrays, failure of one disk among N loses 1/Nth of the data

With disk arrays (fine grained across all N disks), failure of one disk loses all data

An array of N disks is roughly 1/Nth as reliable as one disk
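As a rough, hedged back-of-the-envelope (assuming independent, identically distributed disk failures), the array's mean time to failure shrinks with the number of disks:

```latex
% Under the (assumed) independence approximation:
\mathrm{MTTF}_{\mathrm{array}} \approx \frac{\mathrm{MTTF}_{\mathrm{disk}}}{N}
\qquad \text{e.g., } \frac{100{,}000~\text{hours}}{10~\text{disks}} = 10{,}000~\text{hours}
```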

Page 47: FFS, LFS, and RAID

Adding Reliability to Disk Arrays

Buy more reliable disks
Build redundancy into the disk array

Multiple levels of disk array redundancy possible

Most organizations can prevent any data loss from single disk failure

Page 48: FFS, LFS, and RAID

Basic Reliability Mechanisms

Duplicate data
Parity for error detection
Error-correcting codes for detection and correction

Page 49: FFS, LFS, and RAID

Parity Methods

Can use parity to detect multiple errors
But typically used to detect a single error

If hardware errors are self-identifying, parity can also correct errors

When data is written, parity must be written, too
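A minimal sketch of block parity with XOR (disk count and block size are illustrative assumptions): writing data means updating parity, and a single self-identified lost block can be rebuilt from the survivors.

```c
#include <string.h>

#define NDATA 4        /* data disks (assumed) */
#define BLK   4096     /* block size (assumed) */

/* parity = d[0] ^ d[1] ^ ... ^ d[NDATA-1], byte by byte */
void compute_parity(unsigned char data[NDATA][BLK], unsigned char parity[BLK]) {
    memset(parity, 0, BLK);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < BLK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one lost data block from the surviving blocks plus parity. */
void rebuild_block(unsigned char data[NDATA][BLK], unsigned char parity[BLK],
                   int lost, unsigned char out[BLK]) {
    memcpy(out, parity, BLK);
    for (int d = 0; d < NDATA; d++)
        if (d != lost)
            for (int i = 0; i < BLK; i++)
                out[i] ^= data[d][i];
}
```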

Page 50: FFS, LFS, and RAID

Error-Correcting Code

Based on Hamming codes, mostly
Not only detects an error, but identifies which bit is wrong

Page 51: FFS, LFS, and RAID

RAID Architectures

Redundant Arrays of Independent Disks
Basic architectures for organizing disks into arrays
Assuming independent control of each disk
A standard classification scheme divides architectures into levels

Page 52: FFS, LFS, and RAID

Non-Redundant Disk Arrays (RAID Level 0)

No redundancy at all
So, what we just talked about: plain striping
Any disk failure causes data loss

Page 53: FFS, LFS, and RAID

Non-Redundant Disk Array Diagram (RAID Level 0)

(Diagram: requests pass through the file system; data is striped across the disks with no redundancy)

Page 54: FFS, LFS, and RAID

Mirrored Disks (RAID Level 1)

Each disk has a second disk that mirrors its contents
Writes go to both disks
No data striping

+ Reliability is doubled

+ Read access faster

- Write access slower

- Expensive and inefficient

Page 55: FFS, LFS, and RAID

Mirrored Disk Diagram (RAID Level 1)

(Diagram: each write passes through the file system to both disks of a mirrored pair)

Page 56: FFS, LFS, and RAID

Memory-Style ECC (RAID Level 2)

Some disks in the array are used to hold ECC
E.g., 4 data disks require 3 ECC disks

+ More efficient than mirroring

+ Can correct, not just detect, errors

- Still fairly inefficient

Page 57: FFS, LFS, and RAID

Memory-Style ECC Diagram (RAID Level 2)

(Diagram: requests pass through the file system to data disks plus dedicated ECC disks)

Page 58: FFS, LFS, and RAID

Bit-Interleaved Parity (RAID Level 3)

Each disk stores one bit of each data block

One disk in array stores parity for other disks

+ More efficient than Levels 1 and 2

- Parity disk doesn’t add bandwidth

Page 59: FFS, LFS, and RAID

Bit-Interleaved RAID Diagram (Level 3)

(Diagram: data bit-interleaved across the data disks, with one dedicated parity disk)

Page 60: FFS, LFS, and RAID

Block-Interleaved Parity (RAID Level 4)

Like bit-interleaved, but data is interleaved in blocks of arbitrary size
The size is called the striping unit
Small read requests use 1 disk

+ More efficient data access than level 3

+ Satisfies many small requests at once

- Parity disk can be a bottleneck

- Small writes require 4 I/Os
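A minimal sketch of the small-write penalty (the I/O helpers are assumed, not a real driver API): read the old data and old parity, compute the new parity, then write the new data and new parity, for 4 I/Os total.

```c
#define BLK 4096

/* Assumed block I/O helpers for illustration. */
void read_block (int disk, long blkno, unsigned char buf[BLK]);
void write_block(int disk, long blkno, const unsigned char buf[BLK]);

void small_write(int data_disk, int parity_disk, long blkno,
                 const unsigned char new_data[BLK]) {
    unsigned char old_data[BLK], parity[BLK];

    read_block(data_disk,   blkno, old_data);   /* I/O 1: old data   */
    read_block(parity_disk, blkno, parity);     /* I/O 2: old parity */

    /* new parity = old parity XOR old data XOR new data */
    for (int i = 0; i < BLK; i++)
        parity[i] ^= old_data[i] ^ new_data[i];

    write_block(data_disk,   blkno, new_data);  /* I/O 3: new data   */
    write_block(parity_disk, blkno, parity);    /* I/O 4: new parity */
}
```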

Page 61: FFS, LFS, and RAID

Block-Interleaved Parity Diagram (RAID Level 4)

(Diagram: data block-interleaved across the data disks, with one dedicated parity disk)

Page 62: FFS, LFS, and RAID

Block-Interleaved Distributed-Parity (RAID Level 5)

Spread the parity out over all disks

+ No parity disk bottleneck

+ All disks contribute read bandwidth

– Requires 4 I/Os for small writes
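A minimal sketch of one way to rotate parity across the array (real RAID 5 layouts vary; this modulo placement is an illustrative assumption):

```c
#include <stdio.h>

#define NDISKS 5   /* disks in the array (assumed) */

/* Rotate the parity disk per stripe so no single disk becomes a bottleneck. */
int parity_disk_for_stripe(long stripe) {
    return (int)(stripe % NDISKS);
}

int main(void) {
    for (long s = 0; s < 5; s++)
        printf("stripe %ld: parity on disk %d\n", s, parity_disk_for_stripe(s));
    return 0;
}
```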

Page 63: FFS, LFS, and RAID

Block-Interleaved Distributed-Parity Diagram (RAID Level 5)

(Diagram: data and parity blocks distributed across all of the disks)

Page 64: FFS, LFS, and RAID

Other RAID Configurations

RAID 6
Can survive two disk failures

RAID 10 (RAID 1+0)
Data striped across mirrored pairs

RAID 01 (RAID 0+1)
Mirroring two RAID 0 arrays

RAID 15, RAID 51

Page 65: FFS, LFS, and RAID

Where Did RAID Look For Performance Improvements?

Parallel use of disks

Improve overall delivered bandwidth by getting data from multiple disks

Biggest problem is small write performance

But we know how to deal with small writes . . .

Page 66: FFS, LFS, and RAID

Bonus

Given N disks in RAID 1/10/01/15/51, what is the expected number of disk failures before data loss? (1/2 critique)

Given 1-TB disks and probability p for a bit to fail silently, what is the probability of irrecoverable data loss for RAID 1/5/6/10/01/15/51 after a single disk failure? (1/2 critique)