Fault Isolation and Quick Recovery in Isolation File Systems

Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

University of Wisconsin - Madison


Post on 05-Apr-2018



File-System Availability Is Critical

Main data access interface
➡ desktop, laptop, mobile devices, file servers

A wide range of failures
➡ resource allocation, metadata corruption
➡ failed I/O operations, incorrect system states

A small fault can cause global failures
➡ e.g., a single bit can impact the whole file system

Global failures considered harmful
➡ read-only, crash

Server Virtualization

[Figure: guest virtual machines VM1, VM2, and VM3 run on a hypervisor; their disk images VMDK1, VMDK2, and VMDK3 are stored on one shared file system. A single fault (e.g., metadata corruption) makes the shared file system go read-only or crash, so all VMs are affected.]


Our Solution

A new abstraction for fault isolation
➡ support multiple independent fault domains
➡ protect a group of files for a domain

Isolation file systems
➡ fine-grained fault isolation
➡ quick recovery


Introduction

Study of Failure Policies

Isolation File Systems

Challenges


Questions to Answer

What global failure policies are used?
➡ failure types
➡ number of each type

What are the root causes of global failures?
➡ related data structures
➡ number of each cause


Methodology

Three major file systems
➡ Ext3 (Linux 2.6.32), Ext4 (Linux 2.6.32)
➡ Btrfs (Linux 3.8)

Analyze source code
➡ identify types of global failures
➡ count related error handling functions
➡ correlate global failures to data structures
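As a rough illustration of the "count related error handling functions" step, the sketch below counts textual call sites of a handler name in a source string. It is an assumption about how such a count could be approximated, not the authors' tooling; `count_call_sites` is a hypothetical helper, and a purely textual match would also count occurrences inside comments or longer identifiers, so manual inspection would still be needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of `needle`
 * (for example "ext3_error(" or "BUG(") in the source text `src`. */
static size_t count_call_sites(const char *src, const char *needle)
{
    size_t count = 0;
    size_t len = strlen(needle);

    if (len == 0)
        return 0; /* an empty pattern would match everywhere */

    for (const char *p = strstr(src, needle); p != NULL;
         p = strstr(p + len, needle))
        count++;

    return count;
}
```

Calling it once per handler name (`ext3_error(`, `BUG(`, `BUG_ON(`, `panic(`) on each source file would give a first-pass tally per failure type.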

Q1: What global failure policies are used?


Global Failure Policies

Definition
➡ a failure which impacts all users of the file system or even the operating system

Read-Only
➡ e.g., ext3_error():
➡ mark file system as read-only
➡ abort the journal

Read-Only Example

ext3/balloc.c, 2.6.32

    read_block_bitmap(...) {
        bitmap_blk = desc->bg_block_bitmap;
        bh = sb_getblk(sb, bitmap_blk);
        if (!bh) {
            ext3_error(sb, "Cannot read block bitmap");
            return NULL;
        }
    }
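To make the global effect of the read-only policy concrete, here is a minimal user-space sketch. It assumes a toy model in which one flag covers the whole file system; `toy_fs`, `toy_fs_error`, and `toy_fs_write` are hypothetical names and do not correspond to kernel functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a mounted file system; all names are hypothetical. */
struct toy_fs {
    bool read_only;       /* set once any error is reported */
    bool journal_aborted; /* journal accepts no further transactions */
};

/* Global policy: one reported error affects the whole file system. */
static void toy_fs_error(struct toy_fs *fs)
{
    fs->journal_aborted = true;
    fs->read_only = true;
}

/* Every write, to any file, is refused once the flag is set. */
static int toy_fs_write(const struct toy_fs *fs, int file_id)
{
    (void)file_id; /* the file's identity plays no role in the check */
    return fs->read_only ? -EROFS : 0;
}
```

After a single `toy_fs_error()` call, writes to every file fail with `-EROFS`, regardless of which file triggered the error.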


Global Failure Policies

Crash
➡ e.g., BUG(), ASSERT(), panic()
➡ crash the file system or operating system

Crash Example

btrfs/disk-io.c, 3.8

    open_ctree(...) {
        root->node = read_tree_block(...);
        BUG_ON(!root->node);
        ...
    }


[Figure: number of global failure instances in Ext3, Ext4, and Btrfs, split into ReadOnly and Crash. Totals on top of the bars: Ext3 193, Ext4 409, Btrfs 829.]


Read-only and crash are common in modern file systems

Over 67% of global failures will crash the system

Q2: What are the root causes of global failures?


Global Failure Causes

Metadata corruption
➡ metadata inconsistency is detected
➡ e.g., a block/inode bitmap corruption

Metadata Corruption Example

ext3/dir.c, 2.6.32

    ext3_check_dir_entry(...) {
        rlen = ext3_rec_len_from_disk();
        if (rlen < EXT3_DIR_REC_LEN(1)) {
            error = "rec_len is too small";
            ext3_error(sb, error);
        }
    }
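The record-length check above can be sketched as a standalone validator. This is an illustrative user-space version under stated assumptions: `TOY_DIR_REC_LEN` follows the rounding used by the kernel's `EXT3_DIR_REC_LEN` macro (8-byte header plus name, rounded up to a multiple of 4), while `toy_check_dir_entry` and its parameters are hypothetical and cover only a few of the checks a real file system performs.

```c
#include <stddef.h>

/* Minimum on-disk record length for a name of `name_len` bytes:
 * an 8-byte header plus the name, rounded up to a multiple of 4. */
#define TOY_DIR_REC_LEN(name_len) \
    ((((size_t)(name_len)) + 8 + 3) & ~(size_t)3)

/* Return NULL if the entry looks sane, else a description of the problem.
 * `bytes_left` is the space remaining in the directory block. */
static const char *toy_check_dir_entry(size_t rec_len, size_t name_len,
                                       size_t bytes_left)
{
    if (rec_len < TOY_DIR_REC_LEN(1))
        return "rec_len is too small";
    if (rec_len % 4 != 0)
        return "rec_len is not a multiple of 4";
    if (rec_len < TOY_DIR_REC_LEN(name_len))
        return "rec_len is too small for name_len";
    if (rec_len > bytes_left)
        return "directory entry crosses block boundary";
    return NULL;
}
```

Returning a message rather than reacting directly leaves the failure policy to the caller; in the Ext3 code shown above, the reaction is always the global `ext3_error()` path.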


Global Failure Causes

I/O failure
➡ metadata I/O failure and journaling failure
➡ e.g., fail to read an inode block

I/O Failure Example

ext4/namei.c, 2.6.32

    empty_dir(...) {
        bh = ext4_bread(NULL, inode, &err);
        if (!bh && err)
            EXT4_ERROR_INODE(inode, "fail to read directory block");
        ...
    }


Global Failure Causes

Software bugs
➡ unexpected states detected
➡ e.g., allocated block is not in a valid range

Software Bug Example

ext3/balloc.c, 2.6.32

    ext3_rsv_window_add(...) {
        if (start < this->rsv_start)
            p = &(*p)->rb_left;
        else if (start > this->rsv_end)
            p = &(*p)->rb_right;
        else {
            rsv_window_dump(root, 1);
            BUG();
        }
    }
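The invariant behind this example is that a new reservation window must not start inside an existing one. The sketch below shows the same three-way comparison on a plain, unbalanced binary search tree (the kernel uses a red-black tree). All names are hypothetical, and returning an error code instead of calling `BUG()` is a choice made for this illustration, not what Ext3 does.

```c
#include <stddef.h>

/* Hypothetical reservation window covering blocks [start, end]. */
struct toy_window {
    unsigned long start, end;
    struct toy_window *left, *right;
};

/* Insert `node` into the tree rooted at *root, ordered by block range.
 * Returns 0 on success, or -1 if the new window starts inside an
 * existing window, which is the case the kernel code treats as a bug. */
static int toy_window_add(struct toy_window **root, struct toy_window *node)
{
    struct toy_window **p = root;

    while (*p != NULL) {
        struct toy_window *cur = *p;

        if (node->start < cur->start)
            p = &cur->left;
        else if (node->start > cur->end)
            p = &cur->right;
        else
            return -1; /* overlap: reported to the caller in this sketch */
    }

    node->left = node->right = NULL;
    *p = node;
    return 0;
}
```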


Ext3

[Figure 1 plot: number of global failure instances for Ext3, Ext4, and Btrfs, split into ReadOnly and Pure Crash; bar totals 193, 409, and 829.]

Figure 1: Failure Types. This figure shows the failure types for each file system. The total number of global failure instances is on top of each bar.

Ext3 explicitly validates the integrity of metadata in many places, especially at the I/O boundary when reading from disk. For example, Ext3 validates a directory entry before traversing that directory and Ext3 checks that the inode bitmap is in a correct state before allocating a new inode. Unfortunately, as indicated by the Metadata Corruption column, if Ext3 detects a corruption in any of these structures, it causes a global failure. The I/O Failure column similarly shows that Ext3 causes global failures when an individual I/O request fails. Finally, the Software Bugs column shows that there are a significant number of internal assertions (such as BUG_ON), which are utilized to validate file system state at runtime, and these also cause a global failure when invoked. We observe that nearly all global failures in Ext3 are due to problems with metadata and other file system internal state, and not user data.

For each data structure, we also check whether it is shared across different files. As shown in the last column of Table 1, most metadata structures are organized in a shared manner and thus can cause global failures. However, even for local structures, such as indirect blocks, a global failure can still occur.

Data Structure  MC  IOF  SB  Shared
b-bitmap  2  2  Yes
i-bitmap  1  1  Yes
inode  1  2  2  Yes
super  1  Yes
dir-entry  4  4  3  Yes
gdt  3  2  Yes
indir-blk  1  1  No
xattr  5  2  1  No
block  5  Yes/No
journal  3  27  Yes
journal head  31  Yes
buf head  16  Yes
handle  22  9  Yes
transaction  28  Yes
revoke  2  Yes
other  1  11  Yes/No
Total  19  37  137  = 193

Table 1: Global Failure Causes of Ext3. This table shows the failure causes for Ext3, in terms of data structures, failure causes and their related numbers. MC: Metadata Corruption; IOF: I/O Failures; SB: Software Bugs; Shared: whether this structure is shared by multiple files or directories.

2.3 Discussion

A file is the basic file system abstraction used to store the user's data in a logically isolated unit. Users can read from and write to a file. Another basic abstraction is a directory, which maps a file name to the file itself. Files and directories are usually organized as a directory tree.

A namespace holds a logical group of files or directories. To protect files in a shared environment, different applications are isolated within separated namespaces. Typical examples include chroot, BSD jail, Solaris Zones, and virtual machines.

However, these abstractions do not provide any fault isolation within a file system. Files and directories only represent and isolate data logically for applications. Within a file system, different files and directories share metadata and system state; when faults are related to these shared metadata, global failure policies are triggered.

Therefore, file system abstractions lack a fine-grained fault isolation mechanism. Current file systems implicitly use a single fault domain; a fault in one file may cause a global reaction, thus affecting all clients of the file system.

3 New Abstraction: File Pod

To address the problem of inadequate fault isolation in file systems, we propose a new abstraction, called a file pod, for fine-grained fault isolation in file systems.

A file pod is an abstract file system partition that contains a group of files and their related metadata. Each file pod is isolated as an independent fault domain, with its own failure policy. Any failure related to a file pod only affects itself, not the whole file system. For example, if metadata corruption is detected within a file pod and the failure policy is to remount as read-only, then a Swarm file system marks only that pod as read-only, without affecting other consistent file pods.

File pods allow applications to control the failure policy of their own files and related metadata, instead of letting the file system manage the failures globally. Furthermore, this new abstraction supports flexible bindings between namespaces and fault domains; thus it can be used in a wide range of environments, such as server virtualization
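The file pod idea, where a fault marks only its own pod read-only, can be sketched in a few lines of user-space C. This is a toy model under stated assumptions: `toy_pod`, `toy_podfs`, `toy_pod_error`, `toy_pod_write`, and the fixed pod count are all hypothetical, and only the read-only policy mentioned in the text is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PODS 8 /* arbitrary limit chosen for this sketch */

/* One fault domain: a group of files with its own failure state. */
struct toy_pod {
    bool read_only;
};

/* A file system made of independent pods instead of one global state. */
struct toy_podfs {
    struct toy_pod pods[TOY_MAX_PODS];
};

/* Per-pod policy: an error marks only the affected pod read-only. */
static void toy_pod_error(struct toy_podfs *fs, size_t pod_id)
{
    assert(pod_id < TOY_MAX_PODS);
    fs->pods[pod_id].read_only = true;
}

/* Writes are refused only for files in a pod that has failed. */
static int toy_pod_write(const struct toy_podfs *fs, size_t pod_id)
{
    assert(pod_id < TOY_MAX_PODS);
    return fs->pods[pod_id].read_only ? -EROFS : 0;
}
```

In the server virtualization setting, each VMDK could be placed in its own pod, so corruption detected in one disk image's metadata would leave writes to the other images unaffected.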

Page 59: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

0

200

400

600

800

1000

Num

ber o

f Ins

tanc

es

Ext3 Ext4 Btrfs

193

409

829

ReadOnly Pure Crash

Figure 1: Failure Types. This figure shows the failure typesfor each file system. The total number of global failure instancesis on top of each bar.

Ext3 explicitly validates the integrity of metadata inmany places, especially at the I/O boundary when read-ing from disk. For example, Ext3 validates a directory en-try before traversing that directory and Ext3 checks thatthe inode bitmap is in a correct state before allocating anew inode. Unfortunately, as indicated by the MetadataCorruption column, if Ext3 detects a corruption in any ofthese structures, it causes a global failure. The I/O Failurecolumn similarly shows that Ext3 causes global failureswhen an individual I/O request fails. Finally, the SoftwareBugs column shows that there are a significant number ofinternal assertions (such as BUG ON), which are utilized tovalidate file system state at runtime, and these also cause aglobal failure when invoked. We observe that nearly all ofglobal failures in Ext3 are due to problems with metadataand other file system internal state, and not user data.For each data structure, we also check whether it is

shared across different files. As shown in the last col-umn of Table 1, most metadata structures are organized ina shared manner and thus can cause global failures. How-ever, even for local structures, such as indirect blocks, aglobal failure can still occur.

2.3 DiscussionA file is the basic file system abstraction used to store theuser’s data in a logically isolated unit. Users can readfrom and write to a file. Another basic abstraction is adirectory, which maps a file name to the file itself. Filesand directories are usually organized as a directory tree.A namespace holds a logical group of files or direc-

tories. To protect files in a shared environment, differ-ent applications are isolated within separated namespaces.Typical examples include chroot, BSD jail, SolarisZones, and virtual machines.However, these abstractions do not provide any fault

Data Structure MC IOF SB Sharedb-bitmap 2 2 Yesi-bitmap 1 1 Yesinode 1 2 2 Yessuper 1 Yes

dir-entry 4 4 3 Yesgdt 3 2 Yes

indir-blk 1 1 Noxattr 5 2 1 Noblock 5 Yes/Nojournal 3 27 Yes

journal head 31 Yesbuf head 16 Yeshandle 22 9 Yes

transaction 28 Yesrevoke 2 Yesother 1 11 Yes/NoTotal 19 37 137 = 193

Table 1: Global Failure Causes of Ext3. This table showsthe failure causes for Ext3, in terms of data structures, failurecauses and their related numbers. MC: Metadata Corruption;IOF: I/O Failures; SB: Software Bugs; Share: whether thisstructure is shared by multiple files or directories.

isolation within a file system. Files and directoriesonly represent and isolate data logically for applications.Within a file system, different files and directories sharemetadata and system state; when faults are related to theseshared metadata, global failure policies are triggered.Therefore, file system abstractions lack a fine-grained

fault isolation mechanism. Current file systems implicitlyuse a single fault domain; a fault in one file may cause aglobal reaction, thus affecting all clients of the file system.

3 New Abstraction: File PodTo address the problem of inadequate fault isolation in filesystems, we propose a new abstraction, called a file pod,for fine-grained fault isolation in file systems.A file pod is an abstract file system partition that con-

tains a group of files and their related metadata. Each filepod is isolated as an independent fault domain, with itsown failure policy. Any failure related to a file pod onlyaffects itself, not the whole file system. For example, ifmetadata corruption is detected within a file pod and thefailure policy is to remount as read-only, then a Swarmfile system marks only that pod as read-only, without af-fecting other consistent file pods.File pods allow applications to control the failure policy

of their own files and related metadata, instead of lettingthe file system manage the failures globally. Furthermore,this new abstraction supports flexible bindings betweennamespaces and fault domains; thus it can be used in awide range of environments, such as server virtualization

2

Ext3

25

Page 60: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

0

200

400

600

800

1000

Num

ber o

f Ins

tanc

es

Ext3 Ext4 Btrfs

193

409

829

ReadOnly Pure Crash

Figure 1: Failure Types. This figure shows the failure typesfor each file system. The total number of global failure instancesis on top of each bar.

Ext3 explicitly validates the integrity of metadata inmany places, especially at the I/O boundary when read-ing from disk. For example, Ext3 validates a directory en-try before traversing that directory and Ext3 checks thatthe inode bitmap is in a correct state before allocating anew inode. Unfortunately, as indicated by the MetadataCorruption column, if Ext3 detects a corruption in any ofthese structures, it causes a global failure. The I/O Failurecolumn similarly shows that Ext3 causes global failureswhen an individual I/O request fails. Finally, the SoftwareBugs column shows that there are a significant number ofinternal assertions (such as BUG ON), which are utilized tovalidate file system state at runtime, and these also cause aglobal failure when invoked. We observe that nearly all ofglobal failures in Ext3 are due to problems with metadataand other file system internal state, and not user data.For each data structure, we also check whether it is

shared across different files. As shown in the last col-umn of Table 1, most metadata structures are organized ina shared manner and thus can cause global failures. How-ever, even for local structures, such as indirect blocks, aglobal failure can still occur.

2.3 Discussion

A file is the basic file system abstraction used to store the user’s data in a logically isolated unit. Users can read from and write to a file. Another basic abstraction is a directory, which maps a file name to the file itself. Files and directories are usually organized as a directory tree.

A namespace holds a logical group of files or directories. To protect files in a shared environment, different applications are isolated within separate namespaces. Typical examples include chroot, BSD jail, Solaris Zones, and virtual machines.

However, these abstractions do not provide any fault isolation within a file system. Files and directories only represent and isolate data logically for applications. Within a file system, different files and directories share metadata and system state; when faults are related to this shared metadata, global failure policies are triggered.

Therefore, file system abstractions lack a fine-grained fault isolation mechanism. Current file systems implicitly use a single fault domain; a fault in one file may cause a global reaction, thus affecting all clients of the file system.

Table 1: Global Failure Causes of Ext3. This table shows the failure causes for Ext3, in terms of data structures, failure causes and their related numbers. MC: Metadata Corruption; IOF: I/O Failures; SB: Software Bugs; Shared: whether this structure is shared by multiple files or directories.

Data Structure MC IOF SB Shared
b-bitmap 2 2 Yes
i-bitmap 1 1 Yes
inode 1 2 2 Yes
super 1 Yes
dir-entry 4 4 3 Yes
gdt 3 2 Yes
indir-blk 1 1 No
xattr 5 2 1 No
block 5 Yes/No
journal 3 27 Yes
journal head 31 Yes
buf head 16 Yes
handle 22 9 Yes
transaction 28 Yes
revoke 2 Yes
other 1 11 Yes/No
Total 19 37 137 = 193
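The totals reported in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below copies each row's counts in the order they appear in the table; only row sums are used, so it makes no claim about which column an individual count belongs to. The variable names are illustrative.

```python
# Per-structure counts of global-failure instances, copied from Table 1
# in source order. Only each row's sum is used below.
rows = {
    "b-bitmap": [2, 2],
    "i-bitmap": [1, 1],
    "inode": [1, 2, 2],
    "super": [1],
    "dir-entry": [4, 4, 3],
    "gdt": [3, 2],
    "indir-blk": [1, 1],
    "xattr": [5, 2, 1],
    "block": [5],
    "journal": [3, 27],
    "journal head": [31],
    "buf head": [16],
    "handle": [22, 9],
    "transaction": [28],
    "revoke": [2],
    "other": [1, 11],
}
# Column totals from the "Total" row of Table 1.
column_totals = {"MC": 19, "IOF": 37, "SB": 137}

row_sum = sum(sum(counts) for counts in rows.values())
column_sum = sum(column_totals.values())
sb_share = column_totals["SB"] / column_sum  # fraction attributed to SB

print(row_sum, column_sum, round(100 * sb_share))  # prints: 193 193 71
```

Both sums agree with the reported total of 193, and roughly 71% of the instances fall in the Software Bugs column.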

3 New Abstraction: File Pod

To address the problem of inadequate fault isolation in file systems, we propose a new abstraction, called a file pod, for fine-grained fault isolation in file systems.

A file pod is an abstract file system partition that contains a group of files and their related metadata. Each file pod is isolated as an independent fault domain, with its own failure policy. Any failure related to a file pod affects only that pod, not the whole file system. For example, if metadata corruption is detected within a file pod and the failure policy is to remount as read-only, then a Swarm file system marks only that pod as read-only, without affecting other consistent file pods.

File pods allow applications to control the failure policy of their own files and related metadata, instead of letting the file system manage the failures globally. Furthermore, this new abstraction supports flexible bindings between namespaces and fault domains; thus it can be used in a wide range of environments, such as server virtualization


Page 61: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Figure 1: Failure Types. This figure shows the failure types for each file system. The total number of global failure instances is on top of each bar. [Bar chart; y-axis: Number of Instances (0 to 1000); bars: Ext3 (193), Ext4 (409), Btrfs (829); legend: ReadOnly Pure Crash.]

Page 68: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

All global failures are caused by metadata and system states

Both local and shared metadata can cause global failures

Page 71: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Not Only Local File Systems

Shared-disk file systems: OCFS2
➡ inspired by Ext3 design
➡ used in virtualization environments
➡ host virtual machine images
➡ allow multiple Linux guests to share a file system

Global failures are also prevalent
➡ a single piece of corrupted metadata can fail the whole file system on multiple nodes!

Page 75: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Current Abstractions

File and directory
➡ metadata is shared among different files and directories

Namespace
➡ virtual machines, chroot, BSD jail, Solaris Zones
➡ multiple namespaces still share a file system

Partitions
➡ multiple file systems on separate partitions
➡ a single panic on a partition can crash the whole operating system
➡ static partitions, dynamic partitions
➡ management of many partitions

Page 79: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

All files on a file system implicitly share a single fault domain

Current file-system abstractions do not provide fine-grained fault isolation

Page 80: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Introduction
Study of Failure Policies
Isolation File Systems
    New Abstraction
    Fault Isolation
    Quick Recovery
Preliminary Implementation on Ext3
Challenges

Page 85: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Isolation File Systems

Fine-grained partitioned
➡ files are isolated into separate domains

Independent
➡ faulty units will not affect healthy units

Fine-grained recovery
➡ repair a faulty unit quickly
➡ instead of checking the whole file system

Elastic
➡ dynamically grow and shrink its size

Page 88: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

New Abstraction

File Pod
➡ an abstract partition
➡ contains a group of files and related metadata
➡ an independent fault domain

Operations
➡ create a file pod
➡ set / get file pod’s attributes
    ➡ failure policy
    ➡ recovery policy
➡ bind / unbind a file to pod
➡ share a file between pods
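The operations listed on this slide can be sketched as a small in-memory interface. The slide names the operations but not their signatures, so everything below is an assumption made for illustration: the class names `FilePod` and `PodManager`, the method names, the `FailurePolicy` values, and the placeholder recovery policy string. Sharing a file between pods is modeled here simply as binding the same path to more than one pod.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailurePolicy(Enum):
    READ_ONLY = "read-only"
    CRASH = "crash"


@dataclass
class FilePod:
    """Hypothetical in-memory stand-in for a file pod."""
    name: str
    failure_policy: FailurePolicy = FailurePolicy.READ_ONLY
    recovery_policy: str = "check-pod-only"  # placeholder value (assumption)
    files: set = field(default_factory=set)


class PodManager:
    ATTRS = ("failure_policy", "recovery_policy")

    def __init__(self):
        self.pods = {}

    def create_pod(self, name):
        pod = FilePod(name)
        self.pods[name] = pod
        return pod

    def set_attr(self, pod_name, attr, value):
        if attr not in self.ATTRS:
            raise KeyError(attr)
        setattr(self.pods[pod_name], attr, value)

    def get_attr(self, pod_name, attr):
        if attr not in self.ATTRS:
            raise KeyError(attr)
        return getattr(self.pods[pod_name], attr)

    def bind(self, path, pod_name):
        self.pods[pod_name].files.add(path)

    def unbind(self, path, pod_name):
        self.pods[pod_name].files.discard(path)

    def pods_of(self, path):
        """Names of all pods the path is currently bound to."""
        return sorted(n for n, p in self.pods.items() if path in p.files)


mgr = PodManager()
mgr.create_pod("pod1")
mgr.create_pod("pod2")
mgr.set_attr("pod1", "failure_policy", FailurePolicy.CRASH)
mgr.bind("/d1/a", "pod1")
mgr.bind("/d1/a", "pod2")    # share one file between two pods
mgr.bind("/d2/b", "pod2")
mgr.unbind("/d2/b", "pod2")
```

How a real implementation would resolve a fault in a file shared by two pods with different policies is not covered by the slide and is left open here.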

Page 89: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

[Diagram: a directory tree with root / and directories d1, d2, d3, d4.]

Page 90: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

[Diagram: the directory tree with root / and directories d1, d2, d3, d4, divided into two file pods, Pod1 and Pod2.]

Page 95: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Metadata Isolation

Observation
➡ metadata is organized in a shared manner
➡ hard to isolate a failure for metadata

For example
➡ multiple inodes are stored in a single inode block
➡ an I/O failure can affect multiple files

[Diagram: an inode block containing twelve inodes (i), hit by a block read failure.]
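The inode-block example can be made concrete with a little arithmetic. The sketch below assumes a 4096-byte block, a 128-byte inode, and 0-based inode indices; none of these figures come from the slide. The two ownership maps are also hypothetical: one interleaves the inodes of two pods within a block, the other gives each pod whole blocks.

```python
BLOCK_SIZE = 4096   # bytes; assumed for illustration
INODE_SIZE = 128    # bytes; assumed for illustration
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE  # 32 under these assumptions


def inode_block(inode_index):
    """Index of the inode-table block holding the given inode (0-based)."""
    return inode_index // INODES_PER_BLOCK


def inodes_lost(failed_block):
    """All inode indices that become unreadable if one block read fails."""
    start = failed_block * INODES_PER_BLOCK
    return range(start, start + INODES_PER_BLOCK)


# Shared layout: inodes of two pods alternate inside the same block.
owner_shared = {i: ("pod1" if i % 2 == 0 else "pod2") for i in range(64)}
affected_shared = {owner_shared[i] for i in inodes_lost(0)}

# Pod-aware layout: each pod owns whole inode blocks.
owner_isolated = {
    i: ("pod1" if inode_block(i) == 0 else "pod2") for i in range(64)
}
affected_isolated = {owner_isolated[i] for i in inodes_lost(0)}
```

With the shared layout a single failed read of block 0 touches both pods; with the pod-aware layout it touches only `pod1`.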

Page 98: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Key Idea 1: Isolate metadata for file pods

Page 102: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

Localize Failures

Local Failures
➡ convert global failures to local failures
➡ same failure semantics
➡ only fail the faulty pod

Read-Only
➡ mark a file pod as Read-Only

Crash
➡ crash a file pod instead of the whole system
➡ provide the same initial states after crash
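Converting a global failure into a local one can be sketched as applying the failure policy to the owning pod instead of to the whole file system. This is a toy model under stated assumptions: `Pod`, `PodFileSystem`, `report_fault`, and the policy strings are hypothetical names, and the crash policy is reduced to a flag rather than modeling any restart.

```python
class Pod:
    def __init__(self, name, policy="read-only"):
        self.name = name
        self.policy = policy      # "read-only" or "crash" (assumed values)
        self.read_only = False
        self.crashed = False
        self.data = {}


class PodFileSystem:
    """Toy model in which each pod is its own fault domain."""

    def __init__(self):
        self.pods = {}
        self.file_to_pod = {}

    def add_file(self, path, pod_name):
        if pod_name not in self.pods:
            self.pods[pod_name] = Pod(pod_name)
        pod = self.pods[pod_name]
        pod.data[path] = b""
        self.file_to_pod[path] = pod

    def report_fault(self, path):
        # Apply the failure policy to the owning pod only.
        pod = self.file_to_pod[path]
        if pod.policy == "crash":
            pod.crashed = True
        else:
            pod.read_only = True

    def write(self, path, payload):
        pod = self.file_to_pod[path]
        if pod.crashed or pod.read_only:
            raise PermissionError(f"{pod.name} is not writable")
        pod.data[path] += payload


fs = PodFileSystem()
fs.add_file("/d1/a", "pod1")
fs.add_file("/d3/b", "pod2")

fs.report_fault("/d1/a")      # e.g., corruption detected in pod1
fs.write("/d3/b", b"ok")      # pod2 is unaffected

try:
    fs.write("/d1/a", b"x")
    pod1_writable = True
except PermissionError:
    pod1_writable = False
```

The failure semantics seen by `pod1` are the same as a read-only remount, but the write to `pod2` still succeeds.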

Page 103: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

[Diagram: the directory tree with root / and directories d1, d2, d3, d4, divided into two file pods, Pod1 and Pod2.]

Page 105: Fault Isolation and Quick Recovery in Isolation File Systems · Fault Isolation and Quick Recovery in Isolation File Systems ... mobile devices, ... even the operating system 12

[Diagram: the directory tree with root / and directories d1, d2, d3, d4, divided into Pod1 and Pod2, with a fault marked (e.g., corruption).]

Quick Recovery

File system recovery is slow
➡ a small error requires a full check
➡ many random read requests
➡ 7 hours to sequentially read a 2 TB disk
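
As a sanity check on the last number, the sequential bandwidth it implies can be worked out directly. The decimal reading of TB and MB is an assumption; the slide does not say which units it uses.

```python
# Implied sequential read bandwidth for "7 hours to read a 2 TB disk".
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
disk_bytes = 2 * 10**12
seconds = 7 * 3600
bandwidth_mb_s = disk_bytes / seconds / 10**6
print(f"{bandwidth_mb_s:.1f} MB/s")  # prints "79.4 MB/s"
```

This is a best case: a full check that issues many random reads will take longer than a purely sequential scan.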

[Figure: a small fault requires a full check (slow!)]

Key Idea 2: Minimize the file system checking range during recovery

Quick Recovery

Metadata Isolation
➡ file pod as the unit of recovery
➡ check and recover independently
➡ both online and offline

When to recover?
➡ leverage internal detection mechanism

How to recover more efficiently?
➡ only check the faulty pod
➡ narrow down to certain data structures
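
The recovery idea above can be sketched as follows. This is a minimal illustration, not the authors' checker: `Pod`, `recover`, and the structure names are hypothetical, the "internal detection mechanism" is modeled as a `suspect` set filled in elsewhere, and the repair step is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical file pod with its own, independently checkable metadata."""
    name: str
    structures: dict                           # structure name -> is_consistent
    suspect: set = field(default_factory=set)  # filled in by internal detection


def recover(pods):
    """Check only pods that reported a fault, and only their suspect structures."""
    checked = []
    for pod in pods:
        if not pod.suspect:
            continue                           # healthy pods are never scanned
        for name in sorted(pod.suspect):
            checked.append((pod.name, name))
            pod.structures[name] = True        # placeholder for the actual repair
        pod.suspect.clear()
    return checked


pods = [
    Pod("Pod1", {"inode_bitmap": True, "block_bitmap": False, "dirs": True},
        suspect={"block_bitmap"}),
    Pod("Pod2", {"inode_bitmap": True, "block_bitmap": True, "dirs": True}),
]
checked = recover(pods)
print(checked)  # [('Pod1', 'block_bitmap')]
```

The checking range shrinks twice: first to the faulty pod, then to the data structures flagged inside it.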

Ext3 Layout

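A disk is divided into block groups
➡ physical partition for disk locality

[Figure: disk layout of one block group: SB (superblock) | GDTs (group descriptor tables) | BM (block bitmap) | IM (inode bitmap) | Inodes (inode table) | Blocks (data blocks)]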
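
A short sketch of the block-group arithmetic may help here. It assumes the usual ext2/ext3 default in which each group's block bitmap occupies a single block, so a group covers 8 × block_size blocks; the function names are illustrative, and `first_data_block=0` assumes a 4 KiB block size.

```python
def blocks_per_group(block_size):
    # One bitmap block holds block_size * 8 bits, one bit per block in the group.
    return 8 * block_size


def group_of_block(block_no, block_size, first_data_block=0):
    # Index of the block group that contains a given file-system block number.
    return (block_no - first_data_block) // blocks_per_group(block_size)


block_size = 4096
bpg = blocks_per_group(block_size)
group_mib = bpg * block_size // 2**20
print(bpg, group_mib)                       # 32768 128
print(group_of_block(100_000, block_size))  # 3
```

With 4 KiB blocks each group therefore covers 128 MiB of the disk, and a block's group follows from integer division alone.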

[Figure: the disk drawn as a grid of block groups, each laid out as SB | GDTs | BM | IM | Inodes | Blocks, with files f1 to f5 placed on the grid]

multiple files can share a single block group

one file can span multiple block groups
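
Both observations can be reproduced with a toy allocation. Everything below is hypothetical: the block numbers are made up for illustration, and `BLOCKS_PER_GROUP` assumes 4 KiB blocks with one bitmap block per group.

```python
from collections import defaultdict

BLOCKS_PER_GROUP = 32768  # assumption: 4 KiB blocks, one bitmap block per group

# Hypothetical allocations: file name -> block numbers the file occupies.
files = {
    "f1": [10, 11, 12],
    "f2": [40, 41],
    "f5": range(32760, 32780),  # crosses the boundary between groups 0 and 1
}

# Which block groups does each file touch?
groups_of_file = {
    name: sorted({b // BLOCKS_PER_GROUP for b in blocks})
    for name, blocks in files.items()
}

# Which files live (at least partly) in each block group?
files_in_group = defaultdict(list)
for name, groups in groups_of_file.items():
    for g in groups:
        files_in_group[g].append(name)

print(groups_of_file)        # {'f1': [0], 'f2': [0], 'f5': [0, 1]}
print(dict(files_in_group))  # {0: ['f1', 'f2', 'f5'], 1: ['f5']}
```

Here group 0 is shared by three files, while `f5` spans two groups.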

48



Layout

A file pod contains multiple block groups
➡ each block group maps to only one file pod
➡ performance locality and fault isolation

[Figure: disk layout divided into POD1, POD2, POD3]


Data Structures

Pod-related structure
➡ no extra mapping structures
➡ embedded in group descriptors
➡ group descriptors are loaded into memory

[Figure: a block group (SB, GDTs, BM, IM, Inodes, Blocks) labeled "a block group", with a "pod" label]
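A minimal sketch of the idea above, assuming the pod is recorded as one extra field in each group descriptor. The class, the field names, and the convention that 0 means "no pod" are hypothetical and are not taken from the Ext3 prototype.

```python
from collections import defaultdict
from dataclasses import dataclass

NO_POD = 0  # assumed marker for a block group that belongs to no pod


@dataclass
class GroupDescriptor:
    """Simplified, hypothetical group descriptor with an embedded pod id."""
    group_no: int
    free_blocks: int
    free_inodes: int
    pod_id: int = NO_POD


def groups_by_pod(descriptors):
    """Derive pod membership by scanning the in-memory descriptors.

    No separate pod-to-group mapping structure is kept; the descriptors
    themselves are the only source of truth.
    """
    pods = defaultdict(list)
    for gd in descriptors:
        if gd.pod_id != NO_POD:
            pods[gd.pod_id].append(gd.group_no)
    return dict(pods)


# Example descriptor table: groups 0 and 1 belong to pod 1, group 2 to pod 2,
# and group 3 is not assigned to any pod.
gdt = [
    GroupDescriptor(0, free_blocks=100, free_inodes=50, pod_id=1),
    GroupDescriptor(1, free_blocks=80, free_inodes=40, pod_id=1),
    GroupDescriptor(2, free_blocks=200, free_inodes=64, pod_id=2),
    GroupDescriptor(3, free_blocks=256, free_inodes=64),
]
pod_map = groups_by_pod(gdt)
```

Because each descriptor carries exactly one pod id, a block group cannot belong to two pods in this model, which matches the one-group-to-one-pod rule on the Layout slide.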


Algorithms

Pod-based inode and block allocation
➡ preserve the original allocation’s locality
➡ allocation will not cross a pod boundary


[Figure: disk layout divided into POD1, POD2, POD3, with two numbered annotations]

1. within the same pod

2. an empty block group
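The two numbered steps can be read as a search order for pod-based allocation. The sketch below assumes that reading. Treating "empty" as "entirely free and not yet owned by a pod", and choosing the same-pod group nearest to a goal group, are illustrative assumptions rather than details from the slides; all names are hypothetical.

```python
from dataclasses import dataclass

UNASSIGNED = 0  # assumed marker for a block group not owned by any pod


@dataclass
class Group:
    group_no: int
    free_blocks: int
    total_blocks: int
    pod_id: int = UNASSIGNED


def pick_group(groups, pod_id, goal_group):
    """Choose a block group for an allocation made on behalf of pod_id.

    Returns the chosen Group, or None if the pod cannot grow.
    """
    # Step 1: stay within the same pod. Prefer the group closest to the
    # goal group, as a stand-in for the original locality heuristics.
    same_pod = [g for g in groups if g.pod_id == pod_id and g.free_blocks > 0]
    if same_pod:
        return min(same_pod, key=lambda g: abs(g.group_no - goal_group))

    # Step 2: claim an empty block group for this pod. Groups owned by
    # other pods are never used, so allocation cannot cross a pod boundary.
    for g in groups:
        if g.pod_id == UNASSIGNED and g.free_blocks == g.total_blocks:
            g.pod_id = pod_id
            return g
    return None


groups = [
    Group(0, free_blocks=0, total_blocks=100, pod_id=1),
    Group(1, free_blocks=10, total_blocks=100, pod_id=1),
    Group(2, free_blocks=100, total_blocks=100),
    Group(3, free_blocks=50, total_blocks=100, pod_id=2),
]
first_choice = pick_group(groups, pod_id=1, goal_group=0)
```

In this example the first choice for pod 1 is group 1, since group 0 is full. Only when every group of pod 1 is full does the function fall through to step 2 and claim group 2.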


De-fragmentation
➡ potential internal fragmentation
➡ de-fragmentation for file pods
➡ similar solution in Ext4


Journaling

Virtual transaction
➡ contains updates only from one pod
➡ better performance isolation
➡ commit multiple virtual transactions in parallel

[Figure: independent transactions T1, T2, T3 for Pod 1, Pod 2, Pod 3 in a shared on-disk journal, with journal reservation]
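A rough sketch of virtual transactions, assuming each pending update is tagged with the pod it belongs to. The function names and the list standing in for the shared on-disk journal are hypothetical. Parallel commit and journal reservation are not modeled here.

```python
from collections import defaultdict


def build_virtual_transactions(updates):
    """Group pending updates into one virtual transaction per pod.

    updates: iterable of (pod_id, block_no) pairs in arrival order.
    Returns {pod_id: [block_no, ...]}.
    """
    txns = defaultdict(list)
    for pod_id, block_no in updates:
        txns[pod_id].append(block_no)
    return dict(txns)


def commit(journal, txns, pod_id):
    """Append one pod's virtual transaction to the shared journal.

    Other pods' transactions stay pending and are not forced out.
    """
    journal.append((pod_id, tuple(txns.pop(pod_id))))


# Interleaved updates from three pods.
pending = build_virtual_transactions([(1, 10), (2, 20), (1, 11), (3, 30)])
journal = []
commit(journal, pending, pod_id=2)  # e.g. an fsync in pod 2
```

After the commit, the journal holds only pod 2's updates while the transactions of pods 1 and 3 remain pending, which is one way to picture the performance isolation the slide refers to.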


Introduction

Study of Failure Policies

Isolation File Systems

New Abstraction

Fault Isolation

Quick Recovery

Preliminary Implementation on Ext3

Challenges


Status

What we did
➡ a simple prototype for Ext3
➡ provides read-only isolation

What we plan to do
➡ crash isolation
➡ quick recovery after failure
➡ other file systems: Ext4 and Btrfs
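The read-only isolation listed under "What we did" can be pictured as follows, assuming it means that a fault detected in one pod makes only that pod read-only while the others stay writable. The class and method names are hypothetical and do not describe the prototype's code.

```python
class PodFS:
    """Toy model: one read-only flag per pod instead of one per file system."""

    def __init__(self, pod_ids):
        self.read_only = {pod_id: False for pod_id in pod_ids}

    def report_fault(self, pod_id):
        # Confine the failure: only the faulty pod becomes read-only.
        self.read_only[pod_id] = True

    def write(self, pod_id, data):
        if self.read_only[pod_id]:
            raise PermissionError(f"pod {pod_id} is read-only")
        return len(data)  # pretend the bytes were written


fs = PodFS([1, 2])
fs.report_fault(1)             # e.g. metadata corruption found in pod 1
written = fs.write(2, b"abc")  # pod 2 is unaffected
```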


Challenges

Metadata isolation
➡ tree-based directory structure
➡ globally shared metadata: superblock, journal
➡ shared system states: block allocation tree

Local failure
➡ is it correct to continue running?
➡ light-weight, stateless crash for a pod

Performance
➡ potential overhead of managing pods
➡ better performance isolation
➡ better scalability


Failure is not an option. -- NASA


Global failure is not an option; local failure with quick recovery is an option. -- Isolation File Systems


Questions?