Operating Systems ECE344 Ashvin Goel ECE University of Toronto File Systems

Upload: bruce-cook

Post on 18-Jan-2016


Page 1: Operating Systems ECE344 Ashvin Goel ECE University of Toronto File Systems

Operating Systems ECE344

Ashvin Goel

ECE

University of Toronto

File Systems


Outline

Overview of file systems

File system design

Sharing files

Unix file system

Consistency and crash recovery

Journaling file systems

Log-structured file systems


What is a File System?

A file system provides an abstraction for storing, organizing and accessing persistent data
  o I.e., data survives after the process that created it has terminated, and after the machine crashes or reboots
  o This data is stored on disks, tapes, solid-state drives (SSD) …

File-system data is organized as objects called files
  o Need a way to find files, so files have names and are organized as directories

Files are accessed via system calls
  o Files can be accessed concurrently by different processes


File Types

The OS typically treats files as an unstructured sequence of bytes

Programs can impose any format on files
  o E.g., application programs may look for a specific file extension to indicate the file’s type

However, the OS needs to understand the format of executable files to execute programs

[Figure: executable file]


File Metadata

Files have various attributes associated with them
  o Name, owner, creation time, access permissions, size, etc.
  o These attributes are called file metadata
  o File system maintains file metadata in per-file data structures on disk


Basic File-Related Calls

Open
  o Start using file, set position to beginning of file

Read, Write
  o Read/Write n bytes from/to current position
  o Update position

Seek
  o Move to a new position
  o Allows random access (mainly for disks, not tape)

Close
  o Stop using file

Create, Rename, Delete, Get/Set attributes
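As a minimal sketch of the calls above, the following Python snippet uses the `os` module, whose functions are thin wrappers over the corresponding Unix system calls. The scratch file created with `tempfile.mkstemp` and the function name `demo_file_calls` are assumptions made for illustration only.

```python
import os
import tempfile


def demo_file_calls():
    """Exercise open, write, seek, read, and close on a scratch file."""
    # mkstemp creates and opens a new file for reading and writing.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")        # write 11 bytes, position advances to 11
        os.lseek(fd, 6, os.SEEK_SET)        # seek: move to byte offset 6
        tail = os.read(fd, 5)               # read 5 bytes starting at offset 6
        pos = os.lseek(fd, 0, os.SEEK_CUR)  # query the updated position
    finally:
        os.close(fd)                        # stop using the file
        os.remove(path)                     # delete it
    return tail, pos
```

Calling `demo_file_calls()` shows that each read or write moves the current position, and that a seek gives random access within the file.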


Directories

Provide a method for naming and locating a file

Store a list of directory entries that point to files

Modern systems use hierarchical directories
  o A directory contains files or sub-directories
  o E.g., B contains entries for D, j and E

Files are accessed with pathnames
  o Absolute pathname
      - E.g., cat /B/D/n
  o Relative pathname
      - Uses current directory
      - E.g., cd /B/D; cat n

Directory metadata is similar to files

[Figure: example directory tree rooted at /, with nodes A-H and i-q; legend: interior node, leaf node]


Basic Directory-Related Calls

Open

Readdir
  o Read entries in a directory
  o Each entry points to a file or sub-directory

Seekdir
  o Simulated at user level

Close

Create, Rename, Delete, Get/Set attributes
  o No Writedir!

Link, Unlink
  o Add/remove a name for an existing file (more later)
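The directory calls can be tried out in the same way. This is a small sketch under stated assumptions: the scratch directory, the entry names `sub`, `a.txt` and `b.txt`, and the function name are all hypothetical. Note that the directory contents only ever change as a side effect of create, rename and delete, which matches the "No Writedir!" point.

```python
import os
import tempfile


def demo_directory_calls():
    """Create, list, rename, and delete entries in a scratch directory."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))                # create a sub-directory
        with open(os.path.join(root, "a.txt"), "w") as f:  # create a file
            f.write("data")
        with os.scandir(root) as entries:                  # open the directory and read its entries
            before = sorted((e.name, e.is_dir()) for e in entries)
        os.rename(os.path.join(root, "a.txt"),
                  os.path.join(root, "b.txt"))             # rename an entry
        os.rmdir(os.path.join(root, "sub"))                # delete the empty sub-directory
        after = sorted(os.listdir(root))
    return before, after
```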



File System Design

OS needs to store and retrieve files and directories
  o Needs to maintain information about where they are stored

Needs to store files durably, i.e., ensure that files exist after machine reboot

Needs to handle machine crashes
  o On a crash, OS stops suddenly, perhaps in the middle of a file system operation
  o On restart, the file system should be able to recover data and bring the file system back to a good or consistent state


Disk Blocks

Disks are accessed at the granularity of sectors
  o Typically, 512 bytes

A file system allocates data in chunks called blocks
  o The file system treats the disk as an array of blocks
  o A block contains 2^n contiguous sectors
  o Reduces overhead of managing individual bytes
  o Large blocks improve throughput but increase internal fragmentation
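The trade-off between block size and internal fragmentation can be made concrete with a little arithmetic. The helper names and the 5000-byte example file below are assumptions for illustration; only the 512-byte sector size comes from the slide.

```python
SECTOR_SIZE = 512  # bytes, as stated above


def blocks_needed(file_size, block_size):
    """Number of whole blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def internal_fragmentation(file_size, block_size):
    """Bytes allocated to the file but left unused in its last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# A hypothetical 5000-byte file with 1-sector blocks vs. 8-sector (4 KB) blocks.
small_blocks = (blocks_needed(5000, SECTOR_SIZE),
                internal_fragmentation(5000, SECTOR_SIZE))
large_blocks = (blocks_needed(5000, 8 * SECTOR_SIZE),
                internal_fragmentation(5000, 8 * SECTOR_SIZE))
```

With 512-byte blocks the file needs 10 blocks and wastes 120 bytes; with 4 KB blocks it needs only 2 blocks (fewer I/O requests) but wastes 3192 bytes.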


File System Tasks

A file system performs four main tasks

Free block management
  o Allocates blocks to a file, manages free blocks
      - Uses bitmaps, linked list, B-trees
      - Issues similar to memory, swap management

Block allocation and placement
  o Maps (potentially non-contiguous) blocks to the file
      - Issues similar to virtual memory, placement unique to disks

Directory management
  o Maps file names to location of starting block of file

Buffer cache management
  o Caches disk blocks in memory to minimize I/O


Free Block Management - Bitmaps

Keep a bitmap in a separate area on disk
  o 1 bit per disk block

Suppose block size = 4 KB, disk size = 160 GB
  o Nr. of blocks = 40 M
  o Need 40 Mbits = 5 MB disk space => 1280 bitmap blocks

Advantages
  o Allows allocating contiguous blocks to a file easily
  o Need only one bitmap block in memory at a time

Disadvantages
  o Need extra space for bitmap
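The sizing arithmetic and the "easy contiguous allocation" claim can both be checked with a short sketch. The class `BlockBitmap`, its method names, and the first-fit search are assumptions chosen for illustration, not a description of any particular file system; sizes use binary units (1 KB = 1024 bytes).

```python
BLOCK_SIZE = 4 * 1024            # 4 KB
DISK_SIZE = 160 * 1024 ** 3      # 160 GB

num_blocks = DISK_SIZE // BLOCK_SIZE         # 40 M blocks
bitmap_bytes = num_blocks // 8               # 1 bit per block -> 5 MB
bitmap_blocks = bitmap_bytes // BLOCK_SIZE   # 1280 bitmap blocks


class BlockBitmap:
    """Free-block bitmap: bit = 1 means the block is in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _set(self, block, used):
        if used:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate_contiguous(self, count):
        """First-fit: return the first block of a free run of `count` blocks, or None."""
        run_start, run_len = 0, 0
        for block in range(self.num_blocks):
            if self.is_used(block):
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._set(b, True)
                return run_start
        return None

    def free(self, start, count=1):
        for b in range(start, start + count):
            self._set(b, False)
```

Finding a contiguous run is a linear scan over bits, which is why bitmaps make contiguous allocation straightforward compared with a free list.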


Block Allocation and Placement

Block allocation
  o Maps, potentially non-contiguous, blocks to the file

Options
  o Contiguous allocation
  o Linked list allocation
  o File allocation table (FAT)
  o I-node based allocation


Contiguous Allocation

All blocks in a file are contiguous on the disk

[Figure: contiguous file layout on disk, including the state after deleting files D and F]


When to Use Contiguous Allocation

Advantages
  o Performance is good for sequential reading

Disadvantages
  o File growth requires copying
  o Disk becomes fragmented after deletion
  o Will need periodic compaction

Good for CD-ROMs
  o All file sizes are known in advance
  o Files are never deleted


Linked List Allocation

Each file is a linked list of blocks

First word in a block contains number of next block

Disadvantage
  o Random access is slow


File Allocation Table (FAT)

Keep linked list information in memory

Uses an index table with one entry per disk block
  o Each entry contains the address of the next block

Advantages
  o Random access needs in-memory search (fast)

Disadvantages
  o Entire table stored in memory, doesn’t scale with large file systems

[Figure: file allocation table, with an end-of-file marker]
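A FAT is essentially the linked list's "next" pointers pulled out of the data blocks into one table. The sketch below is illustrative only: the marker value `EOF = -1`, the eight-entry table, and the function names are assumptions, and real FAT variants use different marker values and entry widths.

```python
EOF = -1  # assumed end-of-file marker for this sketch


def file_blocks(fat, first_block):
    """Follow the chain in the table and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(fat, first_block, n):
    """Random access: find the n-th block (0-based) using only in-memory lookups."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("offset is past the end of the file")
    return block


# Hypothetical table for an 8-block disk: one file occupies blocks 4 -> 7 -> 2.
fat = [None] * 8
fat[4], fat[7], fat[2] = 7, 2, EOF
```

Random access still walks the chain, but every step is a memory lookup instead of a disk read, which is the advantage over plain linked-list allocation.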


Inode Based Allocation

Linked list allocation spreads index information on disk, slowing random access

FAT keeps linked-list index information in memory but that limits size of file system

Idea
  o Store index information for locating file blocks close together on disk
  o Cache this information in memory when file is opened
  o This approach avoids the problems above

Problem with the idea
  o The index information may grow with file growth
  o It cannot be stored contiguously


Inode Based Allocation

Use a tree to store index information
  o Tree structure allows growth of index information, without spreading this information too much

Root of tree is called inode (index node)
  o Inode is stored on disk
  o There is one inode per file or directory


Inode Structure

Twelve direct block pointers
  o Point directly to file data blocks (called direct or data blocks)

One indirect pointer
  o Points to an indirect block that contains pointers to direct blocks

One double indirect pointer
  o Points to a double indirect block that contains pointers to indirect blocks

One triple indirect pointer
  o Points to a triple indirect block that contains pointers to double indirect blocks (not shown below)

Why this allocation strategy?
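To see how the pointer tree is used, the sketch below maps a logical block index within a file to the pointer level that reaches it. It assumes 1024 pointers per block (4 KB blocks with 4-byte pointers); the function name `locate` and its return format are hypothetical.

```python
POINTERS_PER_BLOCK = 1024  # assumption: 4 KB blocks, 4-byte block pointers
NUM_DIRECT = 12


def locate(block_index):
    """Return (level, offsets): which inode pointer covers this block,
    and the slot to follow at each level of indirection."""
    n = POINTERS_PER_BLOCK
    if block_index < NUM_DIRECT:
        return ("direct", (block_index,))
    block_index -= NUM_DIRECT
    if block_index < n:
        return ("indirect", (block_index,))
    block_index -= n
    if block_index < n * n:
        return ("double indirect", divmod(block_index, n))
    block_index -= n * n
    if block_index < n ** 3:
        first, rest = divmod(block_index, n * n)
        return ("triple indirect", (first,) + divmod(rest, n))
    raise ValueError("block index beyond maximum file size")
```

One plausible reading of the question above: the first 12 blocks (48 KB under these assumptions) are reachable with no extra disk read, so small files stay cheap, while large files pay for additional levels only when they need them.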


Maximum File Size

Say block size is 4 KB

Say block pointer size is 4 bytes
  o So 1024 block pointers per block

Total number of blocks in the file
  o 12 direct blocks
  o 1024 blocks via indirect block pointer
  o 1024 * 1024 blocks via double indirect block pointer
  o 1024 * 1024 * 1024 blocks via triple indirect block pointer

Total file size
  o (12 + 1024 + 1024^2 + 1024^3) * 4 KB
  o ≈ 2^(10*3) * 4 * 2^10 = 2^42 bytes = 4 TB
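The calculation can be reproduced in a few lines, which also shows how little the lower-level pointers contribute to the total. The variable names are assumptions for this sketch; the block and pointer sizes are the ones given above.

```python
BLOCK_SIZE = 4 * 1024   # 4 KB blocks
POINTER_SIZE = 4        # 4-byte block pointers

ptrs = BLOCK_SIZE // POINTER_SIZE                # pointers per block
total_blocks = 12 + ptrs + ptrs ** 2 + ptrs ** 3
max_bytes = total_blocks * BLOCK_SIZE

# Fraction of the maximum size that comes from the triple indirect pointer alone.
triple_share = ptrs ** 3 / total_blocks
```

The triple indirect term dominates (more than 99.9% of the blocks), so the maximum file size is only slightly above 2^42 bytes.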


Unix File System Layout

[Figure: Unix file system layout]


Block Placement

Block placement is the policy used by file system for block allocation

Original Unix file system had two placement problems
  o Data blocks allocated randomly in “aging” file systems
      - Blocks for file allocated sequentially when file system is new
      - As file system fills, blocks are allocated from deleted files
      - Deleted files may be randomly placed
      - So, blocks for new files become scattered across disk
  o Inodes allocated far from blocks
      - All inodes at beginning of disk, far from data
      - Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks

Both of these problems generate many long seeks


BSD Fast File System

BSD Unix redesigned Unix FS
  o New FS called Fast File System (FFS)

Disk partitioned into groups of cylinders
  o Recall, cylinder is the same track across platters
  o Cylinder group consists of contiguous cylinders

Placement policy: place these in same cylinder group
  o Inode, data blocks in a file
  o Files in a directory
  o If cylinder group is full, place in nearby group


Directory Management

A directory contains zero or more entries
  o One entry per file or sub-directory that resides in the directory
  o Directory entries are kept in directory data blocks

Entry maps file names to location of starting block, has
  o File name, file attributes
  o Block number of first block of the file

[Figure: directory entries, each pointing to the file's data blocks]
  Kernel.C | attributes | Block Nr.
  Kernel.h | attributes | Block Nr.
  os       | attributes | Block Nr.


Unix Directories

In Unix, each entry has
  o (File name, Inode number)
  o Inode number helps locate i-node of the file
  o Inode contains file attributes


Unix Directories

Hard links
  o More than one name for a file
  o Different directory entries have the same inode number
  o /C/F/r points to the same inode as /B/D/n
  o Inode maintains a reference count
  o DAG (directed acyclic graph), instead of tree structure

Symbolic links (short cuts)
  o A file contains data naming another file (a redirect)
  o The file contents of /C/F/G/s are /B/D/m

[Figure: the earlier directory tree with two added leaf nodes: r (hard link) and s (symbolic link); legend: interior node, leaf node]
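The difference between the two kinds of link is visible from user space. The following sketch assumes a POSIX system where hard and symbolic links are permitted in the temporary directory; the file names `n`, `r` and `s` simply echo the figure, and the function name is hypothetical.

```python
import os
import tempfile


def demo_links():
    """Compare a hard link and a symbolic link to the same file."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "n")
        hard = os.path.join(root, "r")
        soft = os.path.join(root, "s")
        with open(target, "w") as f:
            f.write("data")
        os.link(target, hard)      # second directory entry for the same inode
        os.symlink(target, soft)   # separate file whose contents name the target

        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink                 # the inode's reference count
        symlink_has_own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino
        stored_path_is_target = os.readlink(soft) == target
    return same_inode, link_count, symlink_has_own_inode, stored_path_is_target
```

`os.stat` follows symbolic links while `os.lstat` does not, which is why the symbolic link's own inode has to be inspected with `lstat`.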


File Names

Short, Fixed Length Names
  o MS-DOS/Windows
      - FILE3.BAK (8+3)
      - Name has 11 bytes
  o Original Unix
      - Name has 14 bytes


File Names

Variable Length Names

E.g., Unix
  o Each name can be 4096 bytes
  o Size of directory entry is variable

Options
  o Entries are allocated contiguously
      - Each entry has the length of the entry and then the file name
      - Fragmentation occurs when files are removed
  o Allocate set of pointers to file names in the beginning of the directory
      - Use heap at the end to store names
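The first option (length-prefixed entries packed back to back) might look like the sketch below. The on-disk layout is an assumption invented for this example: a little-endian header of inode number, record length and name length, with records padded to 4 bytes. It is loosely in the spirit of Unix-style directory entries but does not reproduce any real format.

```python
import struct

# Assumed header layout: inode number (4 bytes), record length (2), name length (2).
HEADER = struct.Struct("<IHH")


def pack_entry(inode, name):
    """Encode one variable-length directory entry."""
    raw = name.encode("utf-8")
    rec_len = HEADER.size + len(raw)
    rec_len += -rec_len % 4                     # pad record to a 4-byte boundary
    padded_name = raw.ljust(rec_len - HEADER.size, b"\0")
    return HEADER.pack(inode, rec_len, len(raw)) + padded_name


def unpack_entries(block):
    """Walk the entries in a directory block by following record lengths."""
    entries, offset = [], 0
    while offset < len(block):
        inode, rec_len, name_len = HEADER.unpack_from(block, offset)
        start = offset + HEADER.size
        entries.append((inode, block[start:start + name_len].decode("utf-8")))
        offset += rec_len                       # jump to the next entry
    return entries


# Hypothetical directory block with three entries.
block = pack_entry(7, "Kernel.C") + pack_entry(8, "Kernel.h") + pack_entry(9, "os")
```

Because each entry records its own length, removing an entry in the middle leaves a hole of arbitrary size, which is the fragmentation problem noted above.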


File Deletion

Directory entry is removed from directory

All blocks in file are returned to free list

Hard Links
  o Put a “reference count” field in each inode
      - Counts number of entries that point to the file
  o When removing file from directory, decrement count
  o When count goes to zero, reclaim all file blocks

Symbolic Link
  o Remove the real file
  o Symbolic link is “broken”
      - Similar to a bad URL
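The deletion rules can be observed directly on a POSIX system. In this sketch the names `original`, `alias` and `shortcut` and the function name are hypothetical, and it assumes hard and symbolic links are allowed in the temporary directory.

```python
import os
import tempfile


def demo_deletion():
    """Remove a file that has one extra hard link and one symbolic link."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "original")
        alias = os.path.join(root, "alias")
        shortcut = os.path.join(root, "shortcut")
        with open(original, "w") as f:
            f.write("data")
        os.link(original, alias)        # reference count becomes 2
        os.symlink(original, shortcut)  # does not affect the reference count

        os.unlink(original)             # remove one directory entry

        count_after = os.stat(alias).st_nlink   # count decremented, not zero
        with open(alias) as f:
            data = f.read()                      # blocks are still allocated
        # The link file itself exists, but the path it names no longer does.
        broken = os.path.islink(shortcut) and not os.path.exists(shortcut)
    return count_after, data, broken
```

The file's blocks survive because the count is still 1; only the symbolic link, which stores a path rather than an inode number, is left dangling.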


Path Lookup in Unix FS

Say File F located in directory /D1/D2 has to be read

What blocks need to be read from disk?
  o Super block (provides location of inode blocks area)
      - Normally this block is read when the file system code performs initialization, and this block is cached in memory
  o Inode of the / directory (from the inode blocks area)
  o Data blocks of / directory (provides directory entry for D1)
  o Inode of the D1 directory
  o Data blocks of D1 directory (provides directory entry for D2)
  o Inode of the D2 directory
  o Data blocks of the D2 directory (provides directory entry for F)
  o Inode of F file
  o Data blocks of F file
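The lookup sequence above alternates between reading an inode and reading the data it points to. The toy model below traces that sequence; the in-memory `DISK` dictionary, the inode numbers, and the choice of 2 as the root inode number are all assumptions for illustration, and the super block is left out because it is normally already cached.

```python
ROOT_INODE = 2  # assumed root inode number for this toy model

# Toy "disk": inode number -> (kind, contents). Directory contents map names
# to inode numbers; file contents are bytes.
DISK = {
    2: ("dir", {"D1": 5}),
    5: ("dir", {"D2": 9}),
    9: ("dir", {"F": 12}),
    12: ("file", b"contents of F"),
}


def read_file(path, disk=DISK):
    """Return (data, trace), where trace lists each inode/data read in order."""
    trace = ["inode of /"]
    inode_nr = ROOT_INODE
    name_so_far = ""
    for component in path.strip("/").split("/"):
        kind, entries = disk[inode_nr]
        if kind != "dir":
            raise NotADirectoryError(name_so_far or "/")
        trace.append(f"data of {name_so_far or '/'}")   # search the directory entries
        inode_nr = entries[component]
        name_so_far += "/" + component
        trace.append(f"inode of {name_so_far}")
    kind, data = disk[inode_nr]
    trace.append(f"data of {name_so_far}")
    return data, trace
```

For `/D1/D2/F` the trace has eight steps, two per path component plus two for the root, which is why caching frequently used blocks matters so much.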


Buffer Cache Management

Notice each file access requires many block accesses

File operations often access the same disk block
  o E.g., block containing contents of root (/) directory

Caching disk blocks in memory can reduce disk I/O

Traditionally block cache is called a buffer cache

Cache operations
  o Block lookup
      - If block in memory, returns data from buffer
  o Block miss
      - Read disk block into buffer, update buffer cache
  o Block flush
      - If buffer is modified, write it back to disk block
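The three cache operations fit in a small class. This is a sketch under several assumptions: the "disk" is a plain dictionary keyed by (device, block number), writes are deferred until a flush or eviction, and eviction picks the least recently used buffer. The class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache keyed by (device, block number)."""

    def __init__(self, disk, capacity):
        self.disk = disk                # dict standing in for the disk
        self.capacity = capacity        # maximum number of cached blocks
        self.buffers = OrderedDict()    # key -> [data, dirty], oldest first
        self.disk_reads = 0
        self.disk_writes = 0

    def _get(self, key):
        if key in self.buffers:                     # block lookup: hit
            self.buffers.move_to_end(key)           # mark as most recently used
        else:                                       # block miss
            if len(self.buffers) >= self.capacity:
                self._evict()
            self.buffers[key] = [self.disk[key], False]
            self.disk_reads += 1
        return self.buffers[key]

    def read(self, key):
        return self._get(key)[0]

    def write(self, key, data):
        buf = self._get(key)
        buf[0], buf[1] = data, True                 # update in memory, mark dirty

    def flush(self, key):
        buf = self.buffers[key]
        if buf[1]:                                  # block flush: only if modified
            self.disk[key] = buf[0]
            self.disk_writes += 1
            buf[1] = False

    def _evict(self):
        key = next(iter(self.buffers))              # least recently used buffer
        self.flush(key)
        del self.buffers[key]
```

Repeated reads of the same block cost one disk read, and a modified block reaches the disk only when it is flushed or evicted.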


Buffer Cache Organization

Many blocks can be cached in memory
  o With 16GB machine, say 8GB for buffer cache
  o Block size = 4K, nr. of blocks cached = 2M

Use a hash table to look up a block in memory efficiently

[Figure: hash table keyed by (device, block #), pointing to disk blocks in memory]


Buffer Cache Write Policy

When an application writes to a file, the corresponding block is updated in the buffer cache

When is the disk block updated?
  o Immediately (synchronously)
      - Write-through cache
      - Correct, but very slow
  o Later (asynchronously)
      - Write-back cache
      - Fast, but what if system crashes?
      - File system can become inconsistent because some blocks in memory are not on disk
      - We discuss this problem in detail later


Buffer Cache Issues

Buffer cache typically has limited size, so we need replacement algorithms
  o Typically, LRU is used

Buffer cache competes with virtual memory system
  o How many frames to allocate for buffer cache vs. virtual memory?
  o Some systems use a unified memory cache for buffer cache and virtual memory pages
      - The blocks of the buffer cache and pages in the page cache are part of a unified caching scheme
      - However, if a program reads a large file, then it affects programs that are not accessing files much


Read Ahead

Applications often read files sequentially

File system can predict that a process will request the file block after the one it is currently requesting

File system prefetches next block from disk
  o Also called read ahead
  o Note that the next block may not be allocated sequentially

If process requests next block, it will be in cache
  o Allows overlapping I/O with execution



Sharing Files

Files need to be shared across processes

Issues
  o Concurrent access
      - What happens when threads read and write a file simultaneously?
  o Protection
      - How should the OS ensure that only an authorized user can access a file?


Concurrent Access

OS ensures sequential consistency
  o A read() call sees data from the most recent finished write() call (even if the write occurred on another processor)
  o All processors see the same order of writes, i.e., if a file block has value 1, followed by 2, then no processor will read 2, 1

Applications still have to ensure that read is called after write has finished
  o Concurrent accesses may see old and new data


Protection

Who (subject) can access a file (object)?

How can they access it (action)?

A protection system dictates whether a given action performed by a given subject on an object is allowed
  o Actions include read, write, execute, append, change protection, delete, etc.

Two mechanisms for enforcing protection
  o Access control lists (ACL)
  o Capabilities


Access Control Lists (ACL)

For each object, maintain list of subjects and their permitted actions

Easier to manage
  o Easy to grant, revoke

Problem when objects are heavily shared
  o ACLs become large
  o Use groups


Capabilities

For each subject, maintain list of objects and their permitted actions

Easier to transfer
  o Like keys, can hand off, does not depend on subject

Revoking capability is challenging
  o Need to keep track of all subjects that have capability


ACLs vs. Capabilities

                 Objects
             /one   /two   /three
Subjects
  Alice      rw     -      rw
  Bob        w      -      r
  Charlie    w      r      rw

ACL: one column of the matrix (per object)
Capability: one row of the matrix (per subject)
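The two mechanisms are two ways of slicing the same access matrix. The sketch below encodes the matrix from the table and derives each view; the dictionary layout and the function names `acl`, `capabilities` and `allowed` are assumptions for illustration.

```python
# Access matrix from the table above: subject -> object -> permitted actions.
ACCESS_MATRIX = {
    "Alice":   {"/one": "rw", "/two": "",  "/three": "rw"},
    "Bob":     {"/one": "w",  "/two": "",  "/three": "r"},
    "Charlie": {"/one": "w",  "/two": "r", "/three": "rw"},
}


def acl(obj):
    """ACL view: for one object, the subjects and their permitted actions (a column)."""
    return {subject: rights[obj]
            for subject, rights in ACCESS_MATRIX.items() if rights[obj]}


def capabilities(subject):
    """Capability view: for one subject, the objects and permitted actions (a row)."""
    return {obj: actions
            for obj, actions in ACCESS_MATRIX[subject].items() if actions}


def allowed(subject, obj, action):
    """Protection check: may `subject` perform `action` on `obj`?"""
    return action in ACCESS_MATRIX[subject][obj]
```

Revoking all access to `/three` touches a single ACL, but with capabilities it means finding every subject whose row mentions `/three`, which is the revocation problem noted earlier.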



The UNIX File-System Data Structures

[Figure: UNIX file-system data structures, divided into those kept in memory and those stored on disk and cached in memory]


Example: Unix File-Related System Calls

fd = open(name, mode)
  o Perform path lookup on name to find inode of file
  o Cache inode for file in buffer cache
  o Check permissions
  o Set up entry in open file table
  o Set up entry in file descriptor table
  o Return fd

byte_count = read(fd, buffer, buffer_size)
  o Figure out data (and indirect) block(s) to read
  o Read them from disk into buffer cache
  o Copy data to user buffer
  o Update file position
  o Return number of bytes read


Example: Unix File-Related System Calls

byte_count = write(fd, buffer, num_bytes)
  o Figure out data (and indirect) block(s) to write
  o Read them from disk into buffer cache if the block is being partially updated
  o Copy data from user buffer into buffer cache blocks
  o Update i-node
  o Mark modified buffers, such as inode, free maps, indirect, and data blocks, as dirty
  o Schedule writing dirty buffers to disk
  o Update file position
  o Return number of bytes written

close(fd)
  o Reclaim resources
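The split between the file descriptor table and the open file table shows up in where the file position lives. The sketch below assumes standard POSIX behaviour: a descriptor made with `dup` shares the open-file entry (and so the position) of the original, while a second `open` of the same path gets its own entry. The scratch file and function name are hypothetical.

```python
import os
import tempfile


def demo_open_file_table():
    """Show that the file position belongs to the open-file entry, not the descriptor."""
    fd, path = tempfile.mkstemp()
    dup_fd = os.dup(fd)                      # new descriptor, same open-file entry
    other_fd = os.open(path, os.O_RDONLY)    # separate open-file entry, own position
    try:
        os.write(fd, b"abcdef")              # data lands in the cache, buffers marked dirty
        os.fsync(fd)                         # ask for the dirty buffers to be written now
        os.lseek(fd, 0, os.SEEK_SET)
        first = os.read(fd, 2)               # advances the shared position to 2
        via_dup = os.read(dup_fd, 2)         # continues from the shared position
        via_other = os.read(other_fd, 2)     # starts from its own position 0
    finally:
        for descriptor in (fd, dup_fd, other_fd):
            os.close(descriptor)             # reclaim resources
        os.remove(path)
    return first, via_dup, via_other
```

All three descriptors refer to the same inode, so they see the same data; only the position differs between open-file entries.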


Summary

File systems are designed to store data durably and reliably in files

A file is an abstraction for a disk
  o An application thinks of a file as a contiguous byte array
  o File system maps file to non-contiguous blocks

A file system performs four main tasks
  o Free block management
  o Block allocation and placement
  o Directory management
  o Buffer cache management


Think Time

What is the purpose of directories in a file system?

What operations update directories?

In Unix, the directory hierarchy forms an acyclic graph. Explain how.

How are cycles prevented in the graph?

Why are they not allowed?

What are the benefits/drawbacks of using inodes in a Unix file system vs. the FAT file system?

Describe the difference between hard and symbolic links in Unix.


Think Time

What were the problems in the Unix file system that led to the FFS design?

Describe the operations needed to write the string "xyz" to an existing file "/a/b" in Unix