Operating Systems ECE344
Ashvin Goel, ECE
University of Toronto
File Systems
2
Outline
Overview of file systems
File system design
Sharing files
Unix file system
Consistency and crash recovery
Journaling file systems
Log-structured file systems
3
What is a File System?
A file system provides an abstraction for storing, organizing and accessing persistent data
o I.e., data survives after the process that created it has terminated, and after the machine crashes or reboots
o This data is stored on disks, tapes, solid-state drives (SSDs), …
File-system data is organized as objects called files
o Need a way to find files, so files have names and are organized into directories
Files are accessed via system calls
o Files can be accessed concurrently by different processes
4
File Types
The OS typically treats files as an unstructured sequence of bytes
Programs can impose any format on files
o E.g., application programs may look for a specific file extension to indicate the file’s type
However, OS needs to understand the format of executable files to execute programs
[Figure: executable file]
5
File Metadata
Files have various attributes associated with them
o Name, owner, creation time, access permissions, size, etc.
o These attributes are called file metadata
o File system maintains file metadata in per-file data structures on disk
6
Basic File-Related Calls
Open
o Start using file, set position to beginning of file
Read, Write
o Read/write n bytes from/to current position
o Update position
Seek
o Move to a new position
o Allows random access (mainly for disks, not tape)
Close
o Stop using file
Create, Rename, Delete, Get/Set attributes
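The calls above map closely onto the low-level functions in Python's `os` module. The following is a minimal sketch, assuming a POSIX-like system; the scratch file name and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Create the file and write some bytes to it.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
os.write(fd, b"hello file systems")
os.close(fd)

# Open: start using the file; the position starts at byte 0.
fd = os.open(path, os.O_RDONLY)

# Read: returns up to n bytes and advances the position by the number read.
first = os.read(fd, 5)
position_after_read = os.lseek(fd, 0, os.SEEK_CUR)  # query the current position

# Seek: jump to an arbitrary offset (random access), then read from there.
os.lseek(fd, 11, os.SEEK_SET)
tail = os.read(fd, 7)

# Close: stop using the file.
os.close(fd)
```

Note that the read position is state kept by the OS for the open file, which is why `read` takes no offset argument.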
7
Directories
Provide a method for naming and locating a file
Store a list of directory entries that point to files
Modern systems use hierarchical directories
o A directory contains files or sub-directories
o E.g., B contains entries for D, j and E
Files are accessed with pathnames
o Absolute pathname
    E.g., cat /B/D/n
o Relative pathname
    Uses current directory
    E.g., cd /B/D; cat n
Directory metadata is similar to file metadata
[Figure: directory tree rooted at /, with directories A to H as interior nodes and files i to q as leaf nodes]
8
Basic Directory-Related Calls
Open
Readdir
o Read entries in a directory
o Each entry points to a file or sub-directory
Seekdir
o Simulated at user level
Close
Create, Rename, Delete, Get/Set attributes
o No Writedir!
Link, Unlink
o Add/remove a name for an existing file (more later)
10
File System Design
OS needs to store and retrieve files and directories
o Needs to maintain information about where they are stored
Needs to store files durably, i.e., ensure that files exist after machine reboot
Needs to handle machine crashes
o On a crash, OS stops suddenly, perhaps in the middle of a file system operation
o On restart, the file system should be able to recover data and bring the file system back to a good or consistent state
11
Disk Blocks
Disks are accessed at the granularity of sectors
o Typically, 512 bytes
A file system allocates data in chunks called blocks
o The file system treats the disk as an array of blocks
o A block contains 2^n contiguous sectors
o Reduces overhead of managing individual bytes
o Large blocks improve throughput but increase internal fragmentation
12
File System Tasks
A file system performs four main tasks
Free block management
o Allocates blocks to a file, manages free blocks
    Uses bitmaps, linked lists, B-trees
    Issues similar to memory, swap management
Block allocation and placement
o Maps (potentially non-contiguous) blocks to the file
    Issues similar to virtual memory; placement is unique to disks
Directory management
o Maps file names to location of starting block of file
Buffer cache management
o Caches disk blocks in memory to minimize I/O
13
Free Block Management - Bitmaps
Keep a bitmap in a separate area on disk
o 1 bit per disk block
Suppose block size = 4 KB, disk size = 160 GB
o Nr. of blocks = 40 M
o Need 40 Mbits = 5 MB disk space => 1280 bitmap blocks
Advantages
o Allows allocating contiguous blocks to a file easily
o Need only one bitmap block in memory at a time
Disadvantages
o Need extra space for bitmap
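The bitmap arithmetic on this slide, and the claim that a bitmap makes contiguous allocation easy, can be checked with a small sketch. The `BlockBitmap` class and its method names are hypothetical and keep the whole bitmap in memory, whereas a real file system would work on one bitmap block at a time.

```python
BLOCK_SIZE = 4 * 1024            # 4 KB blocks, as on the slide
DISK_SIZE = 160 * 1024**3        # 160 GB disk, as on the slide

num_blocks = DISK_SIZE // BLOCK_SIZE          # one bit per block
bitmap_bytes = num_blocks // 8
bitmap_blocks = bitmap_bytes // BLOCK_SIZE


class BlockBitmap:
    """Toy free-block bitmap: bit i is 1 when block i is allocated."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def is_used(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def _set(self, i, used):
        if used:
            self.bits[i // 8] |= 1 << (i % 8)
        else:
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def alloc_contiguous(self, count):
        """Return the first block of a free run of `count` blocks, or None."""
        run_start, run_len = 0, 0
        for i in range(self.nblocks):
            if self.is_used(i):
                run_start, run_len = i + 1, 0   # run broken, restart after i
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._set(b, True)
                return run_start
        return None

    def free(self, start, count):
        for b in range(start, start + count):
            self._set(b, False)
```

Finding a run of free blocks is a linear scan over adjacent bits, which is what makes contiguous allocation straightforward with this representation.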
14
Block Allocation and Placement
Block allocation
o Maps (potentially non-contiguous) blocks to the file
Options
o Contiguous allocation
o Linked list allocation
o File allocation table (FAT)
o I-node based allocation
15
Contiguous Allocation
All blocks in a file are contiguous on the disk
[Figure: disk layout after deleting files D and F]
16
When to Use Contiguous Allocation
Advantages
o Performance is good for sequential reading
Disadvantages
o File growth requires copying
o Disk becomes fragmented after deletion
o Will need periodic compaction
Good for CD-ROMs
o All file sizes are known in advance
o Files are never deleted
17
Linked List Allocation
Each file is a linked list of blocks
First word in a block contains number of next block
Disadvantage
o Random access is slow: reaching a block requires following the chain from the first block
18
File Allocation Table (FAT)
Keep linked list information in memory
Uses an index table with one entry per disk block
o Each entry contains the address of the next block
Advantages
o Random access needs only an in-memory search (fast)
Disadvantages
o Entire table is stored in memory; doesn’t scale to large file systems
[Figure: file allocation table with an end-of-file marker]
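A FAT can be pictured as an array indexed by block number. The sketch below uses a made-up 10-block disk, an assumed end-of-file marker of -1, and hypothetical helper names; it is not the on-disk FAT format.

```python
EOF = -1  # assumed end-of-file marker for this sketch

# Hypothetical FAT for a 10-block disk: fat[b] is the block that follows b.
fat = [None] * 10
fat[4], fat[7], fat[2] = 7, 2, EOF    # file A occupies blocks 4 -> 7 -> 2
fat[6], fat[3] = 3, EOF               # file B occupies blocks 6 -> 3


def file_blocks(fat, start):
    """Follow the chain from `start` and return all block numbers in order."""
    blocks = []
    b = start
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks


def nth_block(fat, start, n):
    """Random access: find the n-th block (0-based) using only the table."""
    b = start
    for _ in range(n):
        b = fat[b]
        if b == EOF:
            raise IndexError("offset is past the end of the file")
    return b
```

The chain still has to be walked, but every step is a memory access rather than a disk read, which is the advantage over plain linked list allocation.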
19
Inode Based Allocation
Linked list allocation spreads index information on disk, slowing random access
FAT keeps linked-list index information in memory but that limits size of file system
Idea
o Store index information for locating file blocks close together on disk
o Cache this information in memory when file is opened
o This approach avoids the problems above
Problem with the idea
o The index information may grow with file growth
o It cannot be stored contiguously
20
Inode Based Allocation
Use a tree to store index information
o Tree structure allows growth of index information, without spreading this information too much
Root of tree is called inode (index node)
o Inode is stored on disk
o There is one inode per file or directory
21
Inode Structure
Twelve direct block pointers
o Point directly to file data blocks (called direct or data blocks)
One indirect pointer
o Points to an indirect block that contains pointers to direct blocks
One double indirect pointer
o Points to a double indirect block that contains pointers to indirect blocks
One triple indirect pointer
o Points to a triple indirect block that contains pointers to double indirect blocks (not shown below)
Why this allocation strategy?
22
Maximum File Size
Say block size is 4 KB
Say block pointer size is 4 bytes
o So 1024 block pointers per block
Total number of blocks in the file
o 12 direct blocks
o 1024 blocks via indirect block pointer
o 1024 * 1024 blocks via double indirect block pointer
o 1024 * 1024 * 1024 blocks via triple indirect block pointer
Total file size
o (12 + 1024 + 1024^2 + 1024^3) * 4 KB
o ≈ 2^(10*3) * 4 * 2^10 bytes = 2^42 bytes = 4 TB
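The maximum file size can be recomputed directly, along with a helper that reports which inode pointer covers a given byte offset. This is a sketch under the slide's assumptions (4 KB blocks, 4-byte pointers, 12 direct pointers); `pointer_level` and the constant names are hypothetical.

```python
BLOCK_SIZE = 4096                              # 4 KB blocks, as on the slide
POINTER_SIZE = 4                               # 4-byte block pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE    # 1024
NUM_DIRECT = 12

# Number of data blocks reachable through each kind of pointer.
LEVELS = [
    ("direct", NUM_DIRECT),
    ("indirect", PTRS_PER_BLOCK),
    ("double indirect", PTRS_PER_BLOCK ** 2),
    ("triple indirect", PTRS_PER_BLOCK ** 3),
]

max_blocks = sum(count for _, count in LEVELS)
max_file_size = max_blocks * BLOCK_SIZE        # slightly more than 2**42 bytes


def pointer_level(offset):
    """Return which kind of inode pointer covers the byte at `offset`."""
    block_index = offset // BLOCK_SIZE
    for name, count in LEVELS:
        if block_index < count:
            return name
        block_index -= count
    raise ValueError("offset exceeds the maximum file size")
```

With these numbers, any file up to 48 KB is reached through direct pointers alone, so small files pay no extra disk reads for index blocks, while the indirect levels let the index grow for large files.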
23
Unix File System Layout
[Figure: on-disk layout of the Unix file system]
24
Block Placement
Block placement is the policy used by file system for block allocation
Original Unix file system had two placement problems
o Data blocks allocated randomly in “aging” file systems
    Blocks for a file are allocated sequentially when the file system is new
    As the file system fills, blocks are allocated from deleted files
    Deleted files may be randomly placed
    So, blocks for new files become scattered across disk
o Inodes allocated far from blocks
    All inodes at beginning of disk, far from data
    Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks
Both of these problems generate many long seeks
25
BSD Fast File System
BSD Unix redesigned Unix FS
o New FS called Fast File System (FFS)
Disk partitioned into groups of cylinders
o Recall, cylinder is the same track across platters
o Cylinder group consists of contiguous cylinders
Placement policy: place these in same cylinder group
o Inode, data blocks in a file
o Files in a directory
o If cylinder group is full, place in nearby group
26
Directory Management
A directory contains zero or more entries
o One entry per file or sub-directory that resides in the directory
o Directory entries are kept in directory data blocks
Each entry maps a file name to the location of the file's starting block, and has
o File name, file attributes
o Block number of first block of the file
[Figure: directory entries, each pointing to the file's data blocks]
File name    Attributes    First block
Kernel.C     attributes    Block Nr.
Kernel.h     attributes    Block Nr.
os           attributes    Block Nr.
…
Unix Directories
In Unix, each entry has
o (File name, Inode number)
o Inode number helps locate i-node of the file
o Inode contains file attributes
28
Unix Directories
Hard links
o More than one name for a file
o Different directory entries have the same inode number
o /C/F/r points to the same inode as /B/D/n
o Inode maintains a reference count
o DAG, instead of tree structure
Symbolic links (shortcuts)
o A file contains data naming another file (a redirect)
o The file contents of /C/F/G/s are /B/D/m
[Figure: directory tree rooted at / (directories A to H, files i to q), with added hard link r and symbolic link s]
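The difference between the two kinds of link can be observed on a real system. The sketch below assumes a POSIX file system that supports both hard and symbolic links; the names `n`, `r` and `s` mirror the slide but live in a scratch directory rather than under /B or /C.

```python
import os
import tempfile

# Hypothetical names chosen to mirror the slide; the directory is a scratch dir.
root = tempfile.mkdtemp()
n = os.path.join(root, "n")
r = os.path.join(root, "r")
s = os.path.join(root, "s")

with open(n, "w") as f:
    f.write("file data")

os.link(n, r)        # hard link: a second directory entry for the same inode
os.symlink(n, s)     # symbolic link: a separate file whose content is a pathname

same_inode = os.stat(n).st_ino == os.stat(r).st_ino
link_count = os.stat(n).st_nlink          # reference count kept in the inode
symlink_target = os.readlink(s)           # the stored pathname, not file data
symlink_has_own_inode = os.lstat(s).st_ino != os.stat(n).st_ino
```

`os.stat` follows symbolic links while `os.lstat` does not, which is why the last line uses `lstat` to inspect the link itself.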
29
File Names
Short, Fixed Length Names
o MS-DOS/Windows
    FILE3.BAK (8+3)
    Name has 11 bytes
o Original Unix
    Name has 14 bytes
30
File Names
Variable Length Names
E.g., Unix
o Each name can be up to 255 bytes in most Unix file systems (a full pathname can be up to 4096 bytes on Linux)
o Size of directory entry is variable
Options
o Entries are allocated contiguously
    Each entry stores the length of the entry, followed by the file name
    Fragmentation occurs when files are removed
o Allocate a set of pointers to file names at the beginning of the directory
    Use a heap at the end to store names
31
File Deletion
Directory entry is removed from directory
All blocks in file are returned to free list
Hard links
o Put a “reference count” field in each inode
    Counts number of entries that point to the file
o When removing file from directory, decrement count
o When count goes to zero, reclaim all file blocks
Symbolic link
o Removing the real file leaves the symbolic link “broken”
    Similar to a bad URL
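The reference-count rule can be sketched with a toy in-memory model. `ToyFS` is hypothetical and simplified: it has a single flat directory, 100 blocks, and no symbolic links.

```python
class ToyFS:
    """In-memory toy: directory entries map names to inode numbers."""

    def __init__(self):
        self.directory = {}      # name -> inode number
        self.inodes = {}         # inode number -> {"refs": int, "blocks": list}
        self.free_blocks = list(range(100))
        self.next_ino = 1

    def create(self, name, nblocks):
        ino = self.next_ino
        self.next_ino += 1
        blocks = [self.free_blocks.pop() for _ in range(nblocks)]
        self.inodes[ino] = {"refs": 1, "blocks": blocks}
        self.directory[name] = ino
        return ino

    def link(self, existing, new_name):
        ino = self.directory[existing]
        self.inodes[ino]["refs"] += 1        # one more entry points at the inode
        self.directory[new_name] = ino

    def unlink(self, name):
        ino = self.directory.pop(name)       # remove the directory entry
        inode = self.inodes[ino]
        inode["refs"] -= 1
        if inode["refs"] == 0:               # last name gone: reclaim blocks
            self.free_blocks.extend(inode["blocks"])
            del self.inodes[ino]
```

Deleting one name of a hard-linked file only removes a directory entry; the blocks return to the free list when the last name is removed.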
33
Path Lookup in Unix FS
Say File F located in directory /D1/D2 has to be read
What blocks need to be read from disk?
o Super block (provides location of inode blocks area)
    Normally this block is read when the file system code performs initialization, and it is then cached in memory
o Inode of the / directory (from the inode blocks area)
o Data blocks of / directory (provides directory entry for D1)
o Inode of the D1 directory
o Data blocks of D1 directory (provides directory entry for D2)
o Inode of the D2 directory
o Data blocks of the D2 directory (provides directory entry for F)
o Inode of F file
o Data blocks of F file
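The alternation between inode reads and data-block reads can be traced with a toy model. The inode numbers, the `inodes` dictionary and `read_file` are all made up for this sketch; each directory is assumed to fit in one data block, and the super block is assumed to be cached already.

```python
# Hypothetical toy disk: inode number -> inode. Directory inodes hold a
# {name: inode number} table as their data; file inodes hold file contents.
ROOT_INO = 2   # assumed root inode number for this sketch

inodes = {
    2: {"type": "dir", "data": {"D1": 5}},
    5: {"type": "dir", "data": {"D2": 9}},
    9: {"type": "dir", "data": {"F": 12}},
    12: {"type": "file", "data": b"contents of F"},
}


def read_file(path):
    """Resolve an absolute path, logging every simulated disk read."""
    reads = []
    ino = ROOT_INO
    reads.append(f"inode {ino} (/)")
    for name in path.strip("/").split("/"):
        entries = inodes[ino]["data"]            # read the directory's data block
        reads.append(f"data of inode {ino}")
        ino = entries[name]                      # directory entry gives next inode
        reads.append(f"inode {ino} ({name})")
    reads.append(f"data of inode {ino}")         # finally, the file's own data
    return inodes[ino]["data"], reads
```

Calling `read_file("/D1/D2/F")` logs eight reads, matching the list above without the super block: one inode and one data read for each of /, D1, D2 and F.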
34
Buffer Cache Management
Notice each file access requires many block accesses
File operations often access the same disk block
o E.g., block containing contents of root (/) directory
Caching disk blocks in memory can reduce disk I/O
Traditionally block cache is called a buffer cache
Cache operations
o Block lookup
    If block in memory, returns data from buffer
o Block miss
    Read disk block into buffer, update buffer cache
o Block flush
    If buffer is modified, write it back to disk block
35
Buffer Cache Organization
Many blocks can be cached in memory
o With 16GB machine, say 8GB for buffer cache
o Block size = 4K, nr. of blocks cached = 2M
Use a hash table to look up blocks in memory efficiently
[Figure: hash table keyed by (device, block #), pointing to disk blocks in memory]
36
Buffer Cache Write Policy
When an application writes to a file, the corresponding block is updated in the buffer cache
When is the disk block updated?
o Immediately (synchronously)
    Write-through cache
    Correct, but very slow
o Later (asynchronously)
    Write-back cache
    Fast, but what if system crashes?
    File system can become inconsistent because some blocks in memory are not on disk
    We discuss this problem in detail later
37
Buffer Cache Issues
Buffer cache typically has limited size, so we need replacement algorithms
o Typically, LRU is used
Buffer cache competes with virtual memory system
o How many frames to allocate for buffer cache vs. virtual memory?
o Some systems use a unified memory cache for buffer cache and virtual memory pages
    The blocks of the buffer cache and pages in the page cache are part of a unified caching scheme
    However, if a program reads a large file, then it affects programs that are not accessing files much
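The cache operations, the (device, block #) hash key, the write-back policy and LRU replacement from the last few slides can be combined in one small sketch. `BufferCache` is a hypothetical toy: a Python dict stands in for the disk, and `OrderedDict` provides both the hash lookup and the recency order.

```python
from collections import OrderedDict


class BufferCache:
    """Toy write-back buffer cache keyed by (device, block number), LRU eviction."""

    def __init__(self, disk, capacity):
        self.disk = disk                  # dict standing in for the disk
        self.capacity = capacity
        self.buffers = OrderedDict()      # key -> [data, dirty]; oldest first
        self.disk_reads = 0
        self.disk_writes = 0

    def _get(self, key):
        if key in self.buffers:           # block lookup: hit
            self.buffers.move_to_end(key)
        else:                             # block miss: read from disk
            if len(self.buffers) >= self.capacity:
                self._evict()
            self.disk_reads += 1
            self.buffers[key] = [self.disk[key], False]
        return self.buffers[key]

    def _evict(self):
        key, (data, dirty) = self.buffers.popitem(last=False)  # least recently used
        if dirty:                         # block flush before the buffer is reused
            self.disk[key] = data
            self.disk_writes += 1

    def read(self, dev, block):
        return self._get((dev, block))[0]

    def write(self, dev, block, data):
        # Simplification: a miss always reads the old block first, as a
        # partial-block update would require.
        buf = self._get((dev, block))
        buf[0], buf[1] = data, True       # write-back: only mark the buffer dirty

    def flush(self):
        for key, buf in self.buffers.items():
            if buf[1]:
                self.disk[key] = buf[0]
                self.disk_writes += 1
                buf[1] = False
```

Between a `write` and the next eviction or `flush`, the new data exists only in memory, which is exactly the window in which a crash can leave the disk inconsistent.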
38
Read Ahead
Applications often read files sequentially
File system can predict that a process will request the file block after the one it is currently requesting
File system prefetches next block from disk
o Also called read ahead
o Note that the next block may not be allocated sequentially
If process requests next block, it will be in cache
o Allows overlapping I/O with execution
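The effect of read ahead on a sequential scan can be illustrated with a toy reader. `ReadAheadFile` is hypothetical, and the prefetch here happens synchronously; a real file system would issue it asynchronously so that the disk works while the process computes.

```python
class ReadAheadFile:
    """Toy sequential reader that prefetches the next logical block."""

    def __init__(self, blocks):
        self.blocks = blocks          # list standing in for the file's disk blocks
        self.cache = {}               # block index -> data
        self.misses = 0

    def _fetch(self, i):
        if 0 <= i < len(self.blocks) and i not in self.cache:
            self.cache[i] = self.blocks[i]

    def read_block(self, i):
        if i not in self.cache:
            self.misses += 1          # the process would have to wait for the disk
            self._fetch(i)
        self._fetch(i + 1)            # read ahead: request the next block early
        return self.cache[i]
```

In a sequential scan only the first request misses; every later block is already cached when the process asks for it.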
40
Sharing Files
Files need to be shared across processes
Issues
o Concurrent access
    What happens when threads read and write a file simultaneously?
o Protection
    How should the OS ensure that only an authorized user can access a file?
41
Concurrent Access
OS ensures sequential consistency
o A read() call sees data from the most recent finished write() call (even if the write occurred on another processor)
o All processors see the same order of writes, i.e., if a file block has value 1, followed by 2, then no processor will read 2 and then 1
Applications still have to ensure that read is called after write has finished
o Concurrent accesses may see old and new data
42
Protection
Who (subject) can access a file (object)?
How can they access it (action)?
A protection system dictates whether a given action performed by a given subject on an object is allowed
o Actions include read, write, execute, append, change protection, delete, etc.
Two mechanisms for enforcing protection
o Access control lists (ACL)
o Capabilities
43
Access Control Lists (ACL)
For each object, maintain list of subjects and their permitted actions
Easier to manage
o Easy to grant, revoke
Problem when objects are heavily shared
o ACLs become large
o Use groups
44
Capabilities
For each subject, maintain list of objects and their permitted actions
Easier to transfer
o Like keys, can hand off, does not depend on subject
Revoking capability is challenging
o Need to keep track of all subjects that have capability
45
ACLs vs. Capabilities
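Access matrix (rows are subjects, columns are objects)

Subject    /one    /two    /three
Alice      rw      -       rw
Bob        w       -       r
Charlie    w       r       rw

ACL: one column of the matrix (the entries for one object)
Capability: one row of the matrix (the entries for one subject)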
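The two views of the access matrix can be derived from the same data. In this sketch the matrix is the one from the slide, with "-" stored as an empty string; the function names `acl`, `capabilities` and `allowed` are hypothetical.

```python
# Access matrix from the slide: matrix[subject][object] = permitted actions.
matrix = {
    "Alice":   {"/one": "rw", "/two": "",  "/three": "rw"},
    "Bob":     {"/one": "w",  "/two": "",  "/three": "r"},
    "Charlie": {"/one": "w",  "/two": "r", "/three": "rw"},
}


def acl(obj):
    """ACL view: for one object, the subjects with any access and their actions."""
    return {subj: row[obj] for subj, row in matrix.items() if row[obj]}


def capabilities(subject):
    """Capability view: for one subject, the objects it can access and how."""
    return {obj: acts for obj, acts in matrix[subject].items() if acts}


def allowed(subject, obj, action):
    """Check one action, e.g. 'r' or 'w', against the matrix."""
    return action in matrix[subject][obj]
```

Revoking all access to /one means deleting a single ACL, whereas with capabilities every subject's list would have to be found and edited, which mirrors the trade-off described on the previous two slides.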
47
The UNIX File-System Data Structures
[Figure: Unix file-system data structures, divided into those kept in memory and those stored on disk and cached in memory]
48
Example: Unix File-Related System Calls
fd = open(name, mode)
o Perform path lookup on name to find inode of file
o Cache inode for file in buffer cache
o Check permissions
o Set up entry in open file table
o Set up entry in file descriptor table
o Return fd
byte_count = read(fd, buffer, buffer_size)
o Figure out data (and indirect) block(s) to read
o Read them from disk into buffer cache
o Copy data to user buffer
o Update file position
o Return number of bytes read
49
Example: Unix File-Related System Calls
byte_count = write(fd, buffer, num_bytes)
o Figure out data (and indirect) block(s) to write
o Read them from disk into buffer cache if the block is being partially updated
o Copy data from user buffer into buffer cache blocks
o Update i-node
o Mark modified buffers, such as inode, free maps, indirect, and data blocks, as dirty
o Schedule writing dirty buffers to disk
o Update file position
o Return number of bytes written
close(fd)
o Reclaim resources
50
Summary
File systems are designed to store data durably and reliably in files
A file is an abstraction for a disk
o An application thinks of a file as a contiguous byte array
o File system maps file to non-contiguous blocks
A file system performs four main tasks
o Free block management
o Block allocation and placement
o Directory management
o Buffer cache management
51
Think Time
What is the purpose of directories in a file system?
What operations update directories?
In Unix, the directory hierarchy forms an acyclic graph. Explain how.
How are cycles prevented in the graph?
Why are they not allowed?
What are the benefits/drawbacks of using inodes in a Unix file system vs. the FAT file system?
Describe the difference between hard and symbolic links in Unix.
52
Think Time
What were the problems in the Unix file system that led to the FFS design?
Describe the operations needed to write the string "xyz" to an existing file "/a/b" in Unix.