Post on 09-Nov-2020
CS 4284
Systems Capstone
Godmar Back
Disks & File Systems
Filesystems
CS 4284 Spring 2013
Files vs Disks
File Abstraction
• Byte oriented
• Names
• Access protection
• Consistency
guarantees
Disk Abstraction
• Block oriented
• Block #s
• No protection
• No guarantees
beyond block write
Filesystem Requirements
• Naming
– Should be flexible, e.g., allow multiple names for the same file
– Support hierarchy for ease of use
• Persistence
– Want to be sure data has been written to disk in case a crash occurs
• Sharing/Protection
– Want to restrict who has access to files
– Want to share files with other users
FS Requirements (cont’d)
• Speed & Efficiency for different access patterns
– Sequential access
– Random access
– Sequential is most common, random next
– Another pattern is keyed access (not usually provided by OS)
• Minimum Space Overhead
– Disk space used to store metadata is lost for user data
• Twist: all metadata that is required to do translation must be stored on disk
– Translation scheme should minimize number of additional accesses for a given access pattern
– Harder than, say, page tables, where we assumed page tables themselves are not subject to paging!
Filesystems
Software Architecture
(including in-memory data
structures)
Overview
File Operations:
create(), unlink(), open(),
read(), write(), close()
File System
• Uses names for files
• Views files as sequence of bytes
Buffer Cache
Device Driver
• Uses disk id + sector indices
Must implement translation:
(file name, file offset) → (disk id, disk sector, sector offset)
Must manage free space on disk
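The last step of that translation can be sketched in Python as follows. This is only an illustration: the 512-byte sector size and the function name locate are assumptions, and mapping the resulting file-relative sector index to an actual disk sector depends on the allocation strategy.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def locate(file_offset, sector_size=SECTOR_SIZE):
    """Split a byte offset within a file into
    (index of the sector within the file, byte offset within that sector)."""
    if file_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(file_offset, sector_size)


# Byte 1300 of a file lives in the file's third sector (index 2), at byte 276.
print(locate(1300))
```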
The Big Picture
[Figure: data structures to keep track of open files]
• Per-process file descriptor table (in PCB; entries 0, 1, 2, 3, 4, 5, …)
• Open file table
– struct file: inode + position + …
– struct dir: inode + position
– struct inode
• Buffer cache: cached data and metadata
• On-disk data structures: filesystem information, file descriptors (inodes), directory data, file data
Steps in Opening & Reading a File
• Lookup (via directory)
– find on-disk file descriptor’s block number
• Find entry in open file table (struct inode
list in Pintos)
– Create one if none, else increment ref count
• Find where file data is located
– By reading on-disk file descriptor
• Read data & return to user
Open File Table
• inode – represents file
– at most 1 in-memory instance per unique file
– number of openers & other properties
• file – represents one or more processes using a file
– With separate offsets for byte-stream
• dir – represents an open directory file
• Generally:
– None of the data in OFT is persistent
– Reflects how processes are currently using files
– Lifetime of objects determined by open/close
• Reference counting is used
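A minimal Python sketch of the open file table described above, assuming inodes are identified by the sector of their on-disk descriptor. The names OpenFileTable, Inode, File, open_count, and pos are hypothetical and do not reproduce Pintos's actual code.

```python
class Inode:
    """In-memory inode: at most one instance per unique file."""

    def __init__(self, sector):
        self.sector = sector    # where the on-disk file descriptor lives (assumed key)
        self.open_count = 0     # number of openers


class File:
    """One open instance of a file, with its own byte-stream position."""

    def __init__(self, inode):
        self.inode = inode
        self.pos = 0


class OpenFileTable:
    def __init__(self):
        self.inodes = {}        # sector -> Inode; nothing here is persistent

    def open(self, sector):
        inode = self.inodes.get(sector)
        if inode is None:       # create one if none ...
            inode = Inode(sector)
            self.inodes[sector] = inode
        inode.open_count += 1   # ... else just bump the reference count
        return File(inode)

    def close(self, file):
        inode = file.inode
        inode.open_count -= 1
        if inode.open_count == 0:   # last closer frees the in-memory inode
            del self.inodes[inode.sector]


oft = OpenFileTable()
a = oft.open(7)
b = oft.open(7)
a.pos = 100   # offsets are per struct file, so b.pos is unaffected
```

Both opens share one in-memory inode while keeping separate positions, which is the split between inode and file that the slide describes.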
File Descriptors (“inodes”)
• Term “inode” can refer to 3 things:
1. in-memory inode
– Stores information about an open file, such as how many openers; corresponds to an on-disk file descriptor
2. on-disk inode
– Region on disk, entry in file descriptor table, that stores persistent information about a file – who owns it, where to find its data blocks, etc.
3. on-disk inode, when cached in buffer cache
– A bytewise copy of 2. in memory
– Q.: Should in-memory inode store a pointer to cached on-disk inode? (Answer: No.)
Filesystems
On-Disk Data Structures and
Allocation Strategies
Filesystem Information
• Contains “superblock”, which stores information such as size of entire filesystem, etc.
– Location of file descriptor table & free map
• Free Block Map
– Bitmap used to find free blocks
– Typically cached in memory
• Superblock & free map often replicated in different positions on disk
[Figure: disk layout showing the Super Block and the Free Block Map, e.g., bitmap 0100011110101010101]
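As a rough illustration of how a free block map is used, the following Python sketch does a first-fit scan over the example bitmap. It assumes that a 1 bit means "in use" and a 0 bit means "free" (the slide does not say which), and the function name allocate_block is hypothetical.

```python
def allocate_block(free_map):
    """Return the index of the first free block and mark it used; -1 if none is free.

    free_map is a list of booleans, True meaning 'in use' (an assumption)."""
    for index, in_use in enumerate(free_map):
        if not in_use:
            free_map[index] = True
            return index
    return -1


# The example bitmap from the slide, read left to right as blocks 0, 1, 2, ...
free_map = [bit == "1" for bit in "0100011110101010101"]
first = allocate_block(free_map)
second = allocate_block(free_map)
print(first, second)
```

A real implementation would pack the bits into words and would typically try to allocate near a file's existing blocks rather than always taking the first free one.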
File Allocation Strategies
• Contiguous allocation
• Linked files
• Indexed files
• Multi-level indexed files
Contiguous Allocation
• Idea: allocate files in contiguous blocks
• File Descriptor = (first block, length)
• Good sequential & random access
• Problems:
– hard to extend files
– may require expensive compaction
– external fragmentation
– analogous to segmentation-based VM
• Pintos’s baseline implementation does this
[Figure: File A and File B, each stored in one contiguous run of blocks]
Linked Files
• Idea: implement linked list
– either with variable sized blocks
– or fixed sized blocks (“clusters”)
• Solves fragmentation problem, but now
– need lots of seeks for sequential accesses and random accesses
– unreliable: lose first block, may lose file
• Solution: keep linked list in memory
– DOS: FAT (File Allocation Table)
[Figure: blocks of two files interleaved on disk: File A Part 1, File B Part 1, File A Part 2, File B Part 2]
DOS FAT
• FAT stored at beginning of disk & replicated for redundancy
• FAT cached in memory
• Size: n-bit entries, blocks of 2^m bytes, so 2^(m+n) bytes limit
– n=12, 16, 28
– m=9 … 15 (0.5KB-32KB)
• As disk size grows, m & n must grow
– Growth of n means larger in-memory table

FAT:
Entry   Next block
1       6
2       0
3       5
4       -1
5       7
6       -1
7       11
8       0
9       -1
10      9
11      -1
12      10

Directory:
Filename   Length   First Block
“a”        2        1
“b”        4        3
“c”        3        12
“d”        1        4
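The example FAT and directory above can be walked with a few lines of Python. This is a sketch under two assumptions inferred from the table rather than stated on the slide: an entry of -1 ends a file's chain, and an entry of 0 marks an unused block. The names FAT, DIRECTORY, and blocks_of are illustrative only.

```python
# Entry -> next block, copied from the example table.
FAT = {1: 6, 2: 0, 3: 5, 4: -1, 5: 7, 6: -1,
       7: 11, 8: 0, 9: -1, 10: 9, 11: -1, 12: 10}

# Filename -> (length in blocks, first block), copied from the example directory.
DIRECTORY = {"a": (2, 1), "b": (4, 3), "c": (3, 12), "d": (1, 4)}


def blocks_of(name):
    """Follow the FAT chain of a file and return its blocks in order."""
    length, block = DIRECTORY[name]
    chain = []
    while block != -1:          # assumed end-of-chain marker
        chain.append(block)
        block = FAT[block]
    if len(chain) != length:
        raise ValueError("chain length disagrees with directory entry")
    return chain


for name in DIRECTORY:
    print(name, blocks_of(name))
```

Reaching the k-th block of a file takes k lookups in the table, which is why the FAT needs to be cached in memory for random access to be tolerable.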
DOS FAT Scalability Limits
• FAT-12 uses 12-bit entries, max of 4096 clusters
– FAT-16: 65536 clusters; FAT-32 uses 28 bits, so theoretical max of 2^28 (256 Mi) clusters
• Floppy disk, say 1.4MB; FAT-12, 1K clusters, need 1,400 entries, 2 bytes each -> 2.8KB
• Modern disk, say ~2 TB (~2^41 bytes)
– At 4 KB cluster size, would need 2^29 entries. Each entry at 4 bytes, would need 2^31 bytes, or 2GB, RAM just to hold the FAT.
– At 32 KB cluster size, would need only 1/8, but still 256MB RAM to hold FAT; simple operations, such as determining how much space is free on disk, require reading entire FAT
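The arithmetic behind these figures can be checked with a short Python sketch. It takes the large disk to be 2^41 bytes, the value the slide's calculation uses, and the floppy to be 1,400 clusters of 1 KB; the helper name fat_bytes is hypothetical.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def fat_bytes(disk_bytes, cluster_bytes, entry_bytes):
    """Size of a FAT with one entry per cluster."""
    return (disk_bytes // cluster_bytes) * entry_bytes


floppy = fat_bytes(1400 * KB, 1 * KB, 2)      # 1,400 entries, 2 bytes each
big_4k = fat_bytes(2**41, 4 * KB, 4)          # 2^29 entries, 4 bytes each
big_32k = fat_bytes(2**41, 32 * KB, 4)        # 1/8 as many entries

print(floppy, "bytes")
print(big_4k // GB, "GB")
print(big_32k // MB, "MB")
```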
Blocksize Trade-Offs
• Chart assumes all files are 2KB in size (observed median file size is about 2KB)
– Larger blocks: faster reads (because seeks are amortized & more bytes per transfer)
– More wastage (2KB file in 32KB block means 15/16th are unused)
• Source: Tanenbaum, Modern Operating Systems
Indexed Allocation
• Single-index: specify maximum filesize, create index array, then note blocks in index
– Random access ok – one translation step
– Sequential access may require more seeks, depending on how contiguously the blocks are allocated
• Drawback: hard to grow beyond maximum
[Figure: File A Index block pointing to File A Part 1, Part 2, and Part 3 scattered on disk]
Multi-Level Indices
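• Used in Unix & (possibly) Pintos (P4)
[Figure: multi-level index with levels labeled FLI, SLI, TLI]
– Direct Blocks: file blocks 1 … N
– Indirect Block: one index block covering file blocks N+1 … N+I
– Double Indirect Block: index2 pointing to index blocks, covering file blocks N+I+1 … N+I+I^2
– Triple Indirect Block: index3 pointing to index2 blocks, which point to index blocks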
[Figure: Logical View (per file, by offset in file) vs. Physical View (on disk, ignoring other files), showing the sector numbers on disk of the Inode, Data, Index, and Index2 blocks]
Multi-Level Indices
• If filesz < N * BLKSIZE, can store all information in direct block array
– Biased in favor of small files (ok because most files are small…)
• Assume index block stores I entries
– If filesz < (I + N) * BLKSIZE, 1 indirect block suffices
• Q.: What’s the maximum size before we need triple-indirect block?
• Q.: What’s the per-file overhead (best case, worst case?)
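The size thresholds above follow a simple pattern that a short Python sketch can make concrete. The parameter values used here (N = 12 direct pointers, I = 128 entries per index block, BLKSIZE = 512) are illustrative assumptions, not values given in the slides, and the function names are hypothetical.

```python
def max_file_size(n_direct, entries_per_index, block_size, levels):
    """Largest file addressable using direct blocks plus index trees of
    depth 1..levels (levels=0: direct only, 1: + single indirect, ...)."""
    blocks = n_direct + sum(entries_per_index ** k for k in range(1, levels + 1))
    return blocks * block_size


def level_for(block_index, n_direct, entries_per_index):
    """Which level holds file block block_index (0-based):
    0 = direct, 1 = indirect, 2 = double indirect, 3 = triple indirect."""
    if block_index < n_direct:
        return 0
    block_index -= n_direct
    level, span = 1, entries_per_index
    while block_index >= span:      # skip past everything this level covers
        block_index -= span
        span *= entries_per_index
        level += 1
    return level


N, I, BLKSIZE = 12, 128, 512        # assumed example parameters
for levels in range(3):
    print(levels, max_file_size(N, I, BLKSIZE, levels))
```

With these assumed parameters, the limit before a triple-indirect block is needed is (N + I + I^2) * BLKSIZE, which is what max_file_size returns for levels=2.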
Extents
• Index-tree based scheme avoids external fragmentation, and is efficient for small files, but incurs relatively high meta-data overhead for large files
• Extents can improve that
– store (bnum, length) pair to denote that file occupies blocks [bnum, … , bnum+length-1]
– But complicates offset -> sector translation
– Used in ext4.
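To show why extents complicate translation, here is a minimal Python sketch that maps a file-relative block index to a disk block by scanning a list of (bnum, length) pairs. The extent list and the name extent_lookup are made up for illustration; this is not how ext4 stores or searches its extents.

```python
def extent_lookup(extents, block_index):
    """Translate a 0-based block index within the file to a disk block number.

    extents is an ordered list of (bnum, length) pairs; each pair covers
    disk blocks bnum .. bnum+length-1."""
    if block_index < 0:
        raise IndexError("negative block index")
    for bnum, length in extents:
        if block_index < length:
            return bnum + block_index
        block_index -= length       # skip this extent and keep looking
    raise IndexError("block index beyond end of file")


extents = [(100, 4), (50, 2), (300, 10)]   # hypothetical 16-block file
print([extent_lookup(extents, i) for i in range(7)])
```

Unlike an index array, the position of a block within the list cannot be computed directly, so lookup needs a scan or a search structure over the extents.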
Storing Inodes
• Unix v7, BSD 4.3
[Figure: Superblock, a single inode list (I0 I1 I2 I3 I4 …), then rest of disk for files & directories]
• FFS (BSD 4.4)
– Cylinder groups have superblock + bitmap + inode list + file space
– Try to allocate file & inode in same cylinder group to improve access locality
[Figure: disk split into cylinder groups CGi, each with a superblock copy (SB1, SB2, SB3), its own inodes (I0 I1 …, I3 I4 …, I8 I9 …), and file space]
Positioning Inodes
• Putting inodes in fixed place makes finding inodes easier
– Can refer to them simply by inode number
– After crash, there is no ambiguity as to what are inodes vs. what are regular files
• Disadvantage: limits the number of files per filesystem at creation time
– Use “df -ih” on Linux/ext3 to see how many inodes are used/free
Filesystems
Directories and Name Resolution
Directories
• Need to find file descriptor (inode), given a name
• Approaches:
– Single directory (old PCs), two-level approaches with 1 directory per user
• Now exclusively hierarchical approaches:
– File system forms a tree (or DAG)
• How to tell regular file from directory?
– Set a bit in the inode
• Data Structures
– Linear list of (inode, name) pairs
– B-Trees that map name -> inode
– Combinations thereof
Using Linear Lists
• Advantage: (relatively) simple to
implement
• Disadvantages:
– Scan makes lookup (& delete!) really slow for
large directories
– Could cause fragmentation (though not a
problem in practice)
[Figure: directory file as a linear list of (inode #, name) entries starting at offset 0: (23, multi-oom), (15, sample.txt)]
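A linear-list directory lookup can be sketched in a few lines of Python, using the two entries from the example. The function name dir_lookup and the in-memory list are illustrative assumptions; a real directory stores fixed-size or variable-size records in the directory file's data blocks.

```python
def dir_lookup(entries, name):
    """Scan (inode number, name) pairs in order; return the inode number or None."""
    for inode_number, entry_name in entries:
        if entry_name == name:
            return inode_number
    return None


directory = [(23, "multi-oom"), (15, "sample.txt")]
print(dir_lookup(directory, "sample.txt"))
print(dir_lookup(directory, "missing"))
```

The scan visits every entry in the worst case, which is the cost the slide points out for large directories.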
Using B+-Trees
• Advantages:
– Scalable to large number of files: in growth, in lookup time
• Disadvantage:
– Complex
– Overhead for small directories (some filesystems switch to B+-Tree only for large directories)
• Note: some filesystems use B+-Tree not only for directory files, but for block indexes as well.
– HFS’s ‘catalog’ – single B+-Tree that stores inodes + directories.
– Also done in NTFS, XFS & Reiserfs, ZFS, and Btrfs
(Source: Wikipedia)
Absolute Paths
• How to resolve a path name such as “/usr/bin/ls”?
– Split into tokens using “/” separator
– Find inode corresponding to root directory
• (how? Use fixed inode # for root)
– (*) Look up “usr” in root directory, find inode
– If not last component in path, check that inode is a directory. Go to (*), looking for next component
– If last component in path, check inode is of desired type, return
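The resolution steps above can be sketched in Python over a toy in-memory filesystem. Everything here is an assumption made for illustration: the root inode number 1, the INODES dictionary layout, and the function name resolve. The final type check on the last component is left to the caller.

```python
ROOT_INODE = 1  # assumed fixed inode number for the root directory

# Toy filesystem: inode number -> metadata; directories map names to inode numbers.
INODES = {
    1: {"is_dir": True, "entries": {"usr": 2}},
    2: {"is_dir": True, "entries": {"bin": 3}},
    3: {"is_dir": True, "entries": {"ls": 4}},
    4: {"is_dir": False},
}


def resolve(path):
    """Resolve an absolute path to an inode number, one component at a time."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    current = ROOT_INODE
    for component in [c for c in path.split("/") if c]:
        inode = INODES[current]
        if not inode["is_dir"]:                 # must be a directory to descend
            raise NotADirectoryError(path)
        if component not in inode["entries"]:
            raise FileNotFoundError(path)
        current = inode["entries"][component]   # step (*) for the next component
    return current


print(resolve("/usr/bin/ls"))
```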
Name Resolution
• Must have a way to scan an entire directory
without other processes interfering -> need a
“lock” function
– But don’t need to hold lock on /usr when scanning
/usr/bin
• Directories can only be removed if they’re empty
– Requires synchronization also
• Most OSes cache translations in a “namei” cache, which maps absolute pathnames to inodes
– Must keep namei cache consistent if files are deleted
Current Directory
• Relative pathnames are resolved relative to current directory
– Provides default context
– Every process has one in Unix/Pintos
• chdir(2) changes current directory
– cd tmp; ls; pwd vs (cd tmp; ls); pwd
• lookup algorithm the same, except starts from current dir
– process should keep current directory open
– current directory inherited from parent
Hard & Soft Links
• Provides aliases (different names) for a file
• Hard links: (Unix: ln)
– Two independent directory entries have the same inode number, refer to same file
– Inode contains a reference count
– Disadvantage: alias only possible within same filesystem
• Soft links: (Unix: ln -s)
– Special type of file (noted in inode); content of file is absolute or relative pathname – stored inside inode instead of direct block list
• Windows: “junctions” & “shortcuts”
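The difference between the two kinds of links can be observed from user space. The following Python sketch assumes a Unix-like system whose temporary directory supports both hard and symbolic links; the function name link_demo and the file names are made up for the example.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a soft link; then delete the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("hello")

        os.link(original, hard)             # second directory entry, same inode
        os.symlink("original.txt", soft)    # separate file holding a relative pathname

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(original).st_nlink    # reference count in the inode
        target = os.readlink(soft)

        os.unlink(original)                 # drops one reference, file data remains
        with open(hard) as f:
            survives = f.read()
        dangling = not os.path.exists(soft)  # the soft link now points at nothing
        return same_inode, link_count, target, survives, dangling


print(link_demo())
```

Removing the original name leaves the hard link fully usable, while the soft link dangles because it only stores a pathname.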