2. file system - keio universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · what is a file? • a...

27
SOFTWARE ARCHITECTURE 2. FILE SYSTEM Tatsuya Hagino [email protected] 1 https://vu5.sfc.keio.ac.jp/sa/login.php lecture URL

Upload: others

Post on 19-Jan-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

SOFTWARE ARCHITECTURE

2. FILE SYSTEM Tatsuya Hagino

[email protected]

1

https://vu5.sfc.keio.ac.jp/sa/login.php

lecture URL

Page 2: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Operating System Structure

2

Hardware

Operating System

Application

Bootstrap Device Management

Scheduler Memory Management

File System Process Management

System call processing

Network Management

Page 3: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

What is a File? • A unit to store information in an external storage

• sometimes called dataset

• Characteristics of a file • Information in a file is non-volatile.

• Its content does not disappear when power is off.

• Information in a file is persistent. • It exists forever.

• You can start with what you have left.

• File System • Manage files on an external storage

• Windows → NTFS

• MacOS → HFS

• Unix → UFS

• File related conventions: • file name

• file structure

• file type

• file access method

3

File

File

External Storage

(SSD, HDD, USB)

Page 4: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Naming Convention • Each file has a name.

• File name

• Letters used for file name • case insensitive: lower and upper letters are not distinguished.

• case sensitive: lower and upper letters are distinguished.

• Encoding of non alphabet characters (e.g. kanji file name)

• Length of file name • MS-DOS limits 8+3 characters.

• UNIX limits 255 characters.

• File extension • It may express its file type:

• Windows: use it to find associated program.

• Mac: uses to hold file type internally.

• UNIX: depends on applications • Compilers may use it to select programming language.

4

document.docx

file name

file extension

File

file name

Page 5: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Structure

• Sequence of bytes • No structure in a file

• User can create arbitrary fields.

• A text file uses LF or CR to separate lines.

• May not be efficient when reading or writing.

• Sequence of records • Consist of fixed length

records.

• For a punch card, each record consists of 80 characters. (i.e. one record = one line)

• Efficiently read and write records.

5

char char char LF

char char char char char LF

char char char char LF

char char LF

char char char char char char

char char char char char char

char char char char char char

char char char char char char

line 1

line 2

line 3

line 4

record 1

record 2

record 3

record 4

vs

Page 6: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Type

• Regular File:

• text file or binary file

• Directory:

• Folder

• Manage a set of files

• Character Special File:

• Input/output device

• Serial devices like terminal, printer, network, etc.

• Block Special File:

• Devices with block access (i.e. read/write blocks)

• HDD, SSD, etc.

• File system can be created on it.

6

Page 7: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Access Method

• Sequential access • Read/write a file one byte

by one byte sequentially.

• Cannot skip or change the order of reading/writing.

• May rewind it and start from the beginning again.

• Random access • Read/write a file randomly

in any order.

• In UNIX, the next position can be specified using seek system call.

7

one by one

from beginning

to end in any

order

vs

Page 8: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Hierarchical File System • Implemented by allowing a directory to

have other directories inside.

• Implementation of a directory: • Each directory consists of a list of entries.

• Each entry consists of a file name and a pointer to a file. • A file can be pointed from more than one

directory.

• File sharing (hard link)

• Specifying a file • Use path name

• Concatenate file names with separation characters • UNIX → /

• Windows → ¥

• Old Mac OS → :

8

parent directory

regular

file

child directory

regular

file

directory

a.docx

bb.c

sa.tex

xyz

size

owner

access

blocks

file information

file

block

regular

file

File

Page 9: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Path Name • Absolute path name

• Starting from the root directory, sub directory names are listed with / separators.

• Relative path name • Starting from the current working directory

9

/

bin sbin usr etc var home tmp

bin sbin local root hagino

bin sa16

01.pptx 02.pptx

ls

grep

passwd

/home/hagino/sa16/02.pptx

sa16/02.pptx

root directory

current working directory

Page 10: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Read/Write Mechanism

• UNIX as an example

• Directly use system calls to read/write a file

• Use standard input/output libraries

10

Application

Standard Input/Output Library

File System related System Call

Buffer Management

Device Driver Hardware HDD

SSD

OS

Page 11: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File System related System Calls

11

open

read

write

seek

ioctl

close

file name

(path name)

File

Descriptor

read data from the file

write data to the file

data

data

change the position of

next read/write

position

attributes

other operations

end access

start access

Page 12: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

File Descriptor

• A small number returned by open system call • specifies an opened file

• used by read/write

• remembers a position of read/write

• Managed by each process • The meaning of file descriptors is local to each process.

• Even in a single process, if the same file is opened more than onece, different file descriptors are assigned.

• Special file descriptors: • standard input → 0

• standard output → 1

• standard error output → 2

12

Page 13: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Example program of reading a file using system calls

13

#include <fcntl.h>

int main(argc, char *argv[])

{

int fd, n;

char buf[512];

fd = open(argv[1], O_RDONLY);

while ((n = read(fd, buf, 512)) > 0) {

write(1, buf, n);

}

close(fd);

}

Page 14: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Standard Input/Output Library • System call

• Not easy to use • e.g. There is no system call to read one line.

• Inefficient for small usage • System call needs to go though OS protection boundary (user mode to supervise mode).

• Costs much more than calling a library.

• Standard input/output library • Useful interface

• read one line

• read one character

• Optimize system call access • Use buffer

14

Application

Standard input/output library

System call

fopen, fclose, fgetc, fputc, fgets, fputs

open, close, read, write

buffer

Page 15: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

How to Use Standard Input/Output Libraries

15

fopen

fgetc

gets

fputc

fputs

fclose

file name

(path name)

FILE

structure

read one character

read one line

one character

one line

write one character

one character

one line

write one line

end access

start access

fscanf

convert string to value

value

fprintf

value

convert value to string

Page 16: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Example program of reading a file using standard

input/output libraries

16

#include <stdio.h>

int main(argc, char *argv[])

{

FILE fp;

int ch;

fp = fopen(argv[1], "r");

while ((ch = fgetc(fp)) >= 0) {

fputc(ch, stdout);

}

fclose(fp);

}

Page 17: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Implementation of Standard Input/Ouput Library

17

struct FILE {

int fd;

char buf[BUFSIZE];

int size;

int counter;

};

typedef struct FILE FILE;

FILE structure

FILE *fopen(char *path, char *mode) {

FILE *fp;

fp = (FILE *)malloc(sizeof(struct FILE));

fp->fd = open(path, ...);

fp->size = 0;

fp->counter = 0;

return fp;

}

fopen

counter buf

holds

temporal

data

next

read/write

potition

size

System call

Application

FILE structure

OS

Page 18: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Implementation of fgetc

18

int fgetc(FILE *fp) {

if (fp->counter >= fp->size) {

fp->size = read(fp->fd, fp->buf, BUFSIZE);

if (fp->size <= 0) return -1;

fp->counter = 0;

}

return fp->buf[fp->counter++];

}

fgetc

fgetc buf

(1) checks buf

OS

file

(2) read 512 bytes

assume

BUFSIZE = 512

(3) fill buf with

512 bytes (4) returns one

character

If there is data in buf,

no read

Page 19: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Implementation of System Call

19

Process Management Structure 0 1 2 3 4

File Descriptor Table

File

Descriptor

File

Descriptor

File

Descriptor

inode

block

numbers

inode

block

numbers

Block Device

inode (index node)

• For each file

• Manages file information

• owner

• size

• file blocks

File Descriptor

• For each opened file

• Holds read/write position

Page 20: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Implementation of open

• Returns a file descriptor for a given file

• Calls namei to find its inode.

• Finds an empty file descriptor slot and sets a new file structure.

20

int open(char *path, int flags, ...) {

struct inode *ip;

int fd;

struct file *fp;

ip = namei(path);

if (ip) {

fp = create a new file structure;

fp->inode = ip;

fp->offset = 0;

fp->refcount = 1;

fd = unused file descriptor of proc;

proc->file[fd] = fp;

return fd;

}

else return -1;

}

struct file {

struct inode *inode;

long offset;

int refcount;

};

file structure

Page 21: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

namei

• Following the path, checks each directory to fined the

named file.

21

struct inode *namei(char *path) {

struct inode *dp;

if (*path == ’/’) {

dp = proc->root_directory;

path++;

} else dp = proc->current_working_directory;

while (*path) {

name = select next name from path;

dp = lookup(dp, name);

}

return dp;

}

Page 22: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Implementation of File • Each file consists of a list of blocks on a disk (or a block device).

• UNIX uses inode to manage block numbers • FAT file system uses FAT to manage

22

struct inode {

short mode;

int uid

int gid;

long size;

...

int atime;

int mtime;

int ctime;

...

int direct[12];

int indirect[3];

};

inode structure

owner

size

modification time

direct reference block numbers (12 blocks)

indirect block numbers (3 blocks)

inode inode

data

block data block

data

block

Filesystem on

a block device

Page 23: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Direct and Indirect Blocks

• Each inode points 12 direct blocks.

• When one block is 512 bytes, it can hold a file less than 512×12=6KB.

• An indirect block contains a list of blocks.

23

infomation

inode

direct 1

direct 2

direct 12

indirect 1

indirect 2 indirect 3

data

data

data

indirect block data

data

data

data

indirect block

indirect block

data

data

indirect block

Page 24: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Calculation of Block Number • From file position to calculate corresponding block number

• A file descriptor holds a position of read/write and the next read/write access the corresponding block.

24

int balloc(struct inode *ip, long offset) {

struct buf *bp;

int i;

blk = (offset + BLKSIZE - 1) / BLKSIZE;

if (blk < 12) return ip->direct[blk];

blk -= 12;

blocks = BLKSIZE/sizeof(int);

for (i = 0; i < 3; i++) {

if (blk < blocks) break;

blk -= blocks;

blocks *= BLKSIZE/sizeof(int);

}

bp = getblock(ip->indirect[i])

while (i-- > 0) {

blocks /= BLKSIZE/sizeof(int);

bp = getblock(bp->buf[blk/blocks]);

blk %= blocks;

}

return bp->buf[blk];

}

file descriptor file

position

read

write

Page 25: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Buffering • Disk blocks are cached in OS memory.

• Read: • First time: reads data from the disk and put it in the cache.

• Second time: do not access disk but use data in the cache.

• Write: • Do not write data to the disk immediately.

• Write them out all later.

25

Hash table

for buffers

buffer buffer

buffer

recently

used

buffer buffer

struct buf {

struct buf *next, *prev;

struct buf *queue_next, *queue_prev;

struct inode *inode;

int dirty;

int blkno;

char buf[BLKSIZE];

};

struct buf buffers[HASH];

buf structure

Page 26: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Disk Format

• A new disk needs to be formatted.

26

Superblock

File system on a disk

inode

list of unused blocks

data blocks

disk information

inode information

Unused blocks are managed by a list

or a bitmap.

actual data blocks

Page 27: 2. File System - Keio Universityweb.sfc.keio.ac.jp/~hagino/sa16/02.pdf · What is a File? • A unit to store information in an external storage • sometimes called dataset • Characteristics

Summary

• File System

• UNIX file system as an example.

• System call vs Standard Input/Output Library

• Implementation of the library

• File Descriptor

• Implementation of File System

• inode

• Direct and indirect blocks

27