The Buffer Cache Jeff Chase Duke University



The kernel

syscall trap/return: system call layer (file API)

fault/return: fault entry (VM page faults)

interrupt/return: I/O completions, timer ticks

memory management: block/page cache policy

DeFiler interfaces: overview

DFS: create, destroy, read, write a dfile; list dfiles

DBufferCache: DBuffer dbuf = getBlock(blockID); releaseBlock(dbuf)

DBuffer: read(), write(); startFetch(), startPush(); waitValid(), waitClean()

VirtualDisk interface (between DBuffer and VirtualDisk): startRequest(dbuf, r/w); ioComplete()

Memory Allocation

How should an OS allocate its memory resources among contending demands?

– Virtual address spaces: fork, exec, sbrk, page fault.

– The kernel controls how many machine memory frames back the pages of each virtual address space.

– The kernel can take memory away from a VAS at any time.

– The kernel always gets control if a VAS (or rather a thread running within a VAS) asks for more.

– The kernel controls how much machine memory to use as a cache for data blocks whose home is on slow storage.

– Policy choices: which pages or blocks to keep in memory? And which ones to evict from memory to make room for others?

Memory/storage hierarchy

From small and fast (ns) at the top to big and slow (ms) at the bottom:

– registers
– caches: L1/L2
– L3
– main memory (RAM)
– disk, other storage, network RAM

Boundaries marked in the figure: off-core, off-chip, off-module.

• In general, each layer is a cache over the layer below.

– inclusion property

• Technology trends: rapid change

• The triangle is expanding vertically: bigger gaps, more levels

Terms to know

– cache index/directory
– cache line/entry, associativity
– cache hit/miss, hit ratio
– spatial locality of reference
– temporal locality of reference
– eviction / replacement
– write-through / writeback
– dirty/clean


Memory as a cache

[Figure: virtual address spaces, and files and filesystems, databases, and other storage objects, hold data. Page/block read/write accesses go through memory (frames), which caches the backing storage volumes (pages and blocks) on disk, other storage, and network RAM.]

Processes access external storage objects through file APIs and VM abstraction. The OS kernel manages caching of pages/blocks in main memory.

The Buffer Cache

[Figure labels: Memory, File cache, Proc]

Ritchie and Thompson, The UNIX Time-Sharing System, 1974

Editing Ritchie/Thompson

[Figure labels: Memory, File cache, Proc]

The system maintains a buffer cache (block cache, file cache) to reduce the number of I/O operations.

Suppose a process makes a system call to access a single byte of a file. UNIX determines the affected disk block, and finds the block if it is resident in the cache. If it is not resident, UNIX allocates a cache buffer and reads the block into the buffer from the disk.

Then, if the op is a write, it replaces the affected byte in the buffer. A buffer with modified data is marked dirty: an entry is made in a list of blocks to be written. The write call may then return. The actual write may not be completed until a later time.

If the op is a read, it picks the requested byte out of the buffer and returns it, leaving the block in the cache.
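The single-byte path described above can be sketched in a few lines of Java. This is a toy under stated assumptions, not the UNIX implementation: the class `ToyBlockCache`, its 8-byte block size, and the in-memory array standing in for the disk are all hypothetical, and the sketch never evicts anything.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy write-back block cache over an in-memory "disk" (all names hypothetical). */
class ToyBlockCache {
    static final int BLOCK_SIZE = 8;          // tiny block size, for illustration only
    private final byte[] disk;                // stands in for the slow device
    private final Map<Integer, byte[]> cache = new HashMap<>(); // resident blocks by block number
    private final Set<Integer> dirty = new HashSet<>();         // blocks waiting to be written
    int diskReads = 0;                        // counts device reads, to show the savings

    ToyBlockCache(int numBlocks) { disk = new byte[numBlocks * BLOCK_SIZE]; }

    /** Find the affected block in the cache, or allocate a buffer and read it from disk. */
    private byte[] lookup(int offset) {
        int blockNo = offset / BLOCK_SIZE;
        byte[] buf = cache.get(blockNo);
        if (buf == null) {                    // not resident: one device read
            buf = new byte[BLOCK_SIZE];
            System.arraycopy(disk, blockNo * BLOCK_SIZE, buf, 0, BLOCK_SIZE);
            diskReads++;
            cache.put(blockNo, buf);
        }
        return buf;
    }

    /** Pick the requested byte out of the buffer, leaving the block in the cache. */
    byte readByte(int offset) { return lookup(offset)[offset % BLOCK_SIZE]; }

    /** Replace the byte in the buffer and mark the block dirty; the disk write is deferred. */
    void writeByte(int offset, byte value) {
        lookup(offset)[offset % BLOCK_SIZE] = value;
        dirty.add(offset / BLOCK_SIZE);
    }

    /** Write every dirty block back to the disk. */
    void flush() {
        for (int blockNo : dirty) {
            System.arraycopy(cache.get(blockNo), 0, disk, blockNo * BLOCK_SIZE, BLOCK_SIZE);
        }
        dirty.clear();
    }

    /** Peek at the device directly, bypassing the cache. */
    byte diskByte(int offset) { return disk[offset]; }
}
```

A write followed by a read of the same byte costs one device read in this sketch, and the new value reaches the disk only when `flush()` runs.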

The DeFiler buffer cache

DBufferCache: DBuffer dbuf = getBlock(blockID); releaseBlock(dbuf)

DBuffer: read(), write(); startFetch(), startPush(); waitValid(), waitClean()

File abstraction implemented in upper DFS layer.

– All knowledge of how files are laid out on disk is at this layer.
– Access underlying disk volume through buffer cache API.
– Obtain buffers (dbufs), write/read to/from buffers, orchestrate I/O.

Device I/O interface

– Asynchronous I/O to/from buffers
– Block read and write
– Blocks numbered by blockIDs

Page/block cache internals

HASH(blockID)

Each frame/buffer of memory is described by a meta-object (header).

Resident pages or blocks are accessible through a global hash table.

An ordered list of eviction candidates winds through the hash chains.

Some frames/buffers are free (no valid data). These are on a free list.

DBufferCache internals

DBuffer dbuf = getBlock(blockID)

Buffer headers: DBuffer dbuf

I/O cache buffers: each is byte[blocksize]

There is a one-to-one correspondence of dbufs to buffers.

HASH(blockID)

Any given block (blockID) is either resident or not. If resident, then it has exactly one copy (dbuf) in the cache. If it is resident then getBlock finds the dbuf (cache hit). This requires some kind of cache index, e.g., a hash table.

DBufferCache internals

DBuffer dbuf = getBlock(blockID)

Buffer headers: DBuffer dbuf

I/O cache buffers: each is byte[blocksize]

There is a one-to-one correspondence of dbufs to buffers.

HASH(blockID)

If the requested block is not resident, then getBlock allocates a dbuf for the block and places the correct block contents in its buffer (cache miss). If there are no free dbufs in the cache, then we must evict some other block from the cache and reuse its dbuf.

Page/block cache internals

HASH(blockID): cache directory

List(s) of free buffers (bufs) or eviction candidates. These dbufs might be listed in the cache directory if they contain useful data, or not, if they are truly free.

To replace a dbuf:

– Remove from free/eviction list.
– Remove from cache directory.
– Change dbuf blockID and status.
– Enter in directory w/ new blockID.
– Re-register on eviction list.
– Beware of concurrent accesses.
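The replacement steps can be sketched as a `getBlock` that keeps a hash-table directory plus one list serving as both free list and eviction list. This is a minimal sketch, not the DeFiler reference code: `SketchDBuf`, `SketchCache`, the `valid` field, and the least-recently-used ordering are assumptions, and the sketch ignores busy dbufs and dirty writeback. It also assumes the cache has at least one buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Buffer header; field names are assumptions for this sketch. */
class SketchDBuf {
    int blockID = -1;          // -1 means "truly free": no block assigned yet
    boolean valid = false;     // true once the buffer holds the block's contents
    final byte[] buffer;
    SketchDBuf(int blockSize) { buffer = new byte[blockSize]; }
}

class SketchCache {
    private final Map<Integer, SketchDBuf> directory = new HashMap<>(); // cache index
    private final Deque<SketchDBuf> evictionList = new ArrayDeque<>();  // head = next victim

    SketchCache(int numBufs, int blockSize) {
        for (int i = 0; i < numBufs; i++) evictionList.addLast(new SketchDBuf(blockSize));
    }

    /** One lock guards the directory and the list, so concurrent callers see both in step. */
    synchronized SketchDBuf getBlock(int blockID) {
        SketchDBuf dbuf = directory.get(blockID);
        if (dbuf != null) {                       // cache hit
            evictionList.remove(dbuf);            // move to the tail: most recently used
            evictionList.addLast(dbuf);
            return dbuf;
        }
        dbuf = evictionList.pollFirst();          // remove from free/eviction list
        if (dbuf.blockID >= 0) {
            directory.remove(dbuf.blockID);       // remove old identity from the directory
        }
        dbuf.blockID = blockID;                   // change blockID and status
        dbuf.valid = false;                       // caller must still fetch or zero the data
        directory.put(blockID, dbuf);             // enter in directory under the new blockID
        evictionList.addLast(dbuf);               // re-register on the eviction list
        return dbuf;
    }
}
```

Holding a single lock across all five steps is the simplest answer to the concurrency warning: no other thread can observe a dbuf that is off the list but still in the directory under its old blockID.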

DBuffer (dbuf) states

A DBuffer dbuf returned by getBlock is always associated with exactly one block in the disk volume. But it might or might not be “in sync” with the underlying disk contents.

DBuffer interface used by DFS: read(…), write(…); startFetch(), startPush(); waitValid(), waitClean()

A dbuf is valid iff it has the “correct” copy of the data. A dbuf is dirty iff it is valid and has an update (a write) that has not yet been written to disk. A valid dbuf is clean if it is not dirty.

Your DeFiler should return only valid data to a client. That may require you to zero the dbuf or fetch data from the disk. Your DeFiler should ensure that all dirty data is eventually pushed to disk.
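The valid/dirty/clean definitions amount to two flags with the invariant that a dirty dbuf is also valid. A minimal sketch follows, assuming hypothetical names (`StateDBuf`, `fetchDone`, `pushDone`) and an 8-byte buffer; it models only the bookkeeping, not the I/O itself.

```java
/** Valid/dirty bookkeeping for one dbuf; method names are assumptions for this sketch. */
class StateDBuf {
    private final byte[] buffer = new byte[8];
    private boolean valid = false;   // buffer holds the correct copy of the block
    private boolean dirty = false;   // buffer has an update not yet written to disk

    boolean isValid() { return valid; }
    boolean isDirty() { return dirty; }
    boolean isClean() { return valid && !dirty; }

    /** Called when a fetch from disk finishes: the buffer now matches the disk. */
    void fetchDone(byte[] fromDisk) {
        System.arraycopy(fromDisk, 0, buffer, 0, buffer.length);
        valid = true;
        dirty = false;
    }

    /** Refuse to hand out bytes from a buffer that is not valid. */
    byte read(int offset) {
        if (!valid) throw new IllegalStateException("dbuf not valid");
        return buffer[offset];
    }

    /** An update makes a valid buffer dirty until it is pushed. */
    void write(int offset, byte value) {
        if (!valid) throw new IllegalStateException("dbuf not valid");
        buffer[offset] = value;
        dirty = true;
    }

    /** Called when a push to disk finishes: the disk now matches the buffer. */
    void pushDone() { dirty = false; }
}
```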

Asynchronous I/O on dbufs

DBuffer: startFetch(), startPush(); waitValid(), waitClean()

VirtualDisk (device threads): startRequest(dbuf, r/w); ioComplete()

Device I/O interface: async I/O on dbufs

– Start I/O on a dbuf (startFetch(), startPush()) by posting it to a producer/consumer queue for service by a device thread.
– The device thread upcalls the dbuf's ioComplete when the I/O operation is done.
– Client threads may wait on the dbuf (waitValid(), waitClean()) for asynchronous I/O to complete.
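One way to picture this handoff is a blocking queue between client threads and a single device thread, with the dbuf's monitor used for waiting. The sketch below covers only the fetch path and rests on assumptions: `AsyncDBuf` and `ToyDisk` are hypothetical stand-ins for DBuffer and VirtualDisk, and the "read" simply stores the blockID in the first byte of the buffer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of a dbuf with asynchronous fetch; class and field names are assumptions. */
class AsyncDBuf {
    final int blockID;
    final byte[] buffer = new byte[4];
    private boolean valid = false;
    private boolean ioPending = false;
    private final ToyDisk disk;

    AsyncDBuf(int blockID, ToyDisk disk) { this.blockID = blockID; this.disk = disk; }

    /** Post a read request to the device queue and return immediately. */
    synchronized void startFetch() {
        ioPending = true;
        disk.startRequest(this);
    }

    /** Upcall from the device thread when the I/O operation is done. */
    synchronized void ioComplete() {
        ioPending = false;
        valid = true;
        notifyAll();                     // wake any client blocked in waitValid
    }

    /** Block the caller until the buffer is valid (blocks forever if no fetch was started). */
    synchronized void waitValid() {
        boolean interrupted = false;
        while (!valid) {
            try { wait(); } catch (InterruptedException e) { interrupted = true; }
        }
        if (interrupted) Thread.currentThread().interrupt(); // restore the interrupt flag
    }

    synchronized boolean isPinned() { return ioPending; }
}

/** Stand-in for VirtualDisk: one device thread serving a producer/consumer queue. */
class ToyDisk {
    private final BlockingQueue<AsyncDBuf> queue = new LinkedBlockingQueue<>();

    ToyDisk() {
        Thread device = new Thread(() -> {
            try {
                while (true) {
                    AsyncDBuf dbuf = queue.take();         // consumer side: wait for a request
                    dbuf.buffer[0] = (byte) dbuf.blockID;  // pretend to read the block
                    dbuf.ioComplete();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();        // exit the loop on interrupt
            }
        });
        device.setDaemon(true);          // do not keep the JVM alive for the sketch
        device.start();
    }

    /** Producer side: enqueue the request without blocking the caller. */
    void startRequest(AsyncDBuf dbuf) { queue.add(dbuf); }
}
```

A client would call `dbuf.startFetch()`, do other work, and then call `dbuf.waitValid()` before reading the buffer. The wait loop rechecks `valid` after every wakeup, which is the standard guard against spurious wakeups.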

More dbuf states

DFS to DBufferCache: dbuf = getBlock(blockID); releaseBlock(dbuf)

DBuffer to VirtualDisk: startRequest(dbuf, r/w); ioComplete()

Do not evict a dbuf that is in active use (busy)!

A dbuf is pinned if I/O is in progress, i.e., a disk request has started but not yet completed.

A dbuf is held if DFS obtained a reference to the dbuf from getBlock but has not yet released the dbuf.
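A sketch of how the busy rule might be enforced when choosing an eviction victim. The names `BusyDBuf` and `VictimPicker` are hypothetical, and tracking holds with a counter (rather than a single flag) is an assumption made so that more than one DFS thread can hold the same dbuf.

```java
import java.util.List;

/** Busy-state bookkeeping for one dbuf; names and the hold counter are assumptions. */
class BusyDBuf {
    private int holdCount = 0;            // getBlock calls not yet matched by releaseBlock
    private boolean ioInProgress = false; // set when a disk request starts, cleared on completion

    synchronized void hold() { holdCount++; }

    synchronized void release() {
        if (holdCount == 0) throw new IllegalStateException("dbuf is not held");
        holdCount--;
    }

    synchronized void ioStarted() { ioInProgress = true; }
    synchronized void ioCompleted() { ioInProgress = false; }

    synchronized boolean isHeld() { return holdCount > 0; }
    synchronized boolean isPinned() { return ioInProgress; }
    synchronized boolean isBusy() { return holdCount > 0 || ioInProgress; }
}

class VictimPicker {
    /** Return the first candidate that is not busy, or null if every candidate is busy. */
    static BusyDBuf pickVictim(List<BusyDBuf> candidatesInEvictionOrder) {
        for (BusyDBuf dbuf : candidatesInEvictionOrder) {
            if (!dbuf.isBusy()) return dbuf;
        }
        return null;
    }
}
```

When `pickVictim` returns null, a real cache has to wait for a release or an I/O completion before it can satisfy the miss; the sketch leaves that policy open.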

File system layer (DFS)

DFS: create, destroy, read, write a dfile; list dfiles

DBufferCache: DBuffer dbuf = getBlock(blockID); releaseBlock(dbuf); sync()

DBuffer: read(), write(); startFetch(), startPush(); waitValid(), waitClean()

Allocate blocks to files and file metadata.

Allocate DFileIDs to files.

Track which blockIDs and DFileIDs are free and which are in use.

Maintain a block map “inode” for each file, as metadata stored on disk.

A Filesystem On Disk

[Figure: a toy disk layout, with the label "Data". Sector 0: allocation bitmap file (111000100010110110111101, 100110100011000100010101, 001011100001100101000100). Sector 1: directory file, with entries rain: 32, hail: 48, wind: 18, snow: 62 and empty (0) slots. File blocks: "once upon a time/n in a l", "and far far away, lived th".]

A Filesystem On Disk

[Figure: the same disk layout as on the previous slide, with the label "Metadata": the allocation bitmap file at sector 0 and the directory file at sector 1, with entries rain: 32, hail: 48, wind: 18, snow: 62.]

Managing files

DFS: create, destroy, read, write a dfile; list dfiles

DBufferCache: DBuffer dbuf = getBlock(blockID); releaseBlock(dbuf); sync()

DBuffer: read(), write(); startFetch(), startPush(); waitValid(), waitClean()

Each file has a size: it is the first byte offset in the file that has never been written. Never return data past a file’s size.

Fetch blocks for data and metadata (or zero new ones fresh), read and write in place, and push dirty blocks back to the disk.

Serialize DFS read/write on each inode.

Representing a File On Disk

“inode”:

– file attributes, e.g., size
– block map: index by logical block number; maps to a blockID

logical block 0: "once upon a time/nin a l"

logical block 1: "and far far away,/nlived t"

logical block 2: "he wise and sage wizard."

Access blocks through the block cache with getBlock, startFetch, waitValid, read, releaseBlock.
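The block-map lookup and the file-size rule can be shown together in a short read routine. This is a sketch under assumptions: `SketchInode`, `SketchReader`, and the `BlockSource` interface are hypothetical, the 4-byte block size is for illustration only, and `fetch` stands in for the whole getBlock, startFetch, waitValid, read, releaseBlock sequence.

```java
/** In-memory view of an inode; layout and names are assumptions for this sketch. */
class SketchInode {
    final int size;          // file size in bytes
    final int[] blockMap;    // logical block number -> blockID
    SketchInode(int size, int[] blockMap) { this.size = size; this.blockMap = blockMap; }
}

class SketchReader {
    static final int BLOCK_SIZE = 4;   // tiny blocks, for illustration only

    /** Hypothetical stand-in for the buffer cache: returns the contents of one block. */
    interface BlockSource { byte[] fetch(int blockID); }

    /** Copy up to count bytes starting at the file offset into dest; returns bytes copied. */
    static int read(SketchInode inode, BlockSource blocks, int offset, byte[] dest, int count) {
        if (offset < 0 || offset >= inode.size) return 0;
        // Never return data past the file's size.
        int n = Math.min(Math.min(count, dest.length), inode.size - offset);
        int copied = 0;
        while (copied < n) {
            int pos = offset + copied;
            int logical = pos / BLOCK_SIZE;                  // index into the block map
            int within = pos % BLOCK_SIZE;                   // byte offset inside that block
            int chunk = Math.min(BLOCK_SIZE - within, n - copied);
            byte[] block = blocks.fetch(inode.blockMap[logical]);
            System.arraycopy(block, within, dest, copied, chunk);
            copied += chunk;
        }
        return copied;
    }
}
```

For a 6-byte file whose block map is {7, 2}, a read of 8 bytes at offset 2 returns 4 bytes: two from the tail of block 7 and two from the head of block 2.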

Filesystem layout on disk

[Figure: inode 0 (bitmap file) and inode 1 (root directory) sit at fixed locations on disk. Inode 0 leads to the blocks of the allocation bitmap file (111000100010110110111101, 100110100011000100010101, 001011100001100101000100). Inode 1 leads to the root directory blocks, with entries rain: 32, hail: 48, wind: 18, snow: 62 and empty (0) slots. A file's inode leads to its file blocks ("once upon a time/n in a l", "and far far away, lived th").]

This is a toy example (Nachos).

Filesystem layout on disk

[Figure: the layout from the previous slide, with inode 0 (bitmap file) and inode 1 (root directory) marked with X's; the file inode and its file blocks remain.]

Your DeFiler volume is small. You can keep the free block/inode maps in memory. You don’t need metadata structures on disk for that. But you have to scan the disk to rebuild the in-memory structures on initialization.

DeFiler has no directories. You just need to keep track of which DFileIDs are currently valid, and return a list.

DeFiler must be able to find all valid inodes on disk.

Disk layout: the easy way

[Figure: an inode and its file blocks ("once upon a time/n in a l", "and far far away, lived th").]

Given a list of valid inodes, you can determine which inodes and blocks are free and which are in use.

DeFiler must be able to find all valid inodes on disk.
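Rebuilding the in-memory free-block map from the valid inodes might look like the sketch below. It assumes each valid inode's block map has already been read from disk into an `int[]`; `FreeMapBuilder` and both method names are hypothetical, and the sketch does not account for the blocks that hold the inodes themselves.

```java
import java.util.BitSet;
import java.util.List;

class FreeMapBuilder {
    /** Mark every blockID named by a valid inode's block map as in use. */
    static BitSet usedBlocks(List<int[]> validBlockMaps, int numBlocks) {
        BitSet used = new BitSet(numBlocks);
        for (int[] blockMap : validBlockMaps) {
            for (int blockID : blockMap) {
                if (blockID < 0 || blockID >= numBlocks) {
                    throw new IllegalStateException("blockID out of range: " + blockID);
                }
                if (used.get(blockID)) {
                    throw new IllegalStateException("block claimed by two files: " + blockID);
                }
                used.set(blockID);
            }
        }
        return used;
    }

    /** Claim the lowest free blockID, or return -1 if the volume is full. */
    static int allocate(BitSet used, int numBlocks) {
        int blockID = used.nextClearBit(0);
        if (blockID >= numBlocks) return -1;
        used.set(blockID);
        return blockID;
    }
}
```

The two consistency checks double as a cheap sanity test of the volume at initialization: a block outside the volume, or one claimed by two files, means the on-disk inodes are corrupt.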