Serverless Network File Systems Overview by Joseph Thompson
TRANSCRIPT
Serverless Network File Systems
Overview by Joseph Thompson
Problem
• Centralized file systems fundamentally limit performance and availability
– All reads and writes go through the centralized server
– Increasing server performance is expensive
Purpose
• Better performance and scalability
• High availability via redundant data storage
Assumption
• SNFS is only appropriate among machines that communicate over a fast network and that trust each other to enforce security
– SNFS generates a significant amount of network traffic
– Security will be covered later
Components of SNFS
• Software RAID
• Log File System (LFS)
• Zebra
– Merges RAID and LFS in a distributed network
– Don’t miss my next presentation on Zebra!
• Multiprocessor Cache Consistency
– In this model, each processor is one client
Three Problems to Be Solved
• Need distributed metadata that provides both cache consistency management and the flexibility to dynamically reconfigure client responsibilities
• Scalable way to subset storage servers for efficiency
• Scalable log cleaning
Metadata
• Manager Map
• IMap
• File Directories
• Stripe Group Map
Managers
• The manager of a file controls two sets of information about it
– Cache consistency state
– Disk location metadata
Manager Map
• Table that indicates which physical machines manage which groups of index numbers at any given time
• This table is globally replicated to all managers in the system
– Table is relatively small (tens of kilobytes per hundreds of clients)
– Table rarely changes
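A minimal sketch of this table (names, group count, and machine assignments are illustrative, not from the paper): index numbers hash into a fixed number of groups, and the same small table is replicated on every manager.

```python
# Hypothetical sketch of a globally replicated manager map: a small table
# mapping groups of file index numbers to the machine managing that group.
NUM_GROUPS = 16  # index numbers hash into a fixed number of groups

# The same table is replicated on every manager; it is small and rarely changes.
manager_map = {group: f"machine-{group % 4}" for group in range(NUM_GROUPS)}

def manager_for(index_number: int) -> str:
    """Find the machine managing a given file's index number."""
    return manager_map[index_number % NUM_GROUPS]

print(manager_for(42))  # index 42 -> group 10 -> "machine-2"
```

Because the table indirects through groups rather than naming a manager per file, reassigning responsibilities only requires updating a handful of entries.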
IMap
• A file’s imap entry contains the log addresses of the file’s inode
– For scalability, imap entries are only distributed to the managers that have been assigned to the file
File Directories
• Contains mappings from file names to index numbers
– Stored in the file itself
– Files created by a client are assigned to the manager on that machine (if there is one)
• Index Numbers
– Used to find the manager responsible for the file
Stripe Group Map Justification
• In a large RAID, even large log segments create small-write inefficiencies
• While one client writes at its full network bandwidth to one stripe group, another client can do the same with another group
• A smaller segment size makes cleaning more efficient
• Stripe groups greatly improve availability
– Each group stores its own parity, which helps if there are multiple server failures in different groups
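The per-group parity point can be made concrete with a small sketch (illustrative, not the paper's code): XOR parity over a group's fragments lets the group reconstruct any single lost fragment on its own, so failures in different groups do not compound.

```python
from functools import reduce

# Hypothetical sketch: each stripe group stores its own XOR parity fragment,
# so a single server failure within the group can be repaired locally.
def parity(fragments: list) -> bytes:
    """XOR corresponding bytes of all fragments."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*fragments))

data = [b"AAAA", b"BBBB", b"CCCC"]   # fragments on three storage servers
p = parity(data)                      # parity fragment on a fourth server

# Server holding data[1] fails: rebuild its fragment from survivors + parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```

With one parity fragment per group, one failure per group is recoverable; simultaneous failures in *different* groups are therefore also recoverable.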
Stripe Group Implementation
• Group ID
• Group Members
• Current or Obsolete
– The Current/Obsolete field is used to increase efficiency, relying on the cleaner to eventually move all data to a current group so the obsolete group can be removed
• Also globally replicated to each client
– Small and rarely changes
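One map entry might look like the following sketch (field names are illustrative): the three fields from the slide, with obsolete groups lingering only until the cleaner migrates their live data.

```python
from dataclasses import dataclass

# Hypothetical sketch of one stripe group map entry, mirroring the three
# fields above; the whole map is small and replicated to every client.
@dataclass
class StripeGroupEntry:
    group_id: int
    members: list        # storage servers in this group
    current: bool = True # False once the group is obsolete

groups = [
    StripeGroupEntry(0, ["ss1", "ss2", "ss3"]),
    StripeGroupEntry(1, ["ss4", "ss5", "ss6"], current=False),
]

# New writes go only to current groups; the cleaner drains obsolete ones.
writable = [g.group_id for g in groups if g.current]
print(writable)  # [0]
```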
Cleaning
• Three main tasks
– Track each segment’s utilization status
– Use that status to decide which segment to clean
– Write live blocks from the old segment to a new segment
Distributed Utilization
• Assign the burden of maintaining each segment’s utilization status to the client that wrote the segment
• Clients store utilization information in s-files, one for each stripe group they write to; s-files are written like normal files and can be found by a stripe group leader
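As a sketch of what an s-file tracks (names and byte counts are illustrative): the writing client keeps a live-byte count per segment, decrementing it when data in a segment is superseded elsewhere.

```python
from collections import defaultdict

# Hypothetical sketch: each client records, in an s-file per stripe group it
# writes to, how many live bytes remain in each segment it wrote.
class SFile:
    def __init__(self):
        self.live_bytes = defaultdict(int)  # segment id -> live bytes

    def record_write(self, segment: int, nbytes: int):
        self.live_bytes[segment] += nbytes

    def record_overwrite(self, segment: int, nbytes: int):
        # Data rewritten elsewhere makes the old copy in this segment dead.
        self.live_bytes[segment] -= nbytes

s = SFile()
s.record_write(segment=7, nbytes=4096)
s.record_overwrite(segment=7, nbytes=1024)
print(s.live_bytes[7])  # 3072 live bytes remain in segment 7
```

Because the writer of a segment sees every overwrite of its own data, it can maintain this count without any central bookkeeping.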
Distributed Cleaning
• A stripe group leader (dynamically appointed) initiates cleaning when the number of free segments drops below a threshold value or when the group is idle
• The leader accumulates the s-files for the group and can dynamically assign cleaners from different machines to clean subsections of the stripe group in an efficient manner
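The leader's trigger and victim selection can be sketched as follows (threshold value and policy details are illustrative assumptions, not from the paper):

```python
# Hypothetical sketch of the stripe group leader's cleaning logic: clean
# when free segments drop below a threshold, or opportunistically when idle.
FREE_THRESHOLD = 10

def should_clean(free_segments: int, group_idle: bool) -> bool:
    return free_segments < FREE_THRESHOLD or group_idle

def pick_victims(live_bytes_by_segment: dict, n: int) -> list:
    """Choose the n segments with the least live data (cheapest to clean)."""
    return sorted(live_bytes_by_segment, key=live_bytes_by_segment.get)[:n]

# Utilization merged from the group's s-files: segment id -> live bytes.
utilization = {3: 100, 7: 9000, 9: 2500}
if should_clean(free_segments=4, group_idle=False):
    victims = pick_victims(utilization, 2)
    print(victims)  # [3, 9] — the leader assigns these to cleaner machines
```

Picking nearly empty segments first maximizes space reclaimed per byte copied, which is why the merged s-file counts matter.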
Procedure to Read a Block
• Diagram Demystified!
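Since the diagram itself is not reproduced here, a minimal sketch (all names and addresses are illustrative) ties the metadata structures from the earlier slides together: directory lookup yields an index number, the manager map names the responsible manager, that manager's imap locates the inode, and the inode locates the block.

```python
# Hypothetical sketch of the read path, combining the metadata structures
# from the previous slides. All names and values are illustrative.
directory = {"notes.txt": 42}            # file name -> index number
manager_map = {42 % 16: "machine-2"}     # index number group -> manager
imap = {42: "log-addr-of-inode-42"}      # held by the responsible manager
inodes = {"log-addr-of-inode-42": {0: "log-addr-of-block-0"}}

def read_block(name: str, block_no: int) -> str:
    idx = directory[name]                # 1. name -> index number
    _manager = manager_map[idx % 16]     # 2. index number -> manager
    inode_addr = imap[idx]               # 3. manager's imap -> inode address
    return inodes[inode_addr][block_no]  # 4. inode -> block log address

print(read_block("notes.txt", 0))  # "log-addr-of-block-0"
```

In the real system the client checks its local cache first, and the manager may forward the request to another client that holds a valid cached copy before falling back to the storage servers; this sketch shows only the metadata indirection.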
Writing and Cache Consistency
• To write, a client must request a lock from the owning manager, which the manager can revoke at any time
• The manager invalidates its cache and updates its cache consistency information
• One implementation uses client caching lists to invalidate stale client caches and forward read requests to clients with valid cached copies
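The caching-list idea can be sketched like this (class and method names are hypothetical): the manager tracks which clients cache a block, and before granting a write lock it invalidates every other cached copy.

```python
# Hypothetical sketch of write ownership with a client caching list: the
# manager tracks which clients cache the block, so stale copies can be
# invalidated before a new writer is granted the lock.
class Manager:
    def __init__(self):
        self.writer = None
        self.cachers = set()  # clients holding a cached copy

    def grant_write(self, client: str):
        # Revoke all other cached copies before handing out the write lock.
        for c in self.cachers - {client}:
            print(f"invalidate cache on {c}")
        self.cachers = {client}
        self.writer = client

m = Manager()
m.cachers = {"A", "B"}
m.grant_write("A")  # prints: invalidate cache on B
```

The same caching list lets the manager forward a read to a client with a valid copy instead of going to the storage servers.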
Recovery and Reconfiguration
• General Recovery Strategy
• Data Structure Recovery
• Storage Server Recovery
• Manager Recovery
• Cleaner Recovery
• Scalability of Recovery
General Recovery Strategy
• LFS keeps an append-only log of every file modification between log segment writes, called the delta
• Uses checkpoint recovery and roll-forward
• Unless additional parity servers per stripe group are used, there can be no full recovery when multiple storage servers from a single stripe group are unreachable
Data Structure Recovery
• Layered dependence requires recovery to start with the storage servers, then the managers, then the cleaners
Storage Server Recovery
• As we have seen with RAID architectures, recovering a single storage server is easy
• Once we do the initial recovery we can use LFS’ delta feature to poll clients for their unwritten changes in the process of rolling forward
Manager Recovery
• Retrieves last known imaps from its last checkpoint written to a storage server
• The manager gathers a consensus of manager map tables from clients in the roll-forward process to apply the appropriate changes to data block locations
Cleaner Recovery
• Since s-files are stored like normal files, they will be recovered from the respective storage server
• The cleaner must then go through a roll-forward stage where it asks clients for a summary of their more recent modifications to those segments
• To avoid clients having to search their logs multiple times, they can gather utilization information during the manager recovery process
Scalability of Recovery
• The roll-forward process can generate O(N^2) messages per object, where N refers to the number of clients, managers, or storage servers
• An optimization: each object only needs to contact the N lower-layer objects, and if randomization is used to reduce the number of concurrent accesses to a single storage server, each manager can roll forward in parallel
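A back-of-the-envelope comparison of the two message counts (a hedged sketch of the complexity claim above, not measurements from the paper):

```python
# Hypothetical message-count comparison for roll-forward, where N is the
# number of objects (clients, managers, or storage servers) per layer.
def naive_messages(n: int) -> int:
    # Every object queries every object: O(N^2) total.
    return n * n

def optimized_messages(n: int) -> int:
    # Each object contacts only its N lower-layer objects once, and those
    # contacts proceed in parallel, so each object issues only N messages.
    return n

for n in (10, 100):
    print(n, naive_messages(n), optimized_messages(n))
# At N=100 the gap is 10,000 messages versus 100 per object.
```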
Other Information Not Covered Here
• Details of xFS prototype and performance testing
• Extra research into the state of xFS since 1995, when this paper was written
Conclusion
• Paper is valuable
– Provides a creative use of new and old ideas to pioneer a new file system
• Problems
– Restrictions on the usability of this system in a non-secure environment
• Solutions
– P2P security solutions we discussed in class