Naming, State Management, and User-Level Extensions in the Sprite Distributed File System
Brent Ballinger Welch
This dissertation concerns network computing environments. Advances in network and microprocessor technology have caused a shift from stand-alone timesharing systems to networks of powerful personal computers. Operating systems designed for stand-alone timesharing hosts do not adapt easily to a distributed environment. Resources like disk storage, printers, and tape drives are not concentrated at a single point. Instead, they are scattered around the network under the control of different hosts. New operating system mechanisms are needed to handle this sort of distribution so that users and application programs need not worry about the distributed nature of the underlying system.
This dissertation explores the approach of centering a distributed computing environment around a shared network file system. The file system is chosen as a starting point because it is a heavily used service in stand-alone systems, and the read/write paradigm of the file system is a familiar one that can be applied to many system resources. The file system described in this dissertation provides a distributed name space for system resources, and it provides remote access facilities so all resources are available throughout the network. Resources accessible via the file system include disk storage, other types of peripheral devices, and user-implemented service applications. The resulting system is one where resources are named and accessed via the shared file system, and the underlying distribution of the system among a collection of hosts is not important to users. A single system image is provided where any host is equally usable as another, much like a timesharing system where all the terminals provide equivalent access to the central host.
The file system makes a basic distinction between operations on the file system name space and operations on open I/O streams. The name space is implemented by a collection of file servers, while I/O streams may be connected to objects located on any host. Thus, two different servers may be involved in access to any given object, one for naming the object and one for doing I/O on the object. This creates a flexible system where devices and services can be located on any host, yet the file servers' directory structures still make up the global name space. At the same time, a file server handles both naming and I/O operations for regular files in order to optimize this important common case. The distinction between naming and I/O has been made in other systems by introducing a separate network name service. However, the name services in other systems have been too heavyweight to use for every file. Name services have been used to name hosts, users, and services instead. In contrast, the work here extends the file system name space to provide access to other services. This achieves the same flexibility provided by a separate name service, but it reuses the directory lookup mechanisms already present on the file servers and it optimizes the important case of regular file access.
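The split between naming and I/O can be illustrated with a minimal sketch. The class names (IOServer, FileServer, Stream), the host names, and the pathnames below are all hypothetical and are not taken from the Sprite implementation; the sketch only assumes that a directory entry on a file server records which host performs I/O for the named object.

```python
class IOServer:
    """Any host that performs I/O on objects it controls (hypothetical class)."""

    def __init__(self, host):
        self.host = host
        self.objects = {}  # object_id -> bytearray holding the object's data

    def write(self, object_id, data):
        self.objects.setdefault(object_id, bytearray()).extend(data)

    def read(self, object_id):
        return bytes(self.objects.get(object_id, b""))


class FileServer(IOServer):
    """A file server holds directory entries and also does I/O for regular files."""

    def __init__(self, host):
        super().__init__(host)
        self.names = {}  # pathname -> (io_server, object_id)

    def register(self, path, io_server, object_id):
        self.names[path] = (io_server, object_id)

    def open(self, path):
        # Naming step: the lookup is done by the file server.
        io_server, object_id = self.names[path]
        # I/O step: the stream is bound to whichever host controls the object.
        return Stream(io_server, object_id)


class Stream:
    """An open I/O stream; later operations bypass the name lookup."""

    def __init__(self, io_server, object_id):
        self.io_server = io_server
        self.object_id = object_id

    def write(self, data):
        self.io_server.write(self.object_id, data)

    def read(self):
        return self.io_server.read(self.object_id)


# Assumed example layout: a device on a workstation, a regular file on the server.
file_server = FileServer("fs1")
workstation = IOServer("ws7")
file_server.register("/dev/printer", workstation, "printer0")
file_server.register("/etc/motd", file_server, "motd")
```

In this sketch, opening "/dev/printer" involves two hosts (the file server for the name, the workstation for the I/O), while opening "/etc/motd" involves only the file server, which mirrors the optimization for regular files described above.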
Another novel feature of the Sprite file system concerns distributed state management. The file system is implemented by a distributed set of computers, and its internal state is distributed among the operating system kernels at the different sites. Much of the state is used to optimize file system accesses by using a main-memory caching system. The servers keep state about how their clients are caching files, and they use this state to guarantee a consistent view of the file system data [Nelson88b]. It is important to maintain this state efficiently so the performance gains of caching are not squandered, yet it is also important to maintain the state robustly so the system behaves well during server failures and network partitions. The dissertation describes how server state is maintained efficiently during normal operations and how server state is updated when processes migrate between hosts. This dissertation also describes a recovery protocol for stateful servers that relies on redundant state on the clients. The redundant state is maintained cheaply as main-memory data structures (there is no disk logging), and servers can recover their state after a crash with the help of their clients.
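The idea of recovering server state from redundant client state can be sketched as follows. This is an illustration under simplifying assumptions, not the protocol described in the dissertation: the only server state modeled is a per-file count of opens by each client, and the names Server, Client, reopen, and recover are hypothetical.

```python
class Server:
    """Stateful server: remembers which clients have which files open."""

    def __init__(self):
        self.open_table = {}  # file_id -> {client_id: open_count}

    def open(self, client_id, file_id):
        counts = self.open_table.setdefault(file_id, {})
        counts[client_id] = counts.get(client_id, 0) + 1

    def crash(self):
        # State lives only in main memory, so a crash loses all of it.
        self.open_table = {}

    def reopen(self, client_id, client_state):
        # Rebuild this client's share of the table from the client's own copy.
        for file_id, count in client_state.items():
            self.open_table.setdefault(file_id, {})[client_id] = count


class Client:
    """Client keeps a redundant in-memory record of its own opens."""

    def __init__(self, client_id, server):
        self.client_id = client_id
        self.server = server
        self.open_files = {}  # file_id -> open_count (the redundant state)

    def open(self, file_id):
        self.server.open(self.client_id, file_id)
        self.open_files[file_id] = self.open_files.get(file_id, 0) + 1

    def recover(self):
        # After the server restarts, replay the redundant state to it.
        self.server.reopen(self.client_id, self.open_files)
```

Because every client holds only its own portion of the state, the server's table is reconstructed once all clients have called recover(); nothing is read from a disk log in this sketch.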
The next section of this introductory chapter gives a little background on the Sprite operating system project. Then the issues addressed by the file system are considered in more detail, along with the more specific contributions I make regarding these issues. The chapter ends with an outline of the other chapters and a summary of the thesis.
1.1. The Sprite Operating System Project
The work presented here was done as part of the Sprite operating system project [Ousterhout88]. The project began in 1984 with the goal of designing and developing an operating system for a network of personal workstations. We felt that new hardware features changed our computing environment enough to warrant exploration of new operating system techniques. The new hardware features include multiprocessors, large main memories, and networks. Of these, my work mainly concerns the network and the problems associated with integrating a network of workstations into a single system.
Sprite began as part of the SPUR multiprocessor project [Hill86]. The SPUR machine is an example of the new technology we felt demanded a new operating system. It is a multiprocessor with a large physical memory, and it is networked together with many other powerful personal workstations. While the SPUR machine was being built by fellow graduate students, our group began development of Sprite on Sun workstations. Today Sprite runs on Sun3 and Sun4 workstations, DECstations, and, of course, multiprocessor SPURs.
Our approach to designing and building Sprite was to begin from scratch. Instead of modifying an existing operating system such as UNIX¹, we decided to start fresh so as to be unfettered by existing design decisions. We were pragmatic, however, and realized that we did not have the manpower to rewrite all the applications needed to build a useful system. We decided to implement the operating system kernel from scratch, but provide enough compatibility with UNIX to be able to easily port UNIX applications. We wanted to provide a system that was like a stand-alone timesharing system in terms of ease of use and ease of information sharing, yet provided the additional power of a network of high-performance personal workstations. Our target was a system that supported a moderate number of people, perhaps an academic department or a research laboratory. Furthermore, our system had to efficiently support diskless workstations, which we favor because they reduce cost, heat, noise, and administrative overhead. Our goal of building a real system has been met. Sprite is in daily use by a growing number of people, and it is used to support continued development of Sprite itself.
¹ UNIX is a registered trademark of A.T.&T.
There are two features of Sprite that dovetail with its shared file system to provide a high-performance system: a high-performance file caching system, and process migration. The caching system keeps frequently accessed data close to the processes that use them, while the migration system lets processes move to idle workstations. (These features are described briefly below.) Furthermore, the stand-alone, timesharing semantics of the file system are maintained so that the additional performance provided by these features does not impact the ease of use of the system.
Sprite uses main-memory caches of file data to optimize file access. The caches improve performance by eliminating disk and network accesses; data is written and read to and from the cache, if possible. Furthermore, a delayed-write policy is used so that data can age in the cache before being written back. In a distributed system this caching strategy creates a consistency problem. Data may be cached at many sites, and the most recent version of a file may not be in the local cache. The algorithm used to maintain cache consistency and the performance benefits of the caching system have been described in Nelson's thesis [Nelson88b] and [Nelson88a]. The caching system is relevant to this thesis because it creates a state management problem. The state that supports caching has to be managed efficiently so as not to degrade the performance gained by caching, yet it has to be managed robustly in the face of server crashes.
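A delayed-write cache, and the consistency problem it creates, can be sketched in a few lines. This is a simplified illustration and not Sprite's caching code: the class name DelayedWriteCache, the delay parameter, and the use of a plain dictionary to stand in for the server are all assumptions, and the sketch deliberately omits the consistency algorithm described in [Nelson88b].

```python
class DelayedWriteCache:
    """Client block cache that writes dirty blocks back only after they age."""

    def __init__(self, backing_store, delay):
        self.store = backing_store  # dict: block_id -> bytes (stands in for the server)
        self.delay = delay          # how long a dirty block may age (assumed parameter)
        self.blocks = {}            # block_id -> cached bytes
        self.dirty_since = {}       # block_id -> time the block first became dirty

    def read(self, block_id):
        if block_id not in self.blocks:
            # Cache miss: fetch the block from the backing store.
            self.blocks[block_id] = self.store[block_id]
        return self.blocks[block_id]

    def write(self, block_id, data, now):
        # The write is absorbed by the cache; nothing reaches the store yet.
        self.blocks[block_id] = data
        self.dirty_since.setdefault(block_id, now)

    def write_back(self, now):
        # Flush only the blocks that have been dirty for at least `delay`.
        aged = [b for b, t in self.dirty_since.items() if now - t >= self.delay]
        for block_id in aged:
            self.store[block_id] = self.blocks[block_id]
            del self.dirty_since[block_id]
        return aged
```

If a second cache on another host reads a block from the same store before the first cache has written it back, the second cache sees the old data. That stale read is the consistency problem the paragraph above refers to, and preventing it is what requires the servers to keep state about their clients' caches.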
Sprite provides the ability to migrate actively executing processes between hosts of identical CPU architecture. This feature is used to exploit the idle processors that are commonly found in a network of personal workstations; migration is used to off-load jobs onto idle hosts. The process migration system will be described in Douglis's thesis [Douglis90], while this dissertation describes the effect that process migration has on the file system. The shared file system makes process migration easier because data files, programs, and devices are accessible at a migrating process's new site. However, the state that supports caching and remote I/O streams has to be updated during migration, and this is a tricky problem in distributed state management. Thus, the combination of data caching and process migration poses new problems in state management that I address in this dissertation.
1.2. Thesis Overview
This dissertation makes original contributions in the areas of distributed naming, remote device access, user-level extensions, process migration, and failure recovery. A
software architecture that supports these