Using DSVM to Implement a Distributed File System
Ramon Lawrence
Dept. of Computer Science
Background Work on DFS
Extensive research in the late eighties
Research focused on using replication to improve efficiency of file access
Work at Cornell produced a system called Deceit, which allowed default and user-specified replication of files
Deceit used a client/server architecture which allowed multiple servers
Distributed Shared Virtual Memory (DSVM)
a global address space accessible by any number of processes distributed across a network
instead of explicit message passing, DSVM processes read and write to the shared memory space, and it is the responsibility of the DSVM manager to ensure the information they see is consistent
Why use DSVM?
distribution transparency: process designer does not have to code explicit IPC
increased network bandwidth makes efficiency vs. development cost trade-off more realistic
easier implementation of distributed or parallel algorithms on generic workstations
Treadmarks
commercial implementation of DSVM
designed for running parallel algorithms on a network of workstations
implemented as a C++ library: increased portability
allows allocation of memory in DSVM region using malloc()
DSVM access is similar to regular dynamic memory access
Treadmarks (cont.)
provides barriers and locks as synchronization primitives, which must be used if the DSVM region is to remain consistent
uses lazy release consistency, which guarantees that DSVM is consistent only after a lock acquire
allows multiple writers to the same page of DSVM
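The "consistent only after a lock acquire" behaviour can be illustrated with a toy model. This is a minimal sketch and not Treadmarks code: the classes `ToyLock` and `ToyProcess` and all of their methods are hypothetical, and the model ignores pages, diffs, and the network entirely. It only shows that a write becomes visible to another process when that process acquires the lock, not when the writer releases it.

```python
# Toy model of lazy release consistency (illustrative only; not Treadmarks code).
# All class and method names here are hypothetical.

class ToyLock:
    """A lock that carries the updates published by its previous holders."""

    def __init__(self):
        self.updates = {}    # writes published by earlier releases
        self.holder = None


class ToyProcess:
    """A process with its own local copy of the shared region."""

    def __init__(self, name):
        self.name = name
        self.local = {}      # this process's view of shared memory
        self.pending = {}    # writes made since the last release

    def acquire(self, lock):
        assert lock.holder is None, "lock already held"
        lock.holder = self
        # Consistency is established here: pull in earlier holders' writes.
        self.local.update(lock.updates)

    def write(self, key, value):
        self.local[key] = value
        self.pending[key] = value

    def read(self, key):
        return self.local.get(key)

    def release(self, lock):
        assert lock.holder is self, "lock not held by this process"
        # Nothing is pushed to other processes; updates wait for the next acquire.
        lock.updates.update(self.pending)
        self.pending.clear()
        lock.holder = None


lock = ToyLock()
p1, p2 = ToyProcess("p1"), ToyProcess("p2")

p1.acquire(lock)
p1.write("x", 1)
p1.release(lock)

stale = p2.read("x")     # p2 has not acquired the lock yet
p2.acquire(lock)
fresh = p2.read("x")     # p2 now sees p1's write
p2.release(lock)
print(stale, fresh)      # None 1
```

A process that reads shared data without first acquiring the lock may see a stale value, which is why the synchronization primitives must be used.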
Treadmarks Limitations
all processes accessing the DSVM region must be homogeneous and must be started at the same time
Treadmarks uses UNIX signals to detect access to DSVM, which limits its usefulness for system programming, as signals interrupt system calls
despite its limitations, still useful for prototype demonstrations
Environment Specification
network domain is a series of interconnected PCs on an Ethernet
an application is assumed to have a unique name across the network
other files can be given a unique name by concatenating the machine name, directory path, and filename
a global name table (GNT) manages the files on the network
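The naming rule for ordinary files can be sketched as a small helper. The function name `unique_name`, the `/` separator, and the sample machine and path are assumptions made for illustration; the slides only say that the three parts are concatenated.

```python
def unique_name(machine, directory, filename):
    """Build a network-wide name from machine name, directory path, and filename."""
    parts = [machine.strip("/")]
    # Drop empty components so leading or doubled slashes do not matter.
    parts += [p for p in directory.split("/") if p]
    parts.append(filename)
    return "/".join(parts)


name = unique_name("pc12", "/home/alice/docs", "report.txt")
print(name)  # pc12/home/alice/docs/report.txt
```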
Global Name Table
provides a flat name space to identify files distributed across the network
managed by the OS
provides a table look-up mechanism to find file locations by the unique name
enforcing the relative path constraint may allow transparent access to files by applications and users
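A flat name table with look-up by unique name might look like the following. This is a sketch under assumptions: the class `GlobalNameTable`, its method names, and the `(machine, path)` representation of a location are all hypothetical, since the slides do not specify the table's layout.

```python
class GlobalNameTable:
    """Flat mapping from a unique name to the file's current location."""

    def __init__(self):
        self._entries = {}

    def register(self, name, machine, path):
        if name in self._entries:
            raise KeyError(f"name already registered: {name}")
        self._entries[name] = (machine, path)

    def lookup(self, name):
        return self._entries[name]   # raises KeyError if the name is unknown

    def move(self, name, machine, path):
        """Relocate a file by rewriting its single table entry."""
        if name not in self._entries:
            raise KeyError(f"unknown name: {name}")
        self._entries[name] = (machine, path)


gnt = GlobalNameTable()
gnt.register("editor", "pc1", "/apps/editor")
before = gnt.lookup("editor")
gnt.move("editor", "pc2", "/apps/editor")
after = gnt.lookup("editor")
print(before, after)
```

Callers keep using the name `"editor"` before and after the move; only the one entry changes.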
Relative Path Constraint
all file access is done relative to home directories
absolute paths pose problems as they are not the same across machines
applications and users have home directories and all file access should be specified relative to them
most files are location independent
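One way to enforce the constraint is to resolve every path against a home directory and refuse absolute paths. The function `resolve` and the sample directories are hypothetical, and rejecting paths that climb out of the home directory with `..` is an added assumption, not something the slides state.

```python
import posixpath


def resolve(home, relative_path):
    """Resolve a path against a home directory, rejecting paths that leave it."""
    if posixpath.isabs(relative_path):
        raise ValueError("absolute paths are not allowed")
    home = posixpath.normpath(home)
    full = posixpath.normpath(posixpath.join(home, relative_path))
    # Assumption: a path such as "../other" must not escape the home directory.
    if full != home and not full.startswith(home.rstrip("/") + "/"):
        raise ValueError("path escapes the home directory")
    return full


# The same relative path works on machines with different home directories.
print(resolve("/home/alice", "docs/report.txt"))   # /home/alice/docs/report.txt
print(resolve("/users/alice", "docs/report.txt"))  # /users/alice/docs/report.txt
```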
Benefits of a GNT
presents all users with a consistent and identical view of the network independent of their site location
all applications appear to be local
enhances user familiarity with the system
provides a similar transparency to icons in windowing systems, except that the view is defined by the user, not by the site location
Benefits of a GNT (cont.)
makes applications more movable: instead of reconfiguring all icons or links, just have to update one table entry
files can be moved by a user or the OS without affecting the views of other users
if used with a standardized display mechanism, application execution also becomes transparent
allows for load balancing, replication, etc.
Distributing the GNT
the GNT provides a mechanism for individual sites to find files on the network
the GNT must be accessible by all sites
Two architectures: client/server and DSVM
Client/Server Architecture
every machine has a client process which handles user/application requests
one machine has a dedicated server process which stores the GNT and responds to requests from the clients
communication between the clients and server is done using sockets
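The explicit request/reply exchange over sockets can be sketched as follows. This is an illustration and not the implementation described in the slides: the wire format (one short name per connection, the location or `NOT_FOUND` sent back), the table contents, and the function names `serve` and `lookup` are assumptions. Server and client run in one program on the loopback interface so the example is self-contained.

```python
import socket
import threading

GNT = {"editor": "pc1:/apps/editor"}   # hypothetical table held by the server


def serve(listener, max_requests):
    """Answer lookup requests: one short name per connection, one reply back."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:
            name = conn.recv(1024).decode().strip()
            reply = GNT.get(name, "NOT_FOUND")
            conn.sendall(reply.encode())


def lookup(address, name):
    """Client side: send a name to the server and return its reply."""
    with socket.create_connection(address) as sock:
        sock.sendall(name.encode())
        return sock.recv(1024).decode()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen()
address = listener.getsockname()

server = threading.Thread(target=serve, args=(listener, 2))
server.start()

found = lookup(address, "editor")
missing = lookup(address, "no-such-file")
server.join()
listener.close()
print(found, missing)
```

Every look-up costs a round trip to the one machine that holds the table, which is the explicit communication the client/server design requires.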
Client/Server Architecture
[Figure: nondsvm.fig, client/server architecture diagram]
Client/Server Analysis
distinguishable client and server processes
explicit communication
problem of dividing tasks (e.g. buffering)
single point of failure
pessimistic sharing - data is only shared by explicit requests to the server
efficient with a single server as network communication is minimized
DSVM Architecture
the GNT is allocated in DSVM and shared by all processes, each of which acts as both a client and a server
an update to the GNT by any process is reflected at all other processes without explicit communication
all communication is handled by the DSVM manager
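The programming model can be approximated on a single machine with Python's `multiprocessing.shared_memory`. This is only a stand-in: it is ordinary local shared memory, not DSVM, and no network or DSVM manager is involved. The fixed-size slot layout, the `SLOT` constant, and the helper names are assumptions for illustration. Two handles attached to the same region play the role of two processes.

```python
from multiprocessing import shared_memory

SLOT = 32  # bytes reserved for one table entry (arbitrary choice)

# "Process A" creates the region holding the table.
region_a = shared_memory.SharedMemory(create=True, size=4 * SLOT)
# "Process B" attaches to the same region by name.
region_b = shared_memory.SharedMemory(name=region_a.name)


def write_entry(region, slot, text):
    """Store text in a fixed-size slot, padded with zero bytes."""
    data = text.encode()[:SLOT].ljust(SLOT, b"\0")
    region.buf[slot * SLOT:(slot + 1) * SLOT] = data


def read_entry(region, slot):
    """Read a slot back and strip the zero padding."""
    raw = bytes(region.buf[slot * SLOT:(slot + 1) * SLOT])
    return raw.rstrip(b"\0").decode()


write_entry(region_a, 0, "pc1:/apps/editor")   # A updates the table
seen_by_b = read_entry(region_b, 0)            # B reads it; no message was coded
print(seen_by_b)                               # pc1:/apps/editor

region_b.close()
region_a.close()
region_a.unlink()
```

Neither side contains any send or receive call; the table code only reads and writes memory.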
DSVM Architecture
[Figure: dsvm.fig, DSVM architecture diagram]
DSVM Architecture Analysis
optimistic sharing - everything in DSVM is shared by all processes
transparent sharing: processes do not know they are sharing the GNT with other processes
communication details are hidden from the implementation of the GNT
DSVM Architecture Analysis (cont.)
hidden costs:
implementation of DSVM still requires communication to maintain consistency
frequent updates and false sharing may be a problem
overhead in determining when a process accesses the shared memory region
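False sharing arises because consistency is tracked per page, so unrelated table entries that happen to sit on the same page are treated as shared. The arithmetic below is a sketch: the 4096-byte page, the 128-byte entry, and the function names are assumed values chosen for illustration, not figures from the slides.

```python
PAGE_SIZE = 4096    # assumed page size in bytes
ENTRY_SIZE = 128    # assumed size of one table entry in bytes


def page_of(entry_index):
    """Page number holding the first byte of the given table entry."""
    return (entry_index * ENTRY_SIZE) // PAGE_SIZE


def falsely_shared(entry_a, entry_b):
    """True when two different entries live on the same page."""
    return entry_a != entry_b and page_of(entry_a) == page_of(entry_b)


entries_per_page = PAGE_SIZE // ENTRY_SIZE
print(entries_per_page)           # 32
print(falsely_shared(0, 31))      # True: both on page 0
print(falsely_shared(0, 32))      # False: entry 32 starts page 1
```

Under these assumptions, two processes updating entries 0 and 31 touch the same page even though they never touch the same data.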
Architectural Differences
the main trade-off is efficiency vs. ease of implementation
very similar to object-oriented environment
– costs of encapsulation and implementation transparency vs. generality and simplicity
sharing methodology: pessimistic vs. optimistic
DSVM provides illusion of isolation similar to a transaction in a DB system
Architectural Differences (cont.)
amount of replication:
DSVM - full replication
Client/Server - single point of failure
overhead:
DSVM - must trap DSVM accesses and false sharing at the page level
Client/Server - communication minimized
Conclusions
DSVM is a higher-level IPC protocol, just as an object is a higher-level data structure
DSVM provides an easier programming environment and a standardized mechanism for IPC at the cost of higher communication overhead
increased bandwidth may justify overhead to achieve decrease in development costs