
Using DSVM to Implement a Distributed File System

Ramon Lawrence

Dept. of Computer Science


Background Work on DFS

extensive research in the late eighties

research focused on using replication to improve the efficiency of file access

work at Cornell produced a system called Deceit, which allowed default and user-specified replication of files

Deceit used a client/server architecture that allowed multiple servers

Distributed Shared Virtual Memory (DSVM)

a global address space accessible by any number of processes distributed across a network

instead of explicit message passing, DSVM processes read and write the shared memory space, and the DSVM manager is responsible for ensuring that the information they see is consistent
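To make this division of labour concrete, here is a minimal single-process sketch of a DSVM manager, assuming a centralized manager and a write-invalidate protocol (a common textbook scheme, not necessarily the one Treadmarks uses). The class `DsvmManager` and its `read`/`write` methods are hypothetical; processes are identified only by an integer `pid`, and each one caches whole pages.

```python
class DsvmManager:
    """Toy centralized DSVM manager (hypothetical) using write-invalidate."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        # authoritative copy of every page, held by the manager
        self.master = [bytearray(page_size) for _ in range(num_pages)]
        # replicas cached by processes: (pid, page number) -> bytearray
        self.copies = {}

    def _locate(self, addr: int):
        # split a global address into (page number, offset within page)
        return divmod(addr, self.page_size)

    def read(self, pid: int, addr: int) -> int:
        page, off = self._locate(addr)
        key = (pid, page)
        if key not in self.copies:
            # "page fault": fetch a fresh replica from the manager
            self.copies[key] = bytearray(self.master[page])
        return self.copies[key][off]

    def write(self, pid: int, addr: int, value: int) -> None:
        page, off = self._locate(addr)
        # invalidate every other process's replica of this page
        stale = [k for k in self.copies if k[1] == page and k[0] != pid]
        for key in stale:
            del self.copies[key]
        self.master[page][off] = value
        # the writer keeps an up-to-date replica
        self.copies[(pid, page)] = bytearray(self.master[page])
```

A process that cached a page before another process wrote to it simply faults and re-fetches on its next read; neither process sends a message itself, which is the transparency the slide describes.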

Why use DSVM?

distribution transparency: the process designer does not have to code explicit IPC

increased network bandwidth makes the efficiency vs. development cost trade-off more realistic

easier implementation of distributed or parallel algorithms on generic workstations

Treadmarks

commercial implementation of DSVM designed for running parallel algorithms on a network of workstations

implemented as a C++ library, which increases portability

allows allocation of memory in the DSVM region using malloc(), so DSVM access is similar to regular dynamic memory access

Treadmarks (cont.)

provides barriers and locks as synchronization primitives, which must be used if the DSVM region is to remain consistent

uses lazy release consistency, which guarantees that the DSVM region is consistent only after a lock acquire

allows multiple writers of the same page of DSVM
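The two properties above can be illustrated with a heavily simplified simulation. It borrows the twin-and-diff idea associated with Treadmarks' multiple-writer protocol, but `Process`, `ReleaseLog`, and the rule that an acquire applies every diff published so far are assumptions made for brevity: the real protocol tracks happens-before ordering, uses write notices, and fetches diffs lazily on page faults, and mutual exclusion is not modeled here at all.

```python
class Process:
    """A process's local copy of a single shared page (hypothetical)."""

    def __init__(self, page_size: int):
        self.page = bytearray(page_size)
        self.twin = None   # pristine copy taken before the first write
        self.seen = 0      # how many published diffs this copy already includes

    def write(self, off: int, value: int) -> None:
        if self.twin is None:
            # first write since the last release: save a twin for diffing
            self.twin = bytearray(self.page)
        self.page[off] = value


class ReleaseLog:
    """Diffs published at release time and applied lazily at acquire time."""

    def __init__(self):
        self.diffs = []    # each diff maps offset -> new byte, in release order

    def acquire(self, proc: Process) -> None:
        # lazy release consistency: bring this copy up to date only now
        for diff in self.diffs[proc.seen:]:
            for off, value in diff.items():
                proc.page[off] = value
        proc.seen = len(self.diffs)

    def release(self, proc: Process) -> None:
        if proc.twin is not None:
            # publish only the bytes this process changed (page vs. twin)
            diff = {
                i: new
                for i, (new, old) in enumerate(zip(proc.page, proc.twin))
                if new != old
            }
            self.diffs.append(diff)
            proc.twin = None
```

If processes `a` and `b` write different bytes of the same page and then release, each release publishes only its own bytes, so the diffs merge cleanly; a third process sees neither change until it performs an acquire.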

Treadmarks Limitations

all processes accessing the DSVM region must be homogeneous and must be started at the same time

Treadmarks uses UNIX signals to detect access to DSVM, which limits its usefulness for system programming because signals interrupt system calls

despite these limitations, it is still useful for prototype demonstrations

Environment Specification

network domain is a series of interconnected PCs on an Ethernet

an application is assumed to have a unique name across the network

other files can be given a unique name by concatenating the machine name, directory path, and filename

a global name table (GNT) manages the files on the network
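The naming rule for non-application files can be sketched in a few lines, assuming a `machine:path` format (the slides do not specify a separator). `unique_name` is a hypothetical helper; application names would be used as-is, since they are already assumed to be unique.

```python
import posixpath


def unique_name(machine: str, directory: str, filename: str) -> str:
    """Build a network-wide unique name for a non-application file by
    concatenating machine name, directory path, and file name.
    The "machine:path" format is an assumption, not from the slides."""
    return f"{machine}:{posixpath.join(directory, filename)}"
```

Because the machine name is part of the key, the same relative path on two machines yields two distinct names, e.g. `unique_name("pc12", "docs/reports", "q1.txt")` gives `"pc12:docs/reports/q1.txt"`.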

Global Name Table

provides a flat name space to identify files distributed across the network

managed by the OS

provides a table look-up mechanism to find file locations by their unique names

enforcing the relative path constraint may allow transparent access to files by applications and users
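A minimal sketch of the flat name space as a single in-memory table, assuming each entry stores a machine and a path relative to the owner's home directory. `GlobalNameTable`, `Location`, `register`, and `lookup` are hypothetical names, and how the OS would persist or share the table is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    machine: str
    path: str  # relative to the owner's home directory (relative path constraint)


class GlobalNameTable:
    """Flat name space: unique name -> current location (hypothetical API)."""

    def __init__(self):
        self._table = {}

    def register(self, name: str, machine: str, path: str) -> None:
        # names are unique network-wide, so a second registration is an error
        if name in self._table:
            raise ValueError(f"name already registered: {name}")
        self._table[name] = Location(machine, path)

    def lookup(self, name: str) -> Location:
        # raises KeyError for an unknown name
        return self._table[name]
```

Callers only ever use the unique name; where the file lives is an answer the table gives, not something the caller encodes.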

Relative Path Constraint

all file access is done relative to home directories

absolute paths pose problems as they are not the same across machines

applications and users have home directories and all file access should be specified relative to them

most files are location independent
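The constraint can be enforced with a small resolver: the same relative path maps to the correct absolute path on each machine, and absolute paths are rejected. The function name `resolve` is hypothetical, and rejecting `..` components (so a path cannot escape the home directory) is an extra assumption, not something the slides state.

```python
from pathlib import PurePosixPath


def resolve(home: str, relative: str) -> str:
    """Map a home-relative path to this machine's absolute path."""
    rel = PurePosixPath(relative)
    if rel.is_absolute():
        # absolute paths differ across machines, so they are not allowed
        raise ValueError(f"absolute paths are not allowed: {relative}")
    if ".." in rel.parts:
        # assumption: paths must stay inside the home directory
        raise ValueError(f"path escapes the home directory: {relative}")
    return str(PurePosixPath(home) / rel)
```

The same request `"docs/a.txt"` resolves to `/home/ramon/docs/a.txt` on one machine and `/users/rl/docs/a.txt` on another, which is why most files can be treated as location independent.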

Benefits of a GNT

presents all users with a consistent and identical view of the network, independent of their site location

all applications appear to be local

enhances user familiarity with the system

provides a transparency similar to icons in windowing systems, except that the view is defined by the user, not by the site location

Benefits of a GNT (cont.)

makes applications more movable: instead of reconfiguring all icons or links, only one table entry has to be updated

files can be moved by a user or the OS without affecting the views of other users

if used with a standardized display mechanism, application execution also becomes transparent

allows for load balancing, replication, etc.
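The last two benefits can be sketched by letting each entry hold a list of machines with a copy of the file. `ReplicatedGnt`, `add_replica`, `move`, and `pick` are hypothetical, and choosing the least-loaded replica is just one simple load-balancing policy assumed for illustration.

```python
class ReplicatedGnt:
    """GNT variant whose entries list every machine holding a copy (hypothetical)."""

    def __init__(self):
        self._table = {}  # unique name -> list of machines holding a copy

    def add_replica(self, name: str, machine: str) -> None:
        self._table.setdefault(name, []).append(machine)

    def move(self, name: str, old: str, new: str) -> None:
        # relocating a copy touches only this one entry; users' views are unchanged
        machines = self._table[name]
        machines[machines.index(old)] = new

    def pick(self, name: str, load: dict) -> str:
        # load maps machine -> current load; choose the least-loaded replica
        return min(self._table[name], key=lambda m: load.get(m, 0))
```

Users keep asking for the same name; only the table knows that a copy moved from `pc2` to `pc7`, or that `pc2` is currently the less busy replica.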

Distributing the GNT

the GNT provides a mechanism for individual sites to find files on the network

the GNT must be accessible by all sites

two architectures: client/server and DSVM

Client/Server Architecture

every machine has a client process which handles user/application requests

one machine has a dedicated server process which stores the GNT and responds to requests from the clients

communication between the clients and server is done using sockets
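The explicit communication in this design can be sketched with Python's standard `socket` module. The line protocol (`LOOKUP <name>`, answered by a location or `NOTFOUND`) and the functions `serve` and `lookup` are assumptions for illustration; a connected `socket.socketpair()` stands in for a real network connection so the sketch runs on one machine.

```python
import socket
import threading

# server-side copy of the GNT (example data)
GNT = {"wordproc": "pc3:apps/wordproc"}


def serve(conn: socket.socket) -> None:
    """Answer "LOOKUP <name>" requests, one per line, until the client closes."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            cmd, _, name = line.strip().partition(" ")
            if cmd == "LOOKUP":
                reply = GNT.get(name, "NOTFOUND")
            else:
                reply = "ERROR"
            conn.sendall((reply + "\n").encode("utf-8"))


def lookup(conn: socket.socket, reader, name: str) -> str:
    """Client side: send one request, then block for the one-line reply."""
    conn.sendall(f"LOOKUP {name}\n".encode("utf-8"))
    return reader.readline().strip()


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    threading.Thread(target=serve, args=(server_end,), daemon=True).start()
    with client_end, client_end.makefile("r", encoding="utf-8") as reader:
        print(lookup(client_end, reader, "wordproc"))
```

Every lookup costs one request and one reply, and data is shared only when a client asks for it, which is the pessimistic sharing noted in the analysis of this architecture.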

Client/Server Architecture

[Figure: client/server architecture (nondsvm.fig, fig2dev 3.1, 26 Mar 1996)]

Client/Server Analysis

distinguishable client and server processes

explicit communication

problem of dividing tasks (e.g. buffering)

single point of failure

pessimistic sharing: data is only shared through explicit requests to the server

efficient with a single server, as network communication is minimized

DSVM Architecture

the GNT is allocated in DSVM and shared by all processes, each of which acts as both a client and a server

an update to the GNT by any process is reflected at all other processes without explicit communication

all communication is handled by the DSVM manager
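The programming model can be approximated with threads sharing one dictionary: each peer registers its own entry and reads another peer's entry directly, with no messages in the code. This is only a stand-in, since threads share real memory rather than a DSVM region; the lock and barrier mirror the synchronization Treadmarks requires, and `peer`, `run`, and the file names are hypothetical.

```python
import threading

gnt = {}                     # stands in for the GNT allocated in the DSVM region
gnt_lock = threading.Lock()  # every access is synchronized, as Treadmarks requires


def peer(name, location, other, barrier, results):
    # acting as a "server": publish this peer's own entry
    with gnt_lock:
        gnt[name] = location
    barrier.wait()           # wait until every peer has registered
    # acting as a "client": read another peer's entry directly
    with gnt_lock:
        results[name] = gnt.get(other)


def run():
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=peer, args=("a.txt", "pc1:docs/a.txt", "b.txt", barrier, results)),
        threading.Thread(target=peer, args=("b.txt", "pc2:docs/b.txt", "a.txt", barrier, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Compared with the socket version, there is no protocol to design: the code only reads and writes the table, and in a real DSVM system the manager would generate whatever traffic consistency requires.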

DSVM Architecture

[Figure: DSVM architecture (dsvm.fig, fig2dev 3.1, 26 Mar 1996)]

DSVM Architecture Analysis

optimistic sharing: everything in DSVM is shared by all processes

transparent sharing: processes do not know they are sharing the GNT with other processes

communication details are hidden from the implementation of the GNT

DSVM Architecture Analysis (cont.)

hidden costs:

the implementation of DSVM still requires communication to maintain consistency

frequent updates and false sharing may be a problem

there is overhead in determining when a process accesses the shared memory region
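False sharing follows from simple arithmetic. Assuming 4096-byte pages and a hypothetical fixed 64-byte GNT entry packed contiguously, 4096 / 64 = 64 unrelated entries share each page, so updating any one of them triggers consistency work for processes that only use the other 63. The constants and helper names below are illustrative assumptions.

```python
PAGE_SIZE = 4096   # bytes per DSVM page (assumed; a common VM page size)
ENTRY_SIZE = 64    # bytes per GNT entry (hypothetical fixed-size layout)

ENTRIES_PER_PAGE = PAGE_SIZE // ENTRY_SIZE  # 64 entries share each page


def page_of(entry_index: int) -> int:
    """Page number holding a GNT entry when entries are packed contiguously."""
    return (entry_index * ENTRY_SIZE) // PAGE_SIZE


def falsely_shared(i: int, j: int) -> bool:
    """True if two different entries live on the same page, so a write to one
    causes consistency traffic for a process that only uses the other."""
    return i != j and page_of(i) == page_of(j)
```

Entries 0 and 63 are falsely shared, while entries 63 and 64 fall on different pages. One common mitigation is padding hot entries to a full page, which trades memory for less consistency traffic.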

Architectural Differences

the main trade-off is efficiency vs. ease of implementation

very similar to an object-oriented environment: the costs of encapsulation and implementation transparency vs. generality and simplicity

sharing methodology: pessimistic vs. optimistic

DSVM provides an illusion of isolation similar to a transaction in a DB system

Architectural Differences (cont.)

amount of replication: DSVM has full replication; client/server has a single point of failure

overhead: DSVM must trap DSVM accesses and suffers from false sharing at the page level; client/server minimizes communication

Conclusions

DSVM is a higher-level IPC protocol, just as an object is a higher-level data structure

DSVM provides an easier programming environment and a standardized mechanism for IPC at the cost of higher communication overhead

increased bandwidth may justify this overhead in exchange for lower development costs

Future Work

expanding the functionality of the GNT

defining a display standard to allow for application execution transparency

future work on DSVM, including integration into the OS