chapter 17 distributed file systems by: amar deo(300532107) ankur patel(300873058)

48
Chapter 17 Distributed File Systems By: Amar Deo(300532107) Ankur Patel(300873058)

Upload: amish

Post on 05-Jan-2016

35 views

Category:

Documents


3 download

DESCRIPTION

Chapter 17 Distributed File Systems By: Amar Deo(300532107) Ankur Patel(300873058). Chapter 16 Distributed-File Systems. Background Naming and Transparency Remote File Access Stateful versus Stateless Service File Replication Example Systems. Background. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Chapter 17Distributed File Systems

By:Amar Deo(300532107)

Ankur Patel(300873058)

Page 2: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Background Naming and Transparency Remote File Access Stateful versus Stateless Service File Replication Example Systems

Chapter 16 Distributed-File Systems

Page 3: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Distributed file system (DFS) – a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources.

A DFS manages set of dispersed storage devices

Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces.

Background

Page 4: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Service – software entity running on one or more machines and providing a particular type of function to clients.

Server – service software running on a single machine.

Client – process that can invoke a service using a set of operations that forms its client interface.

A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write).

Client interface of a DFS should be transparent, i.e., not distinguish between local and remote files.

DFS Structure

Page 5: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Do not have a single data repository DFS may vary from system to system DFS provides multiplicity and autonomy to

clients and servers in the system DFS should appear as a centralized file

system Transparent DFS facilitates user mobility Performance measured by amount of time

needed to satisfy service request

Page 6: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Component Unit: Smallest set of files that can be

stored on a single machine

Page 7: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Naming – mapping between logical and physical objects.

Multilevel mapping – abstraction of a file that hides the details of how and where on the disk the file is actually stored.

A transparent DFS hides the location where in the network the file is stored.

For a file being replicated in several sites, the mapping returns a set of the locations of this file’s replicas; both the existence of multiple copies and their location are hidden.

Naming and Transparency

Page 8: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Operating System Concepts

Location transparency – file name does not reveal the file’s physical storage location.

Location independence – file name does not need to be changed when the file’s physical storage location changes. ◦ Better file abstraction.◦ Promotes sharing the storage space itself.◦ Separates the naming hierarchy from the storage-devices hierarchy.

Naming Structures

Page 9: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Do not support file migration Notion of location independence is

irrelevant for these systems Provides convenient way to share data Sharing storage space is cumbersome AFS supports location independence

Static Location Transparency:

Page 10: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Rely on servers to provide all files Special protocol needed for boot sequence Advantage: Lower cost Greater convenience Disadvantage: Added complexity of boot protocols Performance loss

Diskless Clients

Page 11: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Files named by combination of their host name and local name; guarantees a unique system wide name.

Attach remote directories to local directories, giving the appearance of a coherent directory tree; only previously mounted remote directories can be accessed transparently.

Total integration of the component file systems.◦ A single global name structure spans all the

files in the system.◦ If a server is unavailable, some arbitrary set of

directories on different machines also becomes unavailable.

Naming Schemes — Three Main Approaches

Page 12: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Administrative complexity NFS is the most difficult If server unavailable then directories

become unavailable

Evaluate naming structures

Page 13: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

mapping on a component basis can use replication or local caching for location independence replication

impossible use location-independent-file identifiers use structured names for low level identiers

Implementation Techniques

Page 14: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Remote-Service mechanism can be used

Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.

Remote File Access

Page 15: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

◦ If needed data not already cached, a copy of data is brought from the server to the user.

◦ Accesses are performed on the cached copy.◦ Files identified with one master copy residing at

the server machine, but copies of (parts of) the file are scattered in different caches.

◦ Cache-consistency problem – keeping the cached copies consistent with the master file.

Basic Caching Scheme

Page 16: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Advantages of disk caches◦ More reliable.◦ Cached data kept on disk are still there during recovery

and don’t need to be fetched again.

Advantages of main-memory caches:◦ Permit workstations to be diskless.◦ Data can be accessed more quickly.◦ Performance speedup in bigger memories.◦ Server caches (used to speed up disk I/O) are in main

memory regardless of where user caches are located; using main-memory caches on the user machine permits a single caching mechanism for servers and users.

Cache Location – Disk vs. Main Memory

Page 17: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Write-through – write data through to disk as soon as they are placed on any cache. Reliable, but poor performance.

Delayed-write – modifications written to the cache and then written through to the server later. Write accesses complete quickly; some data may be overwritten before they are written back, and so need never be written at all.◦ Poor reliability; unwritten data will be lost whenever a user

machine crashes.◦ Variation – scan cache at regular intervals and flush blocks

that have been modified since the last scan.◦ Variation – write-on-close, writes data back to the server

when the file is closed. Best for files that are open for long periods and frequently modified.

Cache Update Policy

Page 18: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)
Page 19: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Is locally cached copy of the data consistent with the master copy?

Client-initiated approach◦ Client initiates a validity check.◦ Server checks whether the local data are

consistent with the master copy.

Server-initiated approach◦ Server records, for each client, the (parts of) files

it caches. ◦ When server detects a potential inconsistency, it

must react.

Consistency

Page 20: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

In caching, many remote accesses handled efficiently by the local cache; most remote accesses will be served as fast as local ones.

Servers are contacted only occasionally in caching (rather than for each access).◦ Reduces server load and network traffic.◦ Enhances potential for scalability.

Remote server method handles every remote access across the network; penalty in network traffic, server load, and performance.

Total network overhead in transmitting big chunks of data (caching) is lower than a series of responses to specific requests (remote-service).

Comparing Caching and Remote Service

Page 21: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Caching is superior in access patterns with infrequent writes. With frequent writes, substantial overhead incurred to overcome cache-consistency problem.

Benefit from caching when execution carried out on machines with either local disks or large main memories.

Remote access on diskless, small-memory-capacity machines should be done through remote-service method.

Caching and Remote Service (Cont.)

Page 22: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Objective

– Stateful vs. Stateless service– File Replication– An Example : AFS

Page 23: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Stateful: A server keeps track of information about

client requests.  

◦ It maintains what files are opened by a client; connection identifiers; server caches.

◦ Memory must be reclaimed when client closes file or when client dies.

Stateful file service

Page 24: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Client opens a file. Server fetches information about the file from its disk,

stores it in its memory, and gives the client a unique connection identifier to the client and the open file.

Identifier is used for subsequent accesses until the session ends.

Server must reclaim the main-memory space used by clients who are no longer active

Stateful File service

Page 25: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Stateless: Each client request provides complete information needed by the server (i.e., filename, file offset ).

◦ The server can maintain information on behalf of the client, but it's not required.

◦ Useful things to keep include file info for the last N files touched.

Avoids state information by making each request self-contained

Each request identifies the file and position in the file No need to establish and terminate a connection by open

and close operations

Stateless File Service

Page 26: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Performance Failure recovery Drawbacks for using robust stateless services Some environment required stateful service

Distinctions Between Stateful and stateless service

Page 27: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Performance is better for stateful.  

◦ Don't need to parse the filename each time, or "open/close" file on every request.

◦ Stateful can have a read-ahead cache.

Performance

Page 28: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

A stateful server loses all its volatile state in a crash

◦ Server needs to be aware of client failures in order to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination)

With stateless server, the effects of server failure send recovery are almost unnoticeable

◦ A newly reincarnated server can respond to a self-contained request without any difficulty

Failure Detection

Page 29: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

longer request messages slower request processing additional constraints imposed on DFS design

Drawbacks for using robust stateless services

Page 30: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients

UNIX use of file descriptors and implicit offsets is inherently stateful; servers must maintain tables to map the file descriptors to inodes, and store the current offset within a file

Some environment required stateful service

Page 31: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

DFS Replication (DFSR) is a new component that lets you set up multiple "targets" for a share, where the same files are stored on multiple servers and any changes are replicated between the targets.

Duplicating files on multiple machines improves availability and performance.

File Replication

Page 32: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Replicas of the same file reside on failure-independent machines

Improves availability and can shorten service time Naming scheme maps a replicated file name to a particular

replica◦ Existence of replicas should be invisible to higher levels ◦ Replicas must be distinguished from one another by

different lower-level names Updates –replicas of a file denote the same logical entity,

and thus an update to any replica must be reflected on all other replicas

Demand replication –reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica.

File Replication

Page 33: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

AFS is a distributed file system which allows users to share and access all the files stored in the file system. It is as easy as accessing files stored on their local file system. AFS was created by Carnegie Mellon University (CMU) and IBM

in 1985.

Andrew File System

Page 34: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

OVERVIEW:  AFS tries to solve complex issues such as,

◦ Uniform name space◦ Server-side caching (via replicas), ◦ Client-side caching (with cache consistency),◦ AFS uses local caches to reduce workload and increase

its performance.◦ It is location independent, which means that users do not

need to know which fileserver has the file they want to manipulate.

◦ It allows users to simplify addressing. ◦ Also, AFS improves security by using Kerberos and

access control lists (ACLs).

Andrew File System

Page 35: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

In AFS, file/directory protections are managed in a different way from those of UNIX file systems.

Access Control Lists (ACLs) specify file protections in AFS by giving a permit to each directory, not to individual files.

Users use three commands to view or modify ACLs: fs listacl fs setacl fs cleanacl

What are Access Control Lists (ACLs)

Page 36: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Access rights are by which a user can specify exactly what other people are allowed to access his directory as well as the ACL commands (fs setacl, etc). Each access right is represented as an alphabetical letter.

There are seven access rights which you can specify for user/groups

What are access rights?

Page 37: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)
Page 38: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Clients have a partitioned space of file names: a local name space and a shared name space

Dedicated servers, called Vice, present the shared name space to the clients as an homogeneous, identical, and location transparent file hierarchy

Workstations run the Virtue protocol to communicate with Vice.

Are required to have local disks where they store their local name space

Servers collectively are responsible for the storage and management of the shared name space

Andrew File System

Page 39: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Clients and servers are structured in clusters interconnected by a backbone LAN

A cluster consists of a collection of workstations and a cluster server and is connected to the backbone by a router

A key mechanism selected for remote file operations is whole file caching

Opening a file causes it to be cached, in its entirety, on the local disk

Andrew File System

Page 40: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

SHARED NAME SPACE:  The server file space is divided into volumes. Volumes contain files of

only one user. It's these volumes that are the level of granularity attached to a client.

A vice file can be accessed using a fid = <volume number, vnode >. The fid doesn't depend on machine location. A client queries a volume-location database for this information.

Volumes can migrate between servers to balance space and utilization. Old server has "forwarding" instructions and handles client updates during migration.

Read-only volumes ( system files, etc. ) can be replicated. The volume database knows how to find these.

Andrew File System

Page 41: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

FILE OPERATIONS AND CONSISTENCY SEMANTICS:  Andrew caches entire files form servers

A client workstation interacts with Vice servers only during opening and closing of files

Venus – caches files from Vice when they are opened, and stores modified copies of files back when they are closed

Reading and writing bytes of a file are done by the kernel without Venus intervention on the cached copy

Venus caches contents of directories and symbolic links, for path-name translation

Exceptions to the caching policy are modifications to directories that are made directly on the server responsibility for that directory

Andrew File System

Page 42: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

IMPLEMENTATION – Flow of a request:  Deflection of open/close:  The client kernel is modified to detect references to vice files.

The request is forwarded to Venus with these steps:

Venus does pathname translation.

Asks Vice for the file

Moves the file to local disk

Passes inode of file back to client kernel.

Venus maintains caches for status ( in memory ) and data ( on local disk.)

A server user-level process handles client requests.

A lightweight process handles concurrent RPC requests from clients.

State information is cached in this process.

Susceptible to reliability problems.

Andrew File System

Page 43: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Steps : 1) Open the AFS client by clicking on the padlock icon

with the red cross on the taskbar next to the clock. A window will open:

How to use window AFS Client

Page 44: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

2) Click on new token

Page 45: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

2) Give AFS cell, username and password.

Page 46: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

3) The AFS file system (AFS-CABI) is mounted as drive Z. You

can now access your AFS files through drive Z.

Page 47: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

4) When you want to log out of AFS, bring up the AFS client window again by clicking on the padlock icon on the taskbar

next to the clock.

Page 48: Chapter 17 Distributed File Systems By: Amar  Deo(300532107) Ankur  Patel(300873058)

Thank You