Mobile File System <AFS, Coda, Bayou> Byung Chul Tak

Upload: ross-neal

Post on 05-Jan-2016


Page 1: Mobile File System Byung Chul Tak. AFS  Andrew File System Distributed computing environment developed at CMU provides transparent access to remote shared

Mobile File System<AFS, Coda, Bayou>

Byung Chul Tak


AFS Andrew File System

• Distributed computing environment developed at CMU

• provides transparent access to remote shared files

• The most important design goal: scalability

• Allows existing UNIX programs to access AFS files without modification or recompilation


AFS Two design characteristics

• Whole-file serving
  ◦ The entire contents of directories and files are transmitted to client computers by AFS servers
• Whole-file caching
  ◦ A copy of a file is stored in a cache on the local disk
  ◦ The file cache is permanent (it survives reboots of the client computer)


AFS Usage scenario

• A client issues an open system call for a file
  ◦ If there is no copy in the local cache
    ∙ the server is located
    ∙ a request for a copy of the file is made
• The copy is stored in the local UNIX file system and opened
• Subsequent reads and writes are applied to the local copy
• When the client issues a close system call
  ◦ if the local copy has been updated, its contents are sent back to the server
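The open/close scenario can be sketched in a few lines of Python. This is only an illustration of whole-file caching with write-back on close, not AFS code: the class names `ViceServer` and `VenusClient`, the in-memory dictionaries, and the method names are all hypothetical.

```python
class ViceServer:
    """Hypothetical stand-in for a Vice server: path -> file contents."""

    def __init__(self):
        self.files = {}

    def fetch(self, path):
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data


class VenusClient:
    """Hypothetical client-side cache manager (whole files only)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> locally cached copy of the whole file
        self.dirty = set()  # paths modified since they were opened

    def open(self, path):
        # On a cache miss, fetch the entire file from the server.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path)

    def read(self, path):
        # Reads are served from the local copy.
        return self.cache[path]

    def write(self, path, data):
        # Writes touch only the local copy until close.
        self.cache[path] = data
        self.dirty.add(path)

    def close(self, path):
        # Send the contents back only if the local copy was updated.
        if path in self.dirty:
            self.server.store(path, self.cache[path])
            self.dirty.discard(path)
```

Note that between `write` and `close` the server still holds the old contents, which is exactly why a separate cache consistency mechanism is needed.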


AFS Assumptions

• Most files are small
• Reads are much more common than writes
• Sequential access is common, and random access is rare
• Most files are read and written by only one user
  ◦ When a file is shared, it is usually only one user who modifies it
• Files are referenced in bursts, and there is high temporal locality


AFS Distribution of processes in AFS

[Figure: workstations, each running a user program and Venus on top of a UNIX kernel, connected through the network to servers, each running Vice on top of a UNIX kernel]


AFS Two software components

• Venus
  ◦ A user-level process that runs in each client computer
• Vice
  ◦ The server software that runs as a user-level UNIX process in each server computer


AFS System call interception in AFS

• BSD UNIX is modified to intercept file system calls
• Venus manages the cache
  ◦ A partition on the local disk is used as a cache

[Figure: on a workstation, UNIX file system calls from the user program enter the UNIX kernel; non-local file operations are passed to Venus, while local ones go to the UNIX file system on the local disk]


AFS File identifier

• Files and directories in the shared file space are identified by a 96-bit fid
  ◦ Venus translates file pathnames into fids
• A fid consists of three fields
  ◦ volume number
    ∙ In AFS, files are grouped into volumes
  ◦ file handle
    ∙ identifies the file within the volume
  ◦ uniquifier
    ∙ ensures that file identifiers are not reused

Volume number   File handle   Uniquifier
32 bits         32 bits       32 bits
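As a small illustration of the layout, the three 32-bit fields can be packed into 12 bytes (96 bits) with Python's `struct` module. The `Fid` type, the helper names, and the big-endian byte order are assumptions made for this sketch; the slides do not specify an on-the-wire encoding.

```python
import struct
from typing import NamedTuple


class Fid(NamedTuple):
    volume: int      # volume number (32 bits)
    handle: int      # file handle within the volume (32 bits)
    uniquifier: int  # guards against reuse of identifiers (32 bits)


# Three unsigned 32-bit integers; big-endian is an assumption of this sketch.
_FID_FORMAT = ">III"


def pack_fid(fid: Fid) -> bytes:
    return struct.pack(_FID_FORMAT, fid.volume, fid.handle, fid.uniquifier)


def unpack_fid(raw: bytes) -> Fid:
    return Fid(*struct.unpack(_FID_FORMAT, raw))
```

Two fids with the same volume number and file handle but different uniquifiers are different files, which is how a recycled file handle is kept from being mistaken for the file that used it before.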


AFS Cache consistency

• Based on the callback promise
  ◦ ensures that cached copies of files are updated when another client closes the same file after updating it
• Vice supplies a copy of a file to Venus, together with a callback promise
  ◦ a token issued by Vice with two states: valid, cancelled
• When Venus receives a callback, it sets the callback promise token to cancelled
• Venus checks the callback promise when the user issues an open
  ◦ if it is cancelled, a fresh copy must be fetched
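The callback mechanism can be sketched as follows. This is a simplified, single-process model rather than AFS code: the classes `Vice` and `Venus` borrow the component names, but every method name and data structure here is a hypothetical choice for the example.

```python
class Vice:
    """Hypothetical server: stores files and remembers who holds a promise."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.callbacks = {}  # path -> set of clients holding a valid promise

    def fetch(self, path, client):
        # Supplying a copy also registers a callback promise for the client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer):
        self.files[path] = data
        # Call back every other client that holds a promise on this file.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class Venus:
    """Hypothetical client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}  # path -> [contents, promise state]

    def open(self, path):
        entry = self.cache.get(path)
        # Fetch a fresh copy if there is none or the promise was cancelled.
        if entry is None or entry[1] == "cancelled":
            self.cache[path] = [self.vice.fetch(path, self), "valid"]
        return self.cache[path][0]

    def close(self, path, new_data=None):
        # Assumes the file was opened first; write back only if it changed.
        if new_data is not None:
            self.cache[path][0] = new_data
            self.vice.store(path, new_data, self)

    def break_callback(self, path):
        if path in self.cache:
            self.cache[path][1] = "cancelled"
```

The check happens only at open time, so a client that already has the file open keeps working on its old copy until it opens the file again.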


CODA Evolution from AFS

Mechanisms for high availability
• Disconnected operation
  ◦ a mode of operation in which a client continues to use data during network failure
  ◦ while disconnected, the client relies on the local cache
  ◦ a cache miss is reported as a failure
• Server replication
  ◦ allows volumes to have read-write replicas at more than one server


CODA Venus states

• Hoarding state
  ◦ hoards useful data in anticipation of disconnection
• Emulation state
  ◦ entered upon disconnection
  ◦ Venus assumes full responsibility for file operations
• Reintegration state
  ◦ Venus propagates changes made during emulation to the server
  ◦ all cached objects are validated before use


CODA Design philosophies for extending CODA

• Don’t punish strongly-connected clients
  ◦ it is unacceptable to degrade the performance of strongly-connected clients on account of the weakly-connected ones
• Don’t make life worse than when disconnected
  ◦ users will not tolerate substantial performance degradation
• Do it in the background if you can
  ◦ e.g., turn foreground network delay into background activity
• When in doubt, seek user advice
  ◦ As connectivity weakens, the price of misjudgment increases


CODA CODA extensions

• Transport protocol refinements
  ◦ code separation of the RPC2 and SFTP protocols
• Rapid cache validation
  ◦ raising the granularity of cache validation
• Trickle reintegration
  ◦ propagating updates to servers asynchronously
• User-assisted miss handling
  ◦ asking for user input before a large file fetch


CODA Rapid cache validation

• Under the previous implementation
  ◦ The reintegration process performs poorly
    ∙ every cached object is validated after reconnection
• Solution adopted
  ◦ Track server state at multiple levels of granularity
  ◦ Version stamps for each volume
    ∙ if the volume version stamp is still valid, every cached object from that volume is valid
    ∙ if the version stamp is invalid, each cached object is validated as usual
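The saving from volume-level validation can be illustrated with a short sketch. The function name `revalidate`, the shape of the cache dictionary, and the two server callables are hypothetical; the point is only that one volume check can replace many per-object checks.

```python
def revalidate(cached_volumes, server_volume_stamp, server_object_version):
    """Validate a cache after reconnection (illustrative only).

    cached_volumes: {volume: {"stamp": int, "objects": {name: version}}}
    server_volume_stamp(volume) and server_object_version(volume, name)
    stand in for requests to the server.
    Returns (stale objects, number of validation requests issued).
    """
    stale, requests = set(), 0
    for volume, info in cached_volumes.items():
        requests += 1  # one volume-level check
        if server_volume_stamp(volume) == info["stamp"]:
            continue   # stamp unchanged: every object in the volume is valid
        # Stamp changed: fall back to validating each object as usual.
        for name, version in info["objects"].items():
            requests += 1
            if server_object_version(volume, name) != version:
                stale.add((volume, name))
    return stale, requests
```

With two cached volumes, one unchanged and holding three objects, the other changed and holding two, this issues four requests instead of five, and the gap widens as unchanged volumes hold more objects.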


CODA Trickle Reintegration

• State modification
  ◦ Write disconnected state
    ∙ Updates are logged and propagated via trickle reintegration
• Reintegration runs in the background
• A user can force a full reintegration

[Figure: Venus state transition diagram with the states Hoarding, Emulating, and Write Disconnected; the transitions are labeled connection, disconnection, strong connection, weak connection, and disconnection]
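The modified state machine can be written down as a transition table. The state and event names come from the figure labels, but the pairing of each label with a source and target state is an assumption of this sketch, since the flattened figure no longer shows which arrow goes where.

```python
# (current state, event) -> next state.
# The assignment of events to arrows is assumed, not taken from the slide.
TRANSITIONS = {
    ("hoarding", "disconnection"): "emulating",
    ("hoarding", "weak connection"): "write disconnected",
    ("emulating", "connection"): "write disconnected",
    ("write disconnected", "strong connection"): "hoarding",
    ("write disconnected", "disconnection"): "emulating",
}


def next_state(state, event):
    """Return the next Venus state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Under this reading, a client that reconnects always passes through the write disconnected state, so logged updates drain via trickle reintegration before it returns to hoarding.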


CODA
• Log optimization
  ◦ key to reducing the volume of reintegration data
  ◦ basic concept
    ∙ In emulation state, Venus logs updates
    ∙ When a log record is appended to the CML (Client Modify Log), Venus checks whether it cancels or overrides earlier records
  ◦ Trickle reintegration reduces the opportunity for optimization
    ∙ Records should spend enough time in the CML for optimizations to be effective
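A toy version of the cancel/override check might look like this. The record format `(op, path)` and the two rules shown (a newer store overrides older stores of the same file; removing a file created within the log cancels all of its records) are simplifying assumptions for illustration, not the full set of optimizations Venus performs.

```python
def append_record(cml, op, path):
    """Append (op, path) to the log, dropping earlier records it supersedes."""
    if op == "store":
        # Override: only the latest store of a file needs to be reintegrated.
        cml[:] = [r for r in cml if r != ("store", path)]
        cml.append((op, path))
    elif op == "remove":
        created_in_log = ("create", path) in cml
        # Cancel: earlier records for the removed file are no longer needed.
        cml[:] = [r for r in cml if r[1] != path]
        if not created_in_log:
            # The file exists on the server, so the remove must still be sent.
            cml.append((op, path))
    else:
        cml.append((op, path))
```

A file that is created, written twice, and then removed while disconnected leaves nothing in the log, which is why records benefit from staying in the CML long enough for later operations to catch up with them.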


CODA
• Log optimization
  ◦ Aging
    ∙ A record is not eligible for reintegration until it has spent a minimal amount of time in the CML
      ▫ this minimum is the aging window, A

[Figure: CML during reintegration. Along the time axis from log head to log tail, a reintegration barrier separates the records older than A from the newer ones]
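The aging rule amounts to splitting the log at a barrier. In this sketch the log is a list of `(timestamp, record)` pairs ordered from log head (oldest) to log tail (newest); the function name and the numbers in the usage note are hypothetical.

```python
def split_at_barrier(cml, now, aging_window):
    """Split the log into (eligible, too_young) at the reintegration barrier.

    A record is eligible once it has spent at least `aging_window`
    time units in the log. Only a prefix of the log is ever eligible,
    so the split stops at the first record that is too young.
    """
    barrier = 0
    for timestamp, _record in cml:
        if now - timestamp < aging_window:
            break
        barrier += 1
    return cml[:barrier], cml[barrier:]
```

For example, with an aging window of 600 and records logged at times 0, 100, and 550, a reintegration at time 690 ships only the first record; the other two stay behind the barrier where later updates can still cancel or override them.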


CODA Seeking User Advice

• Transparency is not always acceptable
  ◦ Under low bandwidth, a file fetch could take very long, which could annoy the user
  ◦ In some cases, a user is willing to wait a long time when the file is important
• Patience threshold
  ◦ Maximum time a user is willing to wait for a particular file, or the equivalent file size
  ◦ a function of hoard priority P and bandwidth
    ∙ hoard priority: the importance of a file as perceived and specified by the user


CODA Seeking User Advice (cont’d)

• Patience threshold model
  ◦ $\tau = \alpha + \beta e^{\delta P}$
    ∙ τ: threshold
    ∙ β, δ: scaling parameters
    ∙ α: lower bound
    ∙ P: hoard priority
• Handling misses
  ◦ In the status walk, Venus obtains status for missing objects and decides whether to fetch
  ◦ In the data walk, Venus fetches the contents from the server
    ∙ If the file size is above the patience threshold, a screen is shown to the user to collect the user's decision
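The miss-handling decision can be sketched numerically, assuming the patience threshold has the exponential form τ = α + β·e^(δP) with α as the lower bound and β, δ as scaling parameters. The default parameter values and the function names below are illustrative assumptions, not values taken from the slides.

```python
import math


def patience_threshold(priority, alpha=2.0, beta=1.0, delta=0.01):
    """Seconds a user is assumed willing to wait for a file of this hoard priority."""
    return alpha + beta * math.exp(delta * priority)


def max_file_size(priority, bandwidth_bytes_per_s, **params):
    """Express the threshold as the largest file fetchable within the wait time."""
    return patience_threshold(priority, **params) * bandwidth_bytes_per_s


def should_ask_user(file_size, priority, bandwidth_bytes_per_s):
    """True if the fetch would exceed the patience threshold."""
    return file_size > max_file_size(priority, bandwidth_bytes_per_s)
```

Because the threshold grows exponentially with hoard priority, files the user marked as important are fetched transparently even over slow links, while large low-priority files trigger a prompt.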


BAYOU Bayou

• A replicated, weakly consistent storage system for mobile computing environments

Design Philosophy
• Applications must know they may read inconsistent data
• Applications must know there may be conflicts
• Clients can read and write to any replica without the need for coordination
• The definition of conflict depends on the semantics of the application


BAYOU System model

• Each data collection is replicated in full at a number of servers
• Bayou provides two basic operations
  ◦ read and write
• A client can use any server’s data
  ◦ the client can read and submit writes
  ◦ once a write is accepted, the client has no further responsibilities
  ◦ the client does not wait for the write to propagate
• Anti-entropy session
  ◦ Bayou servers propagate writes during pair-wise contact


BAYOU Conflict Detection and Resolution

• Supports application-specific, per-write conflict detection and resolution

Two mechanisms
  ◦ permit clients to indicate how to detect a conflict and how to resolve it
• dependency checks
• merge procedures


BAYOU Dependency checks

• Each write operation includes a dependency check
• A SQL-like query is used
• A conflict is detected if the expected value is not returned


BAYOU Merge procedures

• Each write operation includes a merge procedure
  ◦ written in a high-level, interpreted language
• When automatic resolution is impossible, the merge procedure still runs to completion and produces a log of the conflict
  ◦ Later, the user can resolve it manually


BAYOU
• Bayou write implementation
• Bayou write call example

[Figure: an example write call annotated with its three parts: update, dependency check, and merge procedure]
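The three parts of a write can be sketched as a small record plus an apply routine. Everything here is hypothetical: the `Write` dataclass, the `apply_write` and `reserve` helpers, and the room-reservation scenario stand in for Bayou's actual write format, and plain Python callables replace the SQL-like query and the interpreted merge language.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Write:
    update: Callable[[dict], None]       # the intended change
    dep_check: Callable[[dict], Any]     # query run against the replica
    expected: Any                        # result that means "no conflict"
    merge_proc: Callable[[dict], bool]   # returns True if it resolved the conflict


def apply_write(db, write, conflict_log):
    """Run one write against a replica's data (a plain dict in this sketch)."""
    if write.dep_check(db) == write.expected:
        write.update(db)
        return "applied"
    if write.merge_proc(db):
        return "merged"
    conflict_log.append(write)  # left for the user to resolve manually
    return "logged"


def reserve(slot, who, alternates):
    """Build a write that books `slot`, falling back to the alternates."""
    def merge(db):
        for alt in alternates:
            if alt not in db:
                db[alt] = who
                return True
        return False

    return Write(
        update=lambda db: db.__setitem__(slot, who),
        dep_check=lambda db: slot in db,
        expected=False,
        merge_proc=merge,
    )
```

Because the check and the merge travel with the write, every server that later applies the same write to the same state reaches the same outcome without contacting the client again.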


BAYOU Replica Consistency

• Eventual consistency
  ◦ Bayou guarantees that all servers eventually receive all writes
• Consistency is maintained via a pair-wise anti-entropy process


BAYOU Anti-entropy process

• Brings two replicas up to date
• Accept-stamp
  ◦ Monotonically increasing number assigned by the server when it receives a write
  ◦ defines a total order over all writes accepted by the server
  ◦ defines a partial order over all writes in the system
• Basic design
  ◦ a one-way operation between pairs of servers
  ◦ works via the propagation of write operations
  ◦ write propagation is constrained by the accept-order


BAYOU Pair-wise anti-entropy

• unidirectional process
• one server brings the other up to date by propagating writes unknown to it

Prefix property
• A server R that holds a write stamped Wi that was initially accepted by another server X will also hold all writes accepted by X prior to Wi
• The accept-stamp is used to achieve this property in Bayou


BAYOU Basic anti-entropy algorithm

• The sending server gets the version vector from the receiving server
• It traverses the write-log and sends the writes not covered by the version vector

    anti-entropy(S,R) {
      Get R.V from receiving server R
      # now send all the writes unknown to R
      w = first write in S.write-log
      WHILE (w) DO
        IF R.V(w.server-id) < w.accept-stamp THEN
          # w is new for R
          SendWrite(R, w)
        w = next write in S.write-log
      END
    }

[Figure: the version vector of R, with entries x, y, z, shown alongside the writes s1 to s6]
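A runnable rendering of the basic algorithm, under stated assumptions: servers live in one process, accept-stamps are positive integers (so 0 can mean "nothing seen yet"), and the `Write` and `Server` classes are hypothetical containers introduced for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    server_id: str     # server that originally accepted the write
    accept_stamp: int  # stamp assigned by that server (assumed positive)
    payload: str = ""


@dataclass
class Server:
    write_log: list = field(default_factory=list)
    version_vector: dict = field(default_factory=dict)  # server_id -> highest stamp seen

    def receive(self, w):
        self.write_log.append(w)
        seen = self.version_vector.get(w.server_id, 0)
        self.version_vector[w.server_id] = max(seen, w.accept_stamp)


def anti_entropy(sender, receiver):
    """One-way propagation: send every write the receiver has not seen."""
    rv = dict(receiver.version_vector)  # "Get R.V from receiving server R"
    sent = 0
    for w in sender.write_log:          # traverse the log in order
        if rv.get(w.server_id, 0) < w.accept_stamp:
            receiver.receive(w)         # w is new for R
            sent += 1
    return sent
```

Running the procedure a second time in the same direction sends nothing, since the receiver's version vector now covers the sender's whole log. The sketch appends received writes at the end of the log and does not model the undo and redo needed when a received write precedes writes already applied.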


BAYOU Anti-entropy process

• A receiving server may receive a write that precedes some writes already on the server
  ◦ The server must undo the effects of those writes and redo them after applying the new write
• Each server maintains a log of all write operations it has received
• The write log may become excessively long
  ◦ log truncation is necessary, especially for mobile systems


BAYOU Write-log management

• Log truncation
  ◦ When two servers engage in anti-entropy, one server may have discarded some writes that the other needs
  ◦ In such cases, a full database transfer is required
• Write stability
  ◦ Committed writes are introduced to allow log management
    ∙ committed write: one whose position in the write-log will not change, and which will never be re-executed


BAYOU Write stability

• Primary-commit protocol
  ◦ One replica server is designated as the primary replica
  ◦ The primary replica commits the position of a write in the log
  ◦ CSN (Commit Sequence Number)
    ∙ monotonically increasing number assigned to committed writes
  ◦ The CSN is propagated back to all other servers during the anti-entropy process
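How a CSN fixes a write's position can be shown with a sort key. In this sketch writes are plain dictionaries, the `Primary` class and `log_order` function are hypothetical, and the ordering rule (committed writes by CSN first, then tentative writes by accept-stamp and server id) is stated here as an assumption the slides only imply.

```python
import itertools


class Primary:
    """Hypothetical primary replica: hands out commit sequence numbers."""

    def __init__(self):
        self._next_csn = itertools.count(1)

    def commit(self, write):
        write["csn"] = next(self._next_csn)
        return write


def log_order(write):
    """Sort key: committed writes (by CSN) precede tentative ones (by stamp)."""
    csn = write.get("csn")
    if csn is not None:
        return (0, csn, 0, "")
    return (1, 0, write["accept_stamp"], write["server_id"])
```

Sorting a log with `sorted(log, key=log_order)` leaves the committed prefix untouched no matter which tentative writes arrive later, which is what makes that prefix safe to truncate.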


BAYOU Anti-entropy protocol extensions

• Server reconciliation using transportable media

• Support for session guarantees and eventual consistency

• Light-weight server creation and retirement