AFS
Named After Andrew Carnegie & Andrew Mellon
Carnegie Mellon University
Presented By Christopher Tran & Binh Nguyen
AFS: ANDREW FILE SYSTEM
▪ Abstraction of DFS from users
▪ Accessing a file is similar to using a local file
▪ Scalability with region distribution
▪ Permissions Control with Access Control Lists
▪ University Environment (Large number of users)
▪ Weak Consistency by Design
AFS: PRIMARY FEATURES
▪ Implemented in UNIX at the system call level
▪ Work Unit is the entire file
▪ Applications and users are unaware of distributed system
▪ Kerberos Authentication for use over insecure networks
▪ Access Control Lists (ACLs) control permissions
▪ File Consistency through stateful servers
AFS: IMPLEMENTATION OVERVIEW
1. Application opens a file stored on an AFS server
2. The system call is intercepted by a hook in the workstation kernel and passed to Venus
3. The Andrew cache manager (Venus) checks for a local copy of the file
4. The cache manager checks the callback status
5. If needed, the cache manager forwards the request to the file server
6. If needed, the cache manager receives the file and stores it on the local machine
7. A file descriptor is returned to the application
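The numbered open path above can be sketched in a few lines. This is a hypothetical illustration, not real AFS code; all names (CacheManager, FakeServer, fetch) are made up for the example.

```python
# Hypothetical sketch of the Venus open() path described above.  All names
# (CacheManager, FakeServer, fetch, ...) are illustrative, not real AFS code.

class FakeServer:
    """Stand-in for the Vice file server; counts whole-file fetches."""
    def __init__(self):
        self.fetches = 0

    def fetch(self, fid):
        self.fetches += 1
        return b"file contents"

class CacheManager:
    """Steps 3-7: serve from the local cache while the callback is unbroken."""
    def __init__(self, server):
        self.server = server
        self.cache = {}            # fid -> cached whole file
        self.callback_valid = {}   # fid -> do we still trust the cached copy?

    def open(self, fid):
        # Steps 3-4: local copy present and callback still valid?
        if fid in self.cache and self.callback_valid.get(fid):
            return self.cache[fid]
        # Steps 5-6: fetch the entire file and cache it with a callback promise.
        data = self.server.fetch(fid)
        self.cache[fid] = data
        self.callback_valid[fid] = True
        return data                # step 7: file handed back to the application

cm = CacheManager(FakeServer())
cm.open(412)
cm.open(412)                       # second open is served from the local cache
```

Note how the server is contacted only once for the two opens; this is the property that lets AFS scale to many clients per server.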
AFS: SYSTEM DIAGRAM
AFS: CALL INTERCEPT
AFS: VERSION 1
▪ Clients would constantly check with the server for consistency
   ▪ Checks sent at message intervals
▪ Every message included authentication information
   ▪ Server had to authenticate the source
▪ Messages included the full path to the file
   ▪ Server had to traverse directories
▪ Approximately 20 clients per server (in 1988)
Check file?
It’s Good!
AFS: VERSION 1 PROBLEMS
▪ Servers spent too much time communicating with clients
▪ Clients constantly checking whether a file was consistent increased network traffic
▪ Constantly authenticating messages consumed server CPU time
▪ Traversing directories on every read, write, and file check consumed server CPU time
AFS: VERSION 2
▪ Callback Mechanism – Server promises to inform clients of file changes
   ▪ Stateful Server
▪ 50 Clients per server (in 1988)
▪ Clients request file based on FID
▪ Volumes can exist on any server
“I’ll let you know if something changes”
AFS: CALLBACK
▪ Server keeps track of clients using threads
   ▪ Each client is managed by a separate thread
▪ Client and server communicate via RPC to their respective daemons
   ▪ Server runs the Vice daemon
   ▪ Client runs the Venus daemon
▪ Each file a client opens also gets an AFSCallback structure
   ▪ AFSCallback contains an expiration for how long the callback is valid and how the server will communicate with the client
▪ Clients assume the file is consistent until a server callback is received or the expiration time lapses
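The per-file callback record described above can be sketched as follows. The field names and the TTL value are assumptions for illustration; they are not the real AFSCallback layout.

```python
import time

# Illustrative sketch of the per-file callback record described above; the
# field names are assumptions, not the real AFSCallback structure layout.
class AFSCallback:
    def __init__(self, ttl_seconds, contact):
        self.expires_at = time.time() + ttl_seconds  # how long the promise holds
        self.contact = contact                        # how the server reaches the client

    def still_valid(self, now=None):
        # The client trusts its cached copy until the server breaks the
        # callback or this expiration time lapses.
        now = time.time() if now is None else now
        return now < self.expires_at

cb = AFSCallback(ttl_seconds=300, contact="rpc://client1")
```

The expiration bounds how stale a client can get if a break-callback message is lost.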
AFS: CALLBACK INVALIDATION
(Diagram: an AFS server running the Vice daemon, with one thread per client recording which FID each client holds a callback on: Client1 → 412, Client2 → 412, Client3 → 412, Client4 → 492)
1. Client 3 sends Store(412)
2. Server writes FID 412
3. Server sends invalidate(412) to the other clients holding callbacks on 412
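The invalidation sequence above can be sketched as a minimal server that tracks callback holders per FID and breaks every other holder's callback on a store. Class and method names are illustrative, not real AFS interfaces.

```python
# Minimal sketch of the sequence above: the server tracks which clients hold
# callbacks on which FID and, on a store, breaks every other holder's
# callback.  Class and method names are illustrative.
class Client:
    def __init__(self, name):
        self.name = name
        self.invalidated = []          # FIDs whose cached copies we must drop

    def invalidate(self, fid):
        self.invalidated.append(fid)

class ViceServer:
    def __init__(self):
        self.files = {}                # fid -> contents
        self.callbacks = {}            # fid -> set of clients holding a callback

    def register(self, client, fid):
        self.callbacks.setdefault(fid, set()).add(client)

    def store(self, writer, fid, data):
        self.files[fid] = data                         # 2. write(412)
        for client in self.callbacks.get(fid, set()):
            if client is not writer:
                client.invalidate(fid)                 # 3. invalidate(412)

c1, c2, c3, c4 = (Client(f"client{i}") for i in range(1, 5))
server = ViceServer()
for c in (c1, c2, c3):
    server.register(c, 412)
server.register(c4, 492)
server.store(c3, 412, b"new contents")                 # 1. store(412) from client 3
```

Only clients 1 and 2 receive invalidations: client 3 is the writer, and client 4 holds a callback on a different FID.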
AFS: CALLBACK ISSUES
▪ No description of why the callback was initiated
   ▪ Modified portions
   ▪ Appended data
   ▪ Saved but no data changed
   ▪ File moved
   ▪ Etc.
▪ Client has to re-download the entire file when reading
   ▪ No support for differential updates
▪ If an application reads more data, the file is re-downloaded but updates may not be reflected in the application
   ▪ If a user is reading past the changes in a file, the application is unaware of those changes
AFS: VOLUMES
▪ Collection of files
▪ Does not follow directory path
▪ Mounted to a directory
▪ Venus on client maps the pathname to a FID
▪ Vice on server gets the file based on FID
   ▪ Less directory traversal
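The division of labor above can be sketched as follows: Venus on the client maps the pathname to a FID once, so Vice on the server never walks directories. The example path and FID are made up for illustration.

```python
# Hypothetical sketch of the split described above: Venus on the client maps
# the pathname to a FID, so Vice on the server never walks the directory tree.
# The example path and FID are illustrative.
class Vice:
    def __init__(self, volume):
        self.volume = volume            # fid -> file contents

    def fetch(self, fid):
        return self.volume[fid]         # direct lookup, no path traversal

class Venus:
    def __init__(self, vice, name_map):
        self.vice = vice
        self.name_map = name_map        # pathname -> fid

    def open(self, path):
        fid = self.name_map[path]       # client-side name resolution
        return self.vice.fetch(fid)

vice = Vice({412: b"lecture notes"})
venus = Venus(vice, {"/afs/home/user/notes.txt": 412})
```

Because requests arrive as FIDs, a volume can move between servers without clients having to re-resolve paths.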
AFS: SERVER SCALABILITY
▪ Server Replication: Multiple Servers act as a single logical server
▪ Server keeps track of clients in System Memory using threads
▪ Clients have a heartbeat to the server to make sure server is alive
▪ Volumes can be located on any server and moved to any other server
▪ Volume Read-Only clones used to distribute across physical space
▪ All servers share the same common name space
   ▪ /afs/……..
▪ Local server name space can be unique where volumes are mounted
   ▪ /afs/server2
   ▪ /afs/home/server3
▪ AFS servers have links to other AFS servers for volume locations
   ▪ Servers know which server has the volume containing specific files
AFS: FAULT HANDLING
▪ Client Crash – Worst Case Scenario
   ▪ Upon boot to OS: check local cache against server for consistency
▪ Server Crash – Start Fresh
   ▪ Clients detect the server crashed from missed heartbeats
   ▪ Upon connection: clients re-establish communication
   ▪ Server rebuilds its client list
   ▪ Clients check file consistency
Client: “I crashed or the server crashed, so I’m probably wrong”
Server: “Uptime: 0 seconds. Let’s GO!”
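The missed-heartbeat detection mentioned above can be sketched in a few lines. The interval and missed-beat threshold are illustrative assumptions, not AFS constants.

```python
# Sketch of the heartbeat-based crash detection described above; the interval
# and missed-beat threshold are illustrative assumptions, not AFS constants.
HEARTBEAT_INTERVAL = 10    # seconds between client heartbeats (assumed)
MISSED_LIMIT = 3           # silent intervals before declaring the server dead

def server_is_down(last_ack, now):
    # After MISSED_LIMIT missed heartbeats the client assumes a server crash,
    # re-establishes communication, and re-checks its cached files.
    return now - last_ack > MISSED_LIMIT * HEARTBEAT_INTERVAL
```

Once this predicate trips, the client falls back to the recovery steps above: reconnect, let the server rebuild its client list, and re-validate cached files.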
AFS: WEAK CONSISTENCY
▪ Condition
   ▪ Two or more clients have the file open
   ▪ Two or more clients modify the file
   ▪ Two or more clients close the file, writing it back
▪ Result
   ▪ The client whose store() is received by the server LAST provides the current file
“I got here FIRST!”
“I got here LAST, I WIN!”
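The last-writer-wins outcome above can be sketched directly: each close() sends a whole-file store(), and whichever store the server receives last simply replaces the file. Names are illustrative.

```python
# Sketch of the last-writer-wins outcome above: each close() sends a
# whole-file store(), and whichever store the server receives last simply
# replaces the file.  Names are illustrative.
def apply_stores(stores):
    """stores: list of (client, contents) pairs in server arrival order."""
    winner = None
    for client, contents in stores:
        winner = (client, contents)     # every store overwrites the whole file
    return winner                        # the LAST arrival is the current file

result = apply_stores([("client1", b"first version"),
                       ("client2", b"last version")])
```

No merging is attempted: client1's changes are silently lost, which is exactly the weak-consistency trade-off the next slide justifies.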
AFS: WHY WEAK CONSISTENCY
▪ Majority of all DFS access is reading files
▪ In a university, users rarely modify files simultaneously
   ▪ Users work out of home directories
▪ Simplicity in implementation
   ▪ Allows multiplatform implementation
▪ Does not add complexity to crash recovery
   ▪ No need to resume from a crash point
AFS: ACCESS CONTROL LISTS
▪ Standard Unix/Linux permissions are based on Owner/Group/Other
▪ ACLs allow refined control per user/group
Example: you want to share a directory with only one other person so they can read files.
Linux/Unix: make a group, give the group read access, add the user to the group
ACLs: add the user with read permissions
Months later: you want to give someone else read/write access
Linux/Unix: no clean way to do it without widening “other” permissions, so everyone gains read access
ACLs: add that user with read/write permissions
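The per-user grants in the example above can be sketched as a small ACL table. The "r"/"w" rights here are simplified for illustration; real AFS directory ACLs use the rights rlidwka (read, lookup, insert, delete, write, lock, administer).

```python
# Sketch of per-principal ACL entries as in the example above.  The "r"/"w"
# rights are simplified; real AFS directory ACLs use rlidwka (read, lookup,
# insert, delete, write, lock, administer).
class DirectoryACL:
    def __init__(self):
        self.entries = {}                       # principal -> set of rights

    def grant(self, who, rights):
        self.entries.setdefault(who, set()).update(rights)

    def allows(self, who, right):
        return right in self.entries.get(who, set())

acl = DirectoryACL()
acl.grant("alice", {"r"})           # share the directory read-only with alice
acl.grant("alice", {"r", "w"})      # months later: widen access for alice only
```

Widening alice's rights touches only her entry; no one else's access changes, which is the point of the example.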
AFS: ADVANTAGES
▪ First-read performance is similar to other DFSs
▪ Second-read performance is improved in almost all cases, since read requests far outnumber write requests
▪ Creating new files is similar in performance to other DFSs
▪ Use of ACLs over default file system permissions
▪ For read-heavy scenarios, supports a larger client-to-server ratio
▪ Volumes can be migrated to other AFS servers without interruption
▪ Kerberos authentication allows access over insecure networks
▪ Built into the kernel, so user login handles authentication and UNIX/Linux applications can use AFS without modification
AFS: DISADVANTAGES
▪ Entire file must be downloaded before it can be used
   ▪ Causes noticeable latency when accessing files the first time
▪ Modifications require the entire file to be uploaded to the server
▪ Short reads in large files are much slower than in other DFSs
▪ No simultaneous editing of files
AFS: CONTRIBUTIONS
▪ AFS heavily influenced NFSv4
▪ Basis of the Open Software Foundation’s Distributed Computing Environment
   ▪ Framework for distributed computing in the early 1990s
Current Implementations
▪ OpenAFS
▪ Arla
▪ Transarc (IBM)
▪ Linux Kernel v2.6.10
AFS: SUGGESTED IDEAS
▪ Automatic download of the file when the server sends a consistency invalidation
▪ Smart invalidation by determining whether a user needs to re-download
   ▪ If a user is beyond the changes in a file, there is no need to re-download the entire file
▪ Supporting differential updates
   ▪ Only sending information on what changed
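The differential-update suggestion above can be sketched as a block-by-block comparison that ships only the changed ranges. The block size and patch format are assumptions for illustration, not a real protocol.

```python
# Illustrative sketch of the differential-update idea above: compare
# fixed-size blocks and ship only the ranges that changed, instead of the
# whole file.  The block size and patch format are assumptions.
BLOCK = 4096

def diff_ranges(old, new, block=BLOCK):
    """Return (offset, data) patches where the files' blocks differ."""
    patches = []
    for off in range(0, max(len(old), len(new)), block):
        if old[off:off + block] != new[off:off + block]:
            patches.append((off, new[off:off + block]))
    return patches

def apply_patches(old, patches, new_len):
    """Rebuild the new file from the old cached copy plus the patches."""
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for off, data in patches:
        buf[off:off + len(data)] = data
    return bytes(buf)
```

A client holding a stale cached copy would then transfer only the patches rather than re-downloading the entire file.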
AFS PERFORMANCE
▪ Andrew Benchmark (still sometimes used today)
   ▪ Simulation of a typical user
▪ Multi-stage benchmark
   ▪ File access
   ▪ File write
   ▪ Compiling a program
▪ Response time in creating various sized files in and out of AFS servers
   ▪ How long until the file is available to be used?
▪ AFS performance was around half that of a file stored locally on a hard drive
AFS PERFORMANCE
File Count   File Size (bytes)   /tmp Seconds   /tmp Avg/File   AFS Seconds    AFS Avg/File
100          8,192               0              0.00            2              0.02
1,000        8,192               2              0.00            13             0.01
10,000       8,192               21             0.00            154            0.02
100,000      8,192               212            0.00            > 20 minutes   n/a
Varying the count of small files
AFS PERFORMANCE
Varying the size of one file
File Count   File Size (bytes)   /tmp Seconds   /tmp Avg/File   AFS Seconds   AFS Avg/File
5            102,400             0              0.00            1             0.20
5            512,000             0              0.00            3             0.60
5            1,024,000           1              0.20            6             1.20
5            2,048,000           1              0.20            13            2.60
5            3,072,000           1              0.20            19            3.80
5            4,096,000           1              0.20            26            5.20
5            5,120,000           3              0.60            32            6.40
5            10,240,000          3              0.60            64            12.80
5            20,480,000          6              1.20            126           25.20
5            40,960,000          13             2.60            270           54.00
AFS PERFORMANCE
▪ The largest impact comes when creating very many small files or very large files
▪ The extra overhead is directly proportional to the total number of bytes in the file(s)
▪ Each individual file has its own additional overhead, but until the number of files gets very large, it is not easy to detect
AFS PERFORMANCE
▪ AFS: server-initiated invalidation
▪ NFS: client-initiated invalidation
▪ Server-initiated invalidation performs better than client-initiated invalidation
AFS PERFORMANCE
                                 Andrew (AFS)   NFS
Total Packets                    3,824          10,225
Packets from Server to Client    2,003          6,490
Packets from Client to Server    1,818          3,735
Network Traffic Comparison
AFS PERFORMANCE
AFS                                                        NFS
Callback mechanism (server-initiated invalidation)         Client-initiated invalidation
Network traffic reduced by callbacks, large buffers        Network traffic increased by limited caching
Stateful servers                                           Stateless servers
Excellent performance in wide-area configurations          Inefficient in wide-area configurations
Scalable; maintains performance in any size installation   Best in small- to medium-size installations
AFS: QUESTIONS
BIBLIOGRAPHY
"AFS and Performance." University of Michigan. Web. Accessed 16 May 2014. <http://csg.sph.umich.edu/docs/unix/afs/>
"Andrew File System." Wikipedia. Wikimedia Foundation. Web. Accessed 16 May 2014. <http://en.wikipedia.org/wiki/Andrew_File_System>
"The Andrew File System." University of Wisconsin. Web. Accessed 16 May 2014. <http://pages.cs.wisc.edu/~remzi/OSTEP/dist-afs.pdf>
Coulouris, George F. Distributed Systems: Concepts and Design. 5th ed. Boston: Addison-Wesley, 2012. Print.
Howard, John H. "An Overview of the Andrew File System." Winter 1988 USENIX Conference Proceedings, 1988.
Kazar, M. L. "Synchronization and Caching Issues in the Andrew File System." Proceedings of the USENIX Winter Technical Conference, 1988.