lustre and nfs v4
DESCRIPTION
My presentation contrasting the Lustre fs and NFS v4
TRANSCRIPT
4/29/2008 1
Lustre and NFS v4.0
Chris Sosa
For Grimshaw’s Grid Seminar
4/29/2008 2
Lustre – Motivation
- Need for a file system for large clusters that has the following attributes
  - Highly scalable: > 10,000 nodes
  - Provides petabytes of storage
  - High throughput (100 GB/sec)
- Datacenters have different needs, so we need a general-purpose back-end file system
4/29/2008 3
Lustre = Linux + Cluster
- Peter Braam created the design for Lustre at CMU and went on to found Cluster File Systems
- Cluster File Systems was bought by Sun in late 2007; Lustre is now part of Sun
- Lustre is the file system with the largest share in HPC (see BlueGene (or not))
4/29/2008 4
Features of Lustre
- Open-source object-based cluster file system
- Fully compliant with POSIX
- Features (i.e. what I will discuss)
  - Object Protocols
  - Intent-based Locking
  - Adaptive Locking Policies
  - Aggressive Caching
4/29/2008 5
System Overview
4/29/2008 6
Object Protocols
4/29/2008 7
Intent-based Locking
4/29/2008 8
Adaptive Locking Policies
- Policy depends on context
  - Mode 1: Performing operations on something that mostly only one client uses (e.g. /home/username)
  - Mode 2: Performing operations on a highly contended resource (e.g. /tmp)
- DLM capable of granting locks on an entire subtree and on whole files
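A rough sketch of how a lock manager could pick between the two modes. Everything here is hypothetical: the `DirectoryStats` type, the `CONTENTION_THRESHOLD` value, and the policy names are made up for illustration and are not Lustre APIs. I am also assuming that mode 1 means handing the client a subtree lock, and that mode 2 means the server performs the operation itself instead of granting a lock.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: number of distinct clients recently seen in a directory
# before we call it contended. This is not a real Lustre tunable.
CONTENTION_THRESHOLD = 2


@dataclass
class DirectoryStats:
    """Illustrative bookkeeping for one directory."""
    path: str
    recent_clients: set = field(default_factory=set)


def choose_policy(stats: DirectoryStats) -> str:
    """Pick mode 1 for mostly private directories, mode 2 for contended ones."""
    if len(stats.recent_clients) < CONTENTION_THRESHOLD:
        return "subtree-lock"   # mode 1: client may cache under a subtree lock
    return "server-side"        # mode 2 (assumed): no lock, server does the work


home = DirectoryStats("/home/username", {"client-a"})
tmp = DirectoryStats("/tmp", {"client-a", "client-b", "client-c"})
home_policy = choose_policy(home)
tmp_policy = choose_policy(tmp)
```

With these inputs the home directory gets the subtree lock and /tmp falls back to server-side operations.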
4/29/2008 9
Aggressive Caching
- Keeps a local journal of updates for locked files
  - One entry per file operation
  - Hard-linked files get special treatment with subtree locks
- Lock revoked -> updates flushed and replayed
- Use subtree change times to validate cache entries
- Also features collaborative caching -> referrals to other dedicated cache services
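The journal-then-replay idea can be sketched as a toy write-back cache. This is an illustration under my own assumptions, not Lustre code: `CachedLock`, the `"write"` / `"unlink"` operation names, and the dict standing in for the server are all invented for the example.

```python
class CachedLock:
    """Toy client-side write-back cache guarded by one lock (illustrative only)."""

    def __init__(self, server_state: dict):
        self.server_state = server_state  # a dict standing in for the server
        self.journal = []                 # one entry per file operation
        self.held = True

    def record(self, op: str, path: str, value=None):
        """Log an update locally instead of sending it to the server."""
        if not self.held:
            raise RuntimeError("lock no longer held")
        self.journal.append((op, path, value))

    def revoke(self) -> int:
        """Lock revoked: replay the journal on the server in order, then let go."""
        for op, path, value in self.journal:
            if op == "write":
                self.server_state[path] = value
            elif op == "unlink":
                self.server_state.pop(path, None)
        flushed = len(self.journal)
        self.journal.clear()
        self.held = False
        return flushed


server = {}
lock = CachedLock(server)
lock.record("write", "/home/username/a.txt", "v1")
lock.record("write", "/home/username/a.txt", "v2")
lock.record("unlink", "/home/username/old.txt")
flushed = lock.revoke()
```

The server sees nothing until the revoke, at which point the three journal entries are replayed in order and the last write wins.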
4/29/2008 10
On to NFS Version 4.0
4/29/2008 11
Motivation
- We want a file system that provides distributed, transparent access in a heterogeneous network
- NFS before v4 had a lot of issues
  - Caches had no guarantees
  - Terrible failure semantics
    - Hanging locks
    - Server / clients were never sure of anything
  - Data coherency, what’s that?
4/29/2008 12
Overview of NFS v4
- Stateful Protocol
- Compound Operations
- Lease-based Locks
- “Delegation” to clients
- Close-Open Cache Consistency
- Better security
4/29/2008 13
Stateful
- Borrowed model from CIFS (Common Internet File System); see MS (Marty’s supporters)
- Open/Close
  - Open also handles creates, etc.
  - Close semantics
  - Opens do byte locking and file locking atomically on the open
  - Locks / delegations released on file close
- Everything done with file handles
  - Always a notion of a “current file handle” (compare pwd)
4/29/2008 14
COMPOUND Ops
- Problem: Normal filesystem semantics have too many RPCs (boo)
- Solution: Group many calls into one call (yay)
- Semantics
  - Run sequentially
  - Fails on first failure
  - Returns status of each individual RPC in the compound response (up to the first failure, or all of them on success)
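The COMPOUND semantics above can be sketched in a few lines. `run_compound` is a made-up helper, and the lambdas are stand-ins for real operations; only the operation names and the two status codes come from the NFSv4 protocol.

```python
NFS4_OK = 0          # status codes as defined by the NFSv4 protocol
NFS4ERR_NOENT = 2


def run_compound(ops):
    """Run operations in order, stop at the first failure, report each status."""
    results = []
    for name, fn in ops:
        status = fn()
        results.append((name, status))
        if status != NFS4_OK:
            break  # later operations in the compound are never attempted
    return results


# Stand-in operations: the LOOKUP fails, so the READ never runs.
ops = [
    ("PUTROOTFH", lambda: NFS4_OK),
    ("LOOKUP", lambda: NFS4ERR_NOENT),
    ("READ", lambda: NFS4_OK),
]
results = run_compound(ops)
```

Here the response carries two statuses, one for PUTROOTFH and one for the failed LOOKUP, and nothing for READ.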
4/29/2008 15
Lease-based Locks
- Both byte-range and file locks
- Heartbeats keep locks alive (renew lock)
- A lease on every lock that indicates that the client is still up
- If the server fails, it waits at least the agreed-upon lease time (constant) before accepting any other lock requests
- If a client fails, its locks are released by the server at the end of the lease period
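A toy model of the client side of this: leases are renewed by heartbeats, and a silent client loses its locks once the lease runs out. `LeaseTable` and the 90-unit lease time are hypothetical, time is passed in explicitly to keep the example deterministic, and the server's grace period after a restart is not modeled.

```python
class LeaseTable:
    """Toy server-side lock table with one lease per client (illustrative only)."""

    def __init__(self, lease_time: float):
        self.lease_time = lease_time
        self.expiry = {}  # client id -> time at which its lease runs out
        self.locks = {}   # client id -> set of locked resources

    def reap(self, now: float):
        """Release every lock whose owner let its lease expire."""
        for client in [c for c, t in self.expiry.items() if t <= now]:
            del self.expiry[client]
            self.locks.pop(client, None)

    def lock(self, client: str, resource: str, now: float) -> bool:
        """Grant the lock unless a client with a live lease already holds it."""
        self.reap(now)
        for owner, held in self.locks.items():
            if owner != client and resource in held:
                return False
        self.locks.setdefault(client, set()).add(resource)
        self.expiry[client] = now + self.lease_time
        return True

    def renew(self, client: str, now: float) -> bool:
        """Heartbeat: extend the lease if it has not already expired."""
        if self.expiry.get(client, 0) > now:
            self.expiry[client] = now + self.lease_time
            return True
        return False


table = LeaseTable(lease_time=90)                      # hypothetical lease length
a_locked = table.lock("A", "file", now=0)
b_blocked = table.lock("B", "file", now=10)
a_renewed = table.renew("A", now=80)                   # lease now runs to 170
b_still_blocked = table.lock("B", "file", now=100)
b_locked = table.lock("B", "file", now=200)            # A went silent, lease expired
```

Client B only gets the lock after A stops renewing and its lease lapses.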
4/29/2008 16
Delegation
- Tells client no one else has the file (similar to Lustre’s first mode)
- Client exposes callbacks
- Difference here between 4.0 / 4.1
4/29/2008 17
Close-Open Consistency
- Any opens that happen after a close finishes are consistent with the data as of the last close
- Last close wins the competition
- Not coherent (without locks)
- You have to reopen to see if you won
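A small simulation of "last close wins". The `Server` and `Handle` classes are invented for the example and only model whole-file data: a handle snapshots the file at open, caches writes locally, and pushes them to the server on close.

```python
class Server:
    """Stands in for the NFS server: just a path -> contents map."""

    def __init__(self):
        self.files = {}


class Handle:
    """Toy open file with close-to-open behaviour (illustrative only)."""

    def __init__(self, server: Server, path: str):
        self.server = server
        self.path = path
        # Snapshot taken at open time; later closes by others are not seen.
        self.data = server.files.get(path, "")

    def write(self, data: str):
        self.data = data  # cached locally until close

    def close(self):
        self.server.files[self.path] = self.data  # flush on close


server = Server()
a = Handle(server, "/shared/f")
b = Handle(server, "/shared/f")
a.write("from A")
b.write("from B")
a.close()
b.close()                        # B closes last, so B's data wins
c = Handle(server, "/shared/f")  # a fresh open sees the last close
```

Client A still holds "from A" in its own handle and only finds out it lost by reopening, as `c` does.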
4/29/2008 18
Security
- Uses the GSS-API framework
- All IDs are formed with
  - user@domain
  - group@domain
- Every implementation must have Kerberos v5
- Every implementation must have LIPKEY
4/29/2008 19
Other Stuff
- Replication / Migration mechanism added
  - Special error messages to indicate migration
  - Special attribute for both replication and migration that gives the other / new location
- If file system response is too slow, or the client gets the special error message, it can check the special attribute for the read-only replica (or stop using security)
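A sketch of a client following a migration. In NFSv4 the special error is NFS4ERR_MOVED and the special attribute is fs_locations; everything else here (`FakeServer`, `read_following_migration`, the server names, the hop limit) is hypothetical scaffolding for the example.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2
NFS4ERR_MOVED = 10019  # "file system moved" status in the NFSv4 protocol


class FakeServer:
    """Stand-in server: either serves files or says the file system has moved."""

    def __init__(self, files=None, moved_to=None):
        self.files = files or {}
        self.moved_to = moved_to  # list of new locations, or None

    def read(self, path):
        if self.moved_to is not None:
            return NFS4ERR_MOVED, None
        if path not in self.files:
            return NFS4ERR_NOENT, None
        return NFS4_OK, self.files[path]

    def fs_locations(self):
        """Models the fs_locations attribute: where the file system lives now."""
        return self.moved_to


def read_following_migration(servers, start, path, max_hops=3):
    """Read a file, following NFS4ERR_MOVED to the first listed new location."""
    name = start
    for _ in range(max_hops):
        status, data = servers[name].read(path)
        if status == NFS4_OK:
            return name, data
        if status != NFS4ERR_MOVED:
            raise OSError(f"read failed with status {status}")
        name = servers[name].fs_locations()[0]
    raise RuntimeError("too many migration hops")


servers = {
    "old": FakeServer(moved_to=["new"]),
    "new": FakeServer(files={"/export/data": "hello"}),
}
result = read_following_migration(servers, "old", "/export/data")
```

The read starts at the old server, gets the moved error, looks up the new location, and succeeds there.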
4/29/2008 20
Comparison of NFSv3 and NFSv4
4/29/2008 21
Questions?