Errors, Status, and Asynchrony: Discussion Session
PPDG Data Replication Meeting
10 January 2002
Douglas Thain, Condor Project
University of Wisconsin
Agenda
A Working Model
Two Error-Management Issues
– Thinking of Data-Movement as “Jobs”
– Reconciling Error Representations
Example Problem
Discussion
Open Issues:
– Hints and Absolutes in Replica Management
– Tradeoff between consistency and availability
Discussion Points
Data Job Management and Fault-Tolerance
– What faults do we intend to tolerate/expose/ignore?
– Can we develop a general transaction infrastructure for replication-related activities?
– How should we evaluate designs that may be error sensitive? (design review, stress testing)
Error Identification and Representation
– Should we have a uniform error space?
– Is it feasible to translate between existing error spaces?
– What systems have unusual error modes that outsiders may not expect?
– How do we deal with unusual errors that must pass through existing APIs?
A Working Model: Giggle
[Figure: a GRIN whose table maps L1, L2, L3 to site B, shown with Replica Site A and Replica Site B; a site table maps L1, L2, L3 to P1, P2, P3.]
Foster, Iamnitchi, Ripeanu, Chervenak, Deelman, Kesselman, Hoschek, Kunszt, Stockinger, Stockinger, Tierney, “Giggle: A Framework for Constructing Scalable Replica Location Services”
The Problem
Replication systems will be subject to a wide variety of errors.
How do we build systems that maintain consistency in the face of errors?
– Answer: Use transactions to manage jobs, but...
How do we build systems that make reasonable performance decisions in the face of errors?
– Answer: Informative errors, but…
Fault Tolerance Terminology
Failure
– An externally-visible deviation from specifications.
Error
– An internal data state that leads to a failure.
Fault
– An external event that creates an error.
A. Avizienis and J.C. Laprie, Dependable computing: From concepts to design diversity, Proc IEEE 74, 5 (May) 629-638
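Example
[Figure: a Client asks a Server “What is sqrt(4)?”. The Server starts with “Hmm, sqrt(4) is...”, continues with “Hmm, sqrt(9) is...”, and replies “Answer: 3”. The three stages are labelled FAULT, ERROR, and FAILURE.]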
Silent errors (failures)
– The system claims to have reached a valid result, but an auditor claims it is invalid.
Explicit errors (failures)
– The system tells us it cannot complete the desired action.
Escaping errors (failures)
– The system detects an error, but has no method of reporting it, so it escapes by an alternate route -- drop connection, core dump, kernel panic. (exception)
John B. Goodenough, Exception Handling: Issues and a Proposed Notation, CACM 18:12 (1975), pp 683-696.
What Errors to Expect in a Replication System?
Errors of communication:
– File transfer was broken between bytes.
– Collection transfer was broken between files.
Errors of omission:
– Requested some files, but response was slow, so the caller gave up and left. (with/out abort?)
Errors in configuration:
– Space at target server can’t admit all incoming data at once.
What Must Be Consistent?
[Figure: a Replica Catalog whose table maps L1, L2, L3 to site B; Replica Site A and Replica Site B; a site table mapping L1, L2, L3 to P1, P2, P3; and the physical files P1, P2, P3.]
Index of files and the files themselves must be kept consistent.
Giggle does not require that a GRIN be up-to-date, but it is useful to consider.
Data Movement as a Job
Each request issued for replication must have a past, present, and future:
– Who issued it, and why?
– What is it doing now?
– Is it done? Did it succeed?
– Enough information to roll back after a failure.
A complete program execution:
– data jobs + cpu jobs + dependencies = DAGMan/DaPMan
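The past, present, and future of a request can be sketched as a small record. This is only an illustration: the class name DataJob, its fields, and the state names are assumptions made for the example, not part of any PPDG or Condor interface.

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class JobState(Enum):
    QUEUED = "queued"
    WORKING = "working"
    DONE = "done"
    FAILED = "failed"


@dataclass
class DataJob:
    """Hypothetical record for one replication request."""
    job_id: str
    issuer: str                      # past: who issued it
    reason: str                      # past: why
    source: str
    target: str
    state: JobState = JobState.QUEUED               # present: what it is doing now
    undo_log: list = field(default_factory=list)    # enough to roll back
    submitted_at: float = field(default_factory=time.time)

    def record_step(self, undo_action: str) -> None:
        """Note how to reverse a step before performing it."""
        self.undo_log.append(undo_action)

    def rollback_plan(self) -> list:
        """Undo actions, most recent step first."""
        return list(reversed(self.undo_log))

    def is_done(self) -> bool:
        return self.state in (JobState.DONE, JobState.FAILED)

    def succeeded(self) -> bool:
        return self.state is JobState.DONE
```

Keeping the undo actions inside the record means a restarted server can roll back from the record alone.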
Job Management
Primary technique for reliably interacting with the job queue: the transaction.
ACID Test: Atomicity, Consistency, Isolation, Durability.
Of course, this is the natural interface to a db, but not all participants are a full db.
– Interface: 2PL and friends
– Implementation: Logging, shadowing, a real db?
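Two-Phase Commit
[Figure: message sequence between Client and Server. The Server holds Stable Storage containing a Work Space and an Archival Space.
Client to Server: prepare(data). Server to Client: tid or failure.
Client to Server: commit(tid). Server to Client: ok.]
J. Eliot Moss, Nested Transactions: An Approach to Reliable Distributed Computing, MIT Press, 1985.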
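The prepare/commit exchange can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: the Server class, its capacity check, and the abort method are hypothetical, and Python dictionaries stand in for stable storage, so nothing here survives a real crash.

```python
import itertools


class Server:
    """Hypothetical server with a work space and an archival space."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # bytes the server is willing to hold
        self.work_space = {}            # tid -> data, prepared but not committed
        self.archival_space = {}        # tid -> data, committed
        self._next_tid = itertools.count(1)

    def _used(self) -> int:
        held = list(self.work_space.values()) + list(self.archival_space.values())
        return sum(len(d) for d in held)

    def prepare(self, data: bytes):
        """Phase one: stage the data and promise that commit can succeed."""
        if self._used() + len(data) > self.capacity:
            return None                 # failure: no tid is issued
        tid = next(self._next_tid)
        self.work_space[tid] = data
        return tid

    def commit(self, tid: int) -> str:
        """Phase two: make the staged data permanent. Safe to repeat."""
        if tid in self.archival_space:
            return "ok"                 # repeated commit, e.g. after a lost reply
        if tid not in self.work_space:
            raise KeyError(f"unknown transaction {tid}")
        self.archival_space[tid] = self.work_space.pop(tid)
        return "ok"

    def abort(self, tid: int) -> None:
        """Discard staged data; nothing reaches the archival space."""
        self.work_space.pop(tid, None)
```

Making commit idempotent lets a client that never saw the "ok" simply ask again.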
Two-Phase Commit
[Figure: message sequence between Client and Server. The Server holds Stable Storage containing a Work Space and an Archival Space.
PREPARE phase: begin() returns tid; add(tid,data) returns ok; end(tid) returns ok.
COMMIT phase: commit(tid) returns ok.]
James Frey, Todd Tannenbaum, Ian Foster, Miron Livny, and Steven Tuecke, "Condor-G: A Computation Management Agent for Multi-Institutional Grids", Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10), 2001.
Transactions and Status
The transaction ID then becomes a persistent “job number” for later queries:
– Success, failure, abort, timeout…
– unknown-past, unknown-future.
For this status to be useful, a record of the job must be kept around for a certain period of time.
Also ok to time out, cancel, or otherwise remove data movement jobs.
But, a committed transaction must be kept. Can’t re-use a job number!
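A status table keyed by transaction ID might look like the sketch below. The JobTable class is hypothetical, and two readings are assumptions of this example: "success" is treated as the committed state that must be kept, and "unknown-past" / "unknown-future" are taken to mean an expired record and a number not yet issued.

```python
import time


class JobTable:
    """Hypothetical status table keyed by transaction id."""

    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self.records = {}     # tid -> (status, time of last change)
        self.last_tid = 0     # only ever increases, so numbers are never reused

    def new_job(self, now=None) -> int:
        self.last_tid += 1
        stamp = time.time() if now is None else now
        self.records[self.last_tid] = ("working", stamp)
        return self.last_tid

    def set_status(self, tid: int, status: str, now=None) -> None:
        stamp = time.time() if now is None else now
        self.records[tid] = (status, stamp)

    def status(self, tid: int) -> str:
        if tid in self.records:
            return self.records[tid][0]
        # No record: it was expired, or the number has not been issued yet.
        return "unknown-past" if tid <= self.last_tid else "unknown-future"

    def expire(self, now=None) -> None:
        """Drop old records, but always keep committed transactions."""
        now = time.time() if now is None else now
        for tid, (status, changed) in list(self.records.items()):
            if status != "success" and now - changed > self.retention:
                del self.records[tid]
```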
Transaction Implementations
Logging
– Keep a log of all actions, new and old values.
– Read forward to redo, backwards to undo.
Shadowing
– Add changed data to unallocated space.
– Atomically commit new pointers to data.
[Figure: data blocks (D) and a metadata block (M); changed data blocks are added next to the existing ones and take effect through an atomic pointer update.]
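Both techniques can be sketched in memory. The log-entry layout (key, old value, new value) and the ShadowStore class are assumptions of this example; in a real system the log and the pointer block would live on stable storage.

```python
def apply_redo(log, state):
    """Logging: read the log forward, installing new values."""
    for key, old, new in log:
        state[key] = new
    return state


def apply_undo(log, state):
    """Logging: read the log backward, restoring old values."""
    for key, old, new in reversed(log):
        if old is None:
            state.pop(key, None)    # the key did not exist before this action
        else:
            state[key] = old
    return state


class ShadowStore:
    """Shadowing: write new blocks aside, then swap the pointers."""

    def __init__(self):
        self.blocks = {}    # block id -> data; blocks are never overwritten
        self.root = {}      # committed pointers: name -> block id
        self._next = 0

    def write_shadow(self, data):
        """Add changed data to unallocated space; readers do not see it yet."""
        self._next += 1
        self.blocks[self._next] = data
        return self._next

    def commit(self, new_pointers):
        """One reference assignment stands in for the atomic pointer update."""
        self.root = {**self.root, **new_pointers}

    def read(self, name):
        return self.blocks[self.root[name]]
```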
Transaction Implementations
If a standard file system is the underlying storage, then shadowing is a natural fit.
– Most metadata updates are designed to be atomic and synchronous.
– Most large data updates are designed to provide good throughput, but are asynchronous and not guaranteed until after an explicit commit.
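Atomic File Update
    fd = creat("file.tmp", 0644);     /* write the new contents under a temporary name */
    write(fd, data, length);
    fsync(fd);                        /* force the data to disk before renaming */
    close(fd);
    rename("file.tmp", "file");       /* atomic switch to the new contents */
On success: Done.
On failure or abort: unlink("file.tmp")
On reboot: unlink("*.tmp"), that is, remove every leftover .tmp file (unlink itself does not expand wildcards).
(Technique used on Condor checkpoint servers and scheduler processes.)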
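The same sequence can be written as a runnable sketch. The function names atomic_write and cleanup_after_reboot are made up for this example, and the ".tmp" suffix follows the slide; this is one way to express the technique, not the Condor implementation.

```python
import glob
import os


def atomic_write(path: str, data: bytes) -> None:
    """Write data so that readers see the old file or the new one, never a mix."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            view = memoryview(data)
            while view:                       # os.write may write fewer bytes than asked
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                      # force the data to disk before the rename
        finally:
            os.close(fd)
        os.replace(tmp, path)                 # atomic rename on POSIX file systems
    except BaseException:
        # On failure or abort: remove the partial temporary file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def cleanup_after_reboot(directory: str) -> None:
    """On reboot: remove every leftover temporary file."""
    for leftover in glob.glob(os.path.join(directory, "*.tmp")):
        os.unlink(leftover)
```

A crash before the rename leaves only a stray .tmp file, which the reboot cleanup removes; a crash after it leaves the complete new file.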
Unifying Storage Services
[Figure: an App calls POSIX on a Virtual Operating System, which sits on a set of drivers: UNIX, SRB, GridFTP, NeST, Kangaroo, and GASS.]
An Alphabet Soup of Protocols, APIs, Systems, Authorities, and Authors
Error Representation: A Problem of Depth
[Figure: a deep stack of components joined by different interfaces. Components: App, Bypass Agent, Replica Access Library, Replica Catalog, Replica Server (twice), Disk Cache, Tape Archive, FTP Server. Interface labels: POSIX, RMP, RAP, PPDG API, Win32, FTP, and “???”.]
A Problem of Design Direction
[Figure: two layered stacks. Bottom Up Design: App, Application Library, Standard Library, OS Kernel, with interface labels POSIX, ANSI, and “???”. Outside In Design: App, Virtual OS, Replica Access, Replica Server, SRB, with interface labels PPDG API and POSIX.]
The End-to-End Argument
In complex software, the outermost layer has the ultimate responsibility for interpreting and recovering from errors.
Recovery in a lower layer is an optimization of performance or convenience.
If the possibility of error is very high, lower-level recovery is needed for good performance.
Saltzer, Reed, and Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems 2:4, pp 277-288, 1984.
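Applied to replication, the argument might look like the sketch below: a lower layer retries dropped connections as an optimization, while the outermost layer alone decides whether the copy is correct. The function names and the use of a SHA-256 checksum are assumptions made for illustration.

```python
import hashlib


def fetch_with_retry(fetch, attempts: int) -> bytes:
    """Lower layer: retry transient failures. An optimization only."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as err:
            last_error = err
    raise last_error


def replicate(fetch, expected_sha256: str, attempts: int = 3) -> bytes:
    """Outermost layer: the only place that can declare the copy correct."""
    data = fetch_with_retry(fetch, attempts)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: replica rejected")
    return data
```

Removing the retry loop would cost performance but not correctness; removing the checksum would let a silent failure through.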
UNIX Errnos
A single namespace of integer errors that apply to all levels of the system.
Any call is free to return any possible error. (124)
General vs specific:
– ENOENT vs ECHILD
Some artifacts:
– EACCES vs EPERM
– EADV and EDOTDOT
EPERM     1  /* Operation not permitted */
ENOENT    2  /* No such file or directory */
ESRCH     3  /* No such process */
EINTR     4  /* Interrupted system call */
EIO       5  /* I/O error */
ENXIO     6  /* No such device or address */
E2BIG     7  /* Arg list too long */
ENOEXEC   8  /* Exec format error */
EBADF     9  /* Bad file number */
ECHILD   10  /* No child processes */
EAGAIN   11  /* Try again */
ENOMEM   12  /* Out of memory */
EACCES   13  /* Permission denied */
..
FTP Reply Codes
Integer codes indicate the severity of a response to an action.
Many transfer problems are identified, but few file system problems are.
Third digit specified infrequently, and for wide classes of errors.
First digit:
100 - Positive Preliminary
200 - Positive Completion
300 - Positive Intermediate
400 - Transient Negative
500 - Permanent Negative
Second digit:
000 - Syntax
010 - Information
020 - Connections
030 - Authentication
040 - Unspecified
050 - File System
550: “e.g. File not found, no access”
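The digit structure above can be decoded mechanically. The sketch below uses only the two tables from the slide; the function names decode_reply and is_transient are invented for the example.

```python
SEVERITY = {
    1: "Positive Preliminary",
    2: "Positive Completion",
    3: "Positive Intermediate",
    4: "Transient Negative",
    5: "Permanent Negative",
}

CATEGORY = {
    0: "Syntax",
    1: "Information",
    2: "Connections",
    3: "Authentication",
    4: "Unspecified",
    5: "File System",
}


def decode_reply(code: int):
    """Split a three-digit reply code into (severity, category, third digit)."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a reply code: {code}")
    first, second, third = code // 100, (code // 10) % 10, code % 10
    # Second digits 6 to 9 are not listed in the table, so report None for them.
    return SEVERITY[first], CATEGORY.get(second), third


def is_transient(code: int) -> bool:
    """A 4xx reply suggests that the same request may succeed if retried."""
    return code // 100 == 4
```

decode_reply(550) yields a permanent file system error, but nothing in the code says whether the file is missing or merely unreadable.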
SRB Reply Codes
Error space is an amalgam of all back end error spaces.
Pros: No information is ever lost in translation.
Cons: Very difficult to write code that switches on the error number (1026 cases).
UNIX_EPERM -1301
UNIX_ENOENT -1302
. . .
UNIX_EDEADLOCK -1356
HPSS_EPERM -1401
HPSS_ENOENT -1402
. . .
HPSS_NOCOS -1499
MCAT_OPEN_ERROR -3001
MCAT_CONNECT_ERROR -3002
. . .
MCAT_USER_NOT_IN_DOMN -3032
SQL_RSLT_TOO_LONG -1600
HTTP_ERR_BAD_PATH -1700
Globus Error Objects
Pros:
– Errors may be identified at varying levels of granularity.
– Easily expandable.
– Lots of debug info.
Cons:
– Can be difficult to decide in which class to place an external error.
– In practice, most errors are returned as objects of type “string”.
[Figure: an error class hierarchy rooted at Error, with classes Authentication, Authorization, and Communication, and more specific classes NoCreds, ExpiredCreds, and NoTrust.]
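Handling errors at varying levels of granularity can be illustrated with an ordinary exception hierarchy. The class names below mirror the labels in the figure, but this is not the Globus API, and placing NoCreds, ExpiredCreds, and NoTrust under Authentication is an assumption of this sketch.

```python
class GridError(Exception):
    """Root of the hypothetical hierarchy ("Error" in the figure)."""


class AuthenticationError(GridError):
    pass


class AuthorizationError(GridError):
    pass


class CommunicationError(GridError):
    pass


class NoCreds(AuthenticationError):
    pass


class ExpiredCreds(AuthenticationError):
    pass


class NoTrust(AuthenticationError):
    pass


def advise(err: GridError) -> str:
    """Pick a response at whatever level of detail the caller understands."""
    try:
        raise err
    except ExpiredCreds:
        return "renew credentials and retry"
    except AuthenticationError:
        return "ask the user to authenticate"
    except GridError:
        return "report upward"
```

A caller that knows nothing about credentials can still catch GridError, while a more specialised caller can single out ExpiredCreds.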
Translation Can be Done… to a Point
[Figure: error values from other systems, shown as a “String” object and the SRB codes (UNIX_EPERM -1301 through UNIX_EDEADLOCK -1356, HPSS_EPERM -1401 through HPSS_NOCOS -1499, MCAT_OPEN_ERROR -3001 through MCAT_USER_NOT_IN_DOMN -3032, SQL_RSLT_TOO_LONG -1600, HTTP_ERR_BAD_PATH -1700), are mapped onto a short list of errnos: EPERM, ENOENT, ESRCH, EINTR, EIO, EACCES, EISDIR, and a catch-all OTHER.]
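A translation layer of this kind is essentially a lookup table with a catch-all. The numeric SRB values below are the ones listed earlier; which errno each one maps to is an assumption based on the names, and OTHER is a placeholder value chosen for this sketch.

```python
import errno

OTHER = -1    # hypothetical catch-all for errors with no errno equivalent

SRB_TO_ERRNO = {
    -1301: errno.EPERM,     # UNIX_EPERM
    -1302: errno.ENOENT,    # UNIX_ENOENT
    -1401: errno.EPERM,     # HPSS_EPERM
    -1402: errno.ENOENT,    # HPSS_ENOENT
}


def translate(srb_code: int) -> int:
    """Map an SRB code to an errno, losing detail where no match exists."""
    return SRB_TO_ERRNO.get(srb_code, OTHER)
```

Codes such as HPSS_NOCOS (-1499) have no counterpart and collapse into OTHER, which is the point at which translation stops being useful.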
Grope in the Dark
if GET succeeds
    return success
else
    if CHDIR succeeds
        return EISDIR
    else
        if LIST succeeds
            return EACCES
        else
            return ENOENT
        end
    end
end
[Figure: the probes are tried in the order GET, CHDIR, LIST; a failed GET followed by a successful LIST is reported as EACCES.]
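The probing sequence above can be expressed as a small function. The client object and its get, chdir, and list methods are hypothetical: each is assumed to return True on success and False on failure, with no further detail, which is exactly the situation that forces the guessing.

```python
import errno


def classify_get_failure(client, path: str) -> int:
    """Return 0 if GET works, otherwise a best-guess errno found by probing."""
    if client.get(path):
        return 0
    if client.chdir(path):
        return errno.EISDIR     # the name exists, but it is a directory
    if client.list(path):
        return errno.EACCES     # the name is visible, but cannot be read
    return errno.ENOENT         # no sign of the name at all
```

Each extra probe costs a round trip, and the answer is still only a guess: the state of the server may change between probes.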
Error Identification is a Performance Concern
We can always find some way to produce an execution that avoids a silent failure.
– Pass all errors up one level.
– Retry all errors until time expires.
– Abort process completely.
But a known, finite space allows the caller to make targeted decisions about what to do next:
– “Not Authorized” -- best to pass up one level.
– “Operation Interrupted” -- best to retry here.
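With a known, finite error space, the targeted decision is a few lines of code. In this sketch the function name call_with_policy, the retry limit, and the choice of EINTR and EAGAIN as the "retry here" set are assumptions made for illustration.

```python
import errno

RETRY_HERE = {errno.EINTR, errno.EAGAIN}    # assumed set of "retry here" errors


def call_with_policy(operation, max_retries: int = 3):
    """Retry interrupted operations locally; pass everything else up one level."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except OSError as err:
            if err.errno in RETRY_HERE and attempt < max_retries:
                continue        # "Operation Interrupted": best to retry here
            raise               # "Not Authorized" and the rest: pass up
```

Without a reliable error number, the only safe choices are the blunt ones: retry everything or pass everything up.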
Give the Essence or Give the Details?
Example in file systems:
– “Fell off the end of the directory linked list.”
– or “No file by that name.”
Example in networking:
– “Timer went off, but no network interrupt received.”
– or “Connection lost.”
Example in security:
– “Failure in PEM_do_header while reading password.”
– or “You have no credentials.”
Example in Storage:
– HPSS_NOCOS
– or ?????
Example and Discussion
Example
Goal:
– User requests replication of a file from B to A.
Data Structures at each Node:
– A persistent map from LFNs to PFNs.
– A persistent store for transactions.
– A persistent store for data.
Assumptions:
– Files are read-only, no need for invalidation.
– All nodes must survive reboot cleanly.
– File transfers may be resumed from any point.
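[Figure: message flow for the example. The Client tells Replica Site A “I want LFN 2”. The Replica Catalog is asked “Where is LFN 2?” and answers “At site B.” The request “Get LFN 2” goes to Replica Site B, and the exchange ends with “Got it.” The catalog table maps L1, L2, L3 to site B; the site table maps L1, L2, L3 to P1, P2, P3.]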
[Figure: the transaction at Replica Site A, which acts as Server to the Client. Messages: prepare(get L2), commit(T53), ok. The LFN-to-TRN map holds the entry L2, T53. The transaction records T53.tmp and T53 hold LFN = L2 and PFN = P16, with State shown first as Working and later as Done. P16 is the physical data file.]
More Issues
Cleanup at Reboot:
– Remove uncommitted transactions.
– Jobs in progress: Update LFN->TRN entry.
Client Status Check:
– Requesting client examines state of transaction.
– Or, other clients indirect through LFN entry.
Notification of Status Change:
– Unreliable -- Server sends messages to client.
– Reliable -- Server must do transaction to client.
(See Condor-G Paper)
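Reboot cleanup and the two kinds of status check can be sketched together. The ReplicaSite class and its record layout are hypothetical, everything is held in memory, and rebuilding the LFN-to-TRN map from the surviving transactions is only one possible reading of "Update LFN->TRN entry".

```python
class ReplicaSite:
    """Hypothetical site holding transaction records and an LFN -> TRN map."""

    def __init__(self):
        self.transactions = {}    # tid -> {"lfn", "pfn", "state", "committed"}
        self.lfn_to_trn = {}      # LFN -> tid

    def prepare(self, tid, lfn, pfn):
        self.transactions[tid] = {
            "lfn": lfn, "pfn": pfn, "state": "Working", "committed": False,
        }

    def commit(self, tid):
        record = self.transactions[tid]
        record["committed"] = True
        self.lfn_to_trn[record["lfn"]] = tid
        return "ok"

    def reboot_cleanup(self):
        """Remove uncommitted transactions, then rebuild the LFN -> TRN map."""
        self.transactions = {
            tid: rec for tid, rec in self.transactions.items() if rec["committed"]
        }
        self.lfn_to_trn = {
            rec["lfn"]: tid for tid, rec in self.transactions.items()
        }

    def status_by_tid(self, tid):
        """The requesting client examines the transaction directly."""
        record = self.transactions.get(tid)
        return record["state"] if record else None

    def status_by_lfn(self, lfn):
        """Other clients indirect through the LFN entry."""
        tid = self.lfn_to_trn.get(lfn)
        return self.status_by_tid(tid) if tid is not None else None
```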