Resource Management and Accounting Working Group
• Working Group Scope and Components
• Progress made
• Current issues being worked
• Next steps
• Discussions involving larger group
Working Group Scope
The Resource Management Working Group encompasses the areas of resource management, scheduling and accounting.
This working group will focus on the following software components:
• Job Manager (/Queue Manager)
• Scheduler
• Allocation Manager (and accounting)
• Meta Scheduler
Proposed Component Architecture
[Diagram: Job/Queue Manager, Allocation Manager, Collector, Meta Scheduler, Scheduler, Node Manager, Process Manager, Security System, Information Service and Discovery Service. Color key by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Config and Infrastructure]
Proposed Component Architecture
[Diagram labels: Scheduler, PBS server, PBS Mom, Queue Manager, Process Manager, Collector, Node Monitor, Job Manager. Groupings: Job Management, Node Management]
Component Interaction Diagram: Job submitted to Queue Manager
[Diagram: User Interface, Node Manager, Meta Scheduler, Job Manager, Allocation Manager, Scheduler and Process Manager, with interactions numbered 1 through 11]
Component Interaction Trace: Job submitted to Queue Manager
1. A user submits a job to the Queue Manager
2. The Queue Manager does a sanity balance check with the Bank
3. The Queue Manager notifies the Scheduler that a new job has arrived
4. The Scheduler queries node and job status until job can run
5. A bank reservation is made with the Allocation Manager
6. The Scheduler requests the Queue Manager to run the job
7. The Queue Manager passes job control to the Process Manager
8. The Process Manager notifies Queue Manager of job completion
9. The Queue Manager notifies Scheduler of job completion
10. A bank withdrawal is made with the Allocation Manager
11. The user is notified of job completion
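The eleven-step trace can be walked through with a small in-memory sketch. This is only an illustration under stated assumptions: the class and method names are hypothetical, the job finishes immediately instead of being launched, and having the Scheduler make the reservation and withdrawal is a guess, since the trace does not say which component issues them.

```python
class AllocationManager:
    """Hypothetical bank holding one balance plus outstanding reservations."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0

    def can_afford(self, cost):
        return self.balance - self.reserved >= cost

    def reserve(self, cost):
        self.reserved += cost

    def withdraw(self, cost):
        # Release the reservation and debit the balance.
        self.reserved -= cost
        self.balance -= cost


class ProcessManager:
    def run(self, job, on_done):
        # A real process manager would launch tasks; here the job ends at once.
        on_done(job)


class Scheduler:
    def __init__(self, bank, log):
        self.bank, self.log = bank, log

    def job_arrived(self, queue_manager, job):
        self.log.append((4, "Scheduler queries node and job status"))
        self.bank.reserve(job["cost"])  # assumption: the Scheduler reserves
        self.log.append((5, "bank reservation made"))
        self.log.append((6, "Scheduler asks Queue Manager to run the job"))
        queue_manager.run(job)

    def job_completed(self, job):
        self.bank.withdraw(job["cost"])  # assumption: the Scheduler withdraws
        self.log.append((10, "bank withdrawal made"))


class QueueManager:
    def __init__(self, bank, scheduler, process_manager, log):
        self.bank, self.scheduler = bank, scheduler
        self.process_manager, self.log = process_manager, log

    def submit(self, job):
        self.log.append((1, "user submits job"))
        if not self.bank.can_afford(job["cost"]):
            raise ValueError("insufficient balance")
        self.log.append((2, "sanity balance check passed"))
        self.log.append((3, "Scheduler notified of new job"))
        self.scheduler.job_arrived(self, job)

    def run(self, job):
        self.log.append((7, "job control passed to Process Manager"))
        self.process_manager.run(job, self.job_done)

    def job_done(self, job):
        self.log.append((8, "Process Manager reports completion"))
        self.log.append((9, "Scheduler notified of completion"))
        self.scheduler.job_completed(job)
        self.log.append((11, "user notified of completion"))


def run_trace(balance, cost):
    """Run one job through the sketch and return (log, bank)."""
    log = []
    bank = AllocationManager(balance)
    qm = QueueManager(bank, Scheduler(bank, log), ProcessManager(), log)
    qm.submit({"name": "demo", "cost": cost})
    return log, bank
```

Calling run_trace(100, 10) yields log entries numbered in the same order as the trace; a job whose cost exceeds the balance is rejected at the sanity check in step 2.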
General Progress
• Creation of XML marshaller/unmarshaller
• Establishment of CVS repository
• Prototype demonstration: Scheduler makes a deposit to allocation manager using XML interface
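As a rough illustration of the prototype demonstration, the sketch below marshals a deposit request to XML, unmarshals it on the receiving side and returns an XML response. The element and attribute names (Request, Response, Account, Amount, Balance, action, status) are assumptions made for this example and are not taken from the working group's schema; Python's standard xml.etree.ElementTree stands in for the group's own marshaller.

```python
import xml.etree.ElementTree as ET


def marshal_deposit(account, amount):
    """Build a deposit request as an XML string (hypothetical element names)."""
    request = ET.Element("Request", {"action": "Deposit"})
    ET.SubElement(request, "Account").text = account
    ET.SubElement(request, "Amount").text = str(amount)
    return ET.tostring(request, encoding="unicode")


def unmarshal_request(xml_text):
    """Parse a request string back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "action": root.get("action"),
        "account": root.findtext("Account"),
        "amount": int(root.findtext("Amount")),
    }


def handle_request(xml_text, balances):
    """Apply a deposit to the in-memory balances and return an XML response."""
    request = unmarshal_request(xml_text)
    if request["action"] != "Deposit":
        response = ET.Element("Response", {"status": "Failure"})
        return ET.tostring(response, encoding="unicode")
    account = request["account"]
    balances[account] = balances.get(account, 0) + request["amount"]
    response = ET.Element("Response", {"status": "Success"})
    ET.SubElement(response, "Balance").text = str(balances[account])
    return ET.tostring(response, encoding="unicode")
```

A scheduler-side caller would pass marshal_deposit("projA", 500) to handle_request and inspect the status attribute of the returned Response element.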
Scheduler Progress
• Creation of SSS Resource Manager interface (RMType SSS – half-open sockets)
• Creation of SSS Allocation Manager interface
• Creation of allocation manager and resource manager objects for management of arbitrary attributes
• Integration of XML marshaller/unmarshaller
• Maui enhancements to link with C++ libs (Xerces)
• Additional regression tests
Meta Scheduler Progress
• Added support for data-staging interface
• Added support for network proximity optimization
• Initial support for checkpoint/restart
– Checkpoint aware statistics
– Checkpoint aware preemption optimizations
• Sqsub client created allowing PBS-style jobs to be submitted and metascheduled
• Initial work on translation library (PBS->silver & silver->RS2)
• Stability enhancements
Job Manager Progress
• Initial job manager specification defined
• Interacted with process manager working group and drafted specification proposals for task manager and node manager and how they will interact with RMWG components
• Initial study of PBS to assess the viability of dissecting it into components and of adding functionality enhancements
Allocation Manager Progress
• Draft requirements document underway
• XML schema version 0.3 reworked to have explicit request & response elements
• From scratch allocation manager being used as prototype to test XML interface
• Implemented create, query, modify and delete for user, account and membership objects (interacting with database over JDBC)
Allocation Manager Progress (contd)
• Stubbed in dummy withdrawal and successfully demo’d XML interface with scheduler (validating against schema)
• Logging, config files, error handling
• General purpose dcecp-like client allows output formatting by utilizing metadata from queries
Current Issues
• Job Manager/Queue Manager as separate or unified components
• How to split up PBS (if at all) and at what levels (if any) to refit with XML interface
• Working with Software Engineering Working Group to decide on test framework
Next Work
• All components under CVS
• Establish initial resource management interface specifications for release
• Scheduler demos by next face-to-face:
– Scheduler to process manager (over XML)
– Scheduler to node manager (over XML)
– Scheduler to job manager (over XML)
– Drive an end-to-end checkpoint request
– Scheduler talks to registry and discovery service
Next Work (contd)
• Job manager/queue manager milestones
– Submission client submits job to queue manager and queue manager reports status to user client
– Scheduler implements query to obtain job info from queue manager
– Scheduler starts a job (requires implementation of task manager interface) – also cancel job
– No prolog, epilog initially. Batch only. Simple single-step jobs. Supports polling mode only. No data-staging.
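The milestones above (submit, status query, start, cancel, polling only) suggest a very small job state machine. The sketch below is one possible shape for it, not the working group's specification: the state names, method names and the complete() call that stands in for the task manager reporting back are all hypothetical.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class QueueManager:
    """Batch-only, single-step jobs; clients learn of changes by polling query()."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, command):
        # Submission client hands over a job and gets an identifier back.
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"command": command, "state": JobState.QUEUED}
        return job_id

    def query(self, job_id):
        # Used by both the user client and the scheduler to poll job status.
        return self._jobs[job_id]["state"]

    def start(self, job_id):
        # Called by the scheduler; only a queued job may be started.
        job = self._jobs[job_id]
        if job["state"] is not JobState.QUEUED:
            raise ValueError("job is not queued")
        job["state"] = JobState.RUNNING

    def cancel(self, job_id):
        # Cancelling a finished or already cancelled job is a no-op.
        job = self._jobs[job_id]
        if job["state"] in (JobState.QUEUED, JobState.RUNNING):
            job["state"] = JobState.CANCELLED

    def complete(self, job_id):
        # Stand-in for the task manager reporting that the job finished.
        job = self._jobs[job_id]
        if job["state"] is JobState.RUNNING:
            job["state"] = JobState.COMPLETED
```

Because there are no callbacks, a user client in this sketch would simply call query() repeatedly until it sees COMPLETED or CANCELLED.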
Next Work (contd)
• Allocation manager
– Completion of XML schema for remaining objects/services
– Review of requirements (SDSC, NCSA …)
– Complete (1st draft of) initial requirements
– Implement machine class, allocations, reservations, withdrawals, transaction register, simple charging algorithm
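To make the last item more concrete, here is a minimal sketch of how machine classes, reservations, withdrawals and a transaction register could fit together. The charging rule (rate per machine class times processors times hours), the rate table and all names are assumptions chosen for illustration; the slides do not define the charging algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical charge rates, in credits per processor-hour, per machine class.
RATES = {"standard": 1.0, "bigmem": 2.0}


def charge(machine_class, processors, hours):
    """Assumed simple charging rule: rate x processors x wallclock hours."""
    return RATES[machine_class] * processors * hours


@dataclass
class Account:
    balance: float
    reservations: dict = field(default_factory=dict)  # job_id -> reserved amount
    register: list = field(default_factory=list)      # transaction register

    def available(self):
        # Funds not already promised to running or pending jobs.
        return self.balance - sum(self.reservations.values())

    def reserve(self, job_id, amount):
        # Made before the job runs, from the estimated charge.
        if amount > self.available():
            raise ValueError("insufficient funds")
        self.reservations[job_id] = amount
        self.register.append(("reserve", job_id, amount))

    def withdraw(self, job_id, amount):
        # Made after the job ends, from the actual charge; frees the reservation.
        self.reservations.pop(job_id, None)
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.register.append(("withdraw", job_id, amount))
```

In this sketch a job that requests 4 processors for 2 hours on the "standard" class reserves 8 credits up front; if it finishes in 1.5 hours, only 6 credits are withdrawn and the rest of the reservation is released.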
Issues requiring inter-group coordination
• Need to solidify SSS-wide standards for packaging, revision control, documentation, problem tracking, online project schedule… and establish mechanisms and places to home them.