SSS Build and Configuration Management Update
February 24, 2003
Narayan Desai
Communication Infrastructure Overview
• Service Directory
  • Feature complete
  • Validating
  • Robust
• SSSLib
  • Supports 5 wire protocols
  • Data encryption implemented
  • Uses “trusted endpoint” model
  • 5 language bindings based on same code base
• Event Manager
  • Feature complete
  • Validating
  • Performs well
  • Stable
Communication Infrastructure Stress Testing
• At scale tests run last week
  • ~240 nodes
  • 32 processes per node
• Tests
  • Service directory
  • Event Manager
  • Senders
  • Receivers
• Results
  • Thread safety issues revealed
  • Minor race conditions fixed
  • Now runs at scale for extended stress tests
Communication Infrastructure Futures
• Schema updates
  • Restriction based syntax (more on this later)
  • Bring service directory and event manager schemas in line with other current schemas
• Parallel Implementations
• High availability support
• More wire protocol modules
Build and Configuration Management Status
• Complete implementation in use on Chiba City
• Second implementation underway at Oak Ridge
• Complete schemas exist and are used for validation in all components
• System model includes 3 components
  • Same model shown at last face to face
• Basic validation of approach demonstrated by multiple, disparate implementations
• Transition to restriction based syntax completed
Cluster Hardware Infrastructure
• Handles all pre-software install node interactions
  • Power controllers
  • BIOS setup
  • Node identification
  • Ethernet switch setup
  • IPMI
• First component a node interacts with in the BCM stack
• Initial version that supports Chiba City hardware in use
• Stores and serves hardware topology information
Build System
• Cluster configuration management system
• Handles software installation and system configuration
• Handles user access control
• Stores and serves node attribute information
• Stores and serves node configuration information
• OSCAR based implementation underway
• City toolkit based implementation completed and in use
Node State Manager
• Administrative control panel for a cluster
• Manages system administrative state information
• Integrates with cluster diagnostic system
• Stores and serves information on node states and administrative states
• Generates events on node state changes
• Provides imperative interface to diagnostic system
• Initial system diagnostics supplied by AmIHappy
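The "generates events on node state changes" behavior can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class name, the event dictionary layout, and the node name "ccn1" are assumptions made for illustration, not the SSS node state manager interface. The sketch only shows the idea of storing per-node state and notifying subscribers when a stored value actually changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodeStateManager:
    """Hypothetical in-memory stand-in for a node state manager."""
    states: Dict[str, Dict[str, str]] = field(default_factory=dict)
    listeners: List[Callable[[dict], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        """Register a callback that receives one dict per state change."""
        self.listeners.append(callback)

    def set_state(self, node: str, **changes: str) -> None:
        """Store new values for a node and emit an event for each real change."""
        record = self.states.setdefault(node, {})
        for key, new in changes.items():
            old = record.get(key)
            if old == new:
                continue  # value unchanged, so no event is generated
            record[key] = new
            event = {"node": node, "field": key, "old": old, "new": new}
            for callback in self.listeners:
                callback(event)


# Usage: collect events in a list ("ccn1" is a made-up node name).
events = []
nsm = NodeStateManager()
nsm.subscribe(events.append)
nsm.set_state("ccn1", state="on", adminstate="online")
nsm.set_state("ccn1", state="on", adminstate="offline")
```

In this sketch the second call produces a single event, because only adminstate differs from the stored record.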
Build and Configuration Futures
• Schemas stable
• Develop a more modular cluster hardware infrastructure implementation with better hardware support
• OSCAR deployment of SSS components underway
• Develop better system diagnostics
• Work towards better node state manager integration
• Figure out more interesting uses of restriction based syntax
  • (getting curious yet?)
Restriction Based Syntax
• All potentially multiple argument functions treat argument data as a restriction, not as an explicit argument
• Restrictions match all data that meets the criteria specified
• Allows matching to be performed inside of components
• Allows operations to use matching
• Opens the door to transactions
• Makes data ownership more explicit
Example
<set-node-state state='on' adminstate='offline'>
  <node-state adminstate='online'/>
</set-node-state>
• Operates on all nodes where adminstate=‘online’
• This allows all operations to be performed efficiently
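As a rough illustration of how a component might interpret such a message, the Python sketch below parses the example and applies it to an in-memory node table. The function name, the node names, and the table layout are assumptions made for illustration. Treating several child restrictions as alternatives (a node qualifies if it matches any one of them) is also an assumption, since the slides do not specify it.

```python
import xml.etree.ElementTree as ET

# The message from the example slide, with plain ASCII quotes.
MESSAGE = """
<set-node-state state='on' adminstate='offline'>
  <node-state adminstate='online'/>
</set-node-state>
"""


def apply_set_node_state(message, nodes):
    """Apply the root element's attributes to every node matching a restriction.

    Each <node-state> child is treated as a restriction: a node matches when
    all attributes listed on the child equal the node's stored values.
    Returns the names of the nodes that were updated.
    """
    root = ET.fromstring(message.strip())
    updates = dict(root.attrib)
    restrictions = [dict(child.attrib) for child in root.findall("node-state")]
    changed = []
    for name, record in nodes.items():
        if any(all(record.get(k) == v for k, v in r.items()) for r in restrictions):
            record.update(updates)
            changed.append(name)
    return changed


# Hypothetical node table; names and values are made up.
nodes = {
    "n1": {"state": "off", "adminstate": "online"},
    "n2": {"state": "on", "adminstate": "offline"},
    "n3": {"state": "on", "adminstate": "online"},
}
changed = apply_set_node_state(MESSAGE, nodes)
```

The client never enumerates nodes. The matching happens inside the component that owns the data, so one message covers every node that meets the restriction.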
API Augmentation
• APIs only control server side functionality
  • i.e., what can clients count on from components?
• Our APIs currently consist entirely of XML schemas
• This may not be sufficient
  • Clients may wait for events
  • Event generation is not specified
  • Semantics for all commands aren’t specified (yet)
  • Data ownership is not yet clear