Application Redundancy Tool
A.R.T.
CS 495 Fall 2005
Kristi Olson
Description
A.R.T. is a hot-standby system for applications.
Designed for applications which need to run continuously.
Intended for use with applications which require socket connections.
Internship
Internship with GCI’s Network Support Group, Operations System Support.
NSG OSS responsibilities:
Provisioning phone service, calling cards, and internet services.
Internal support (data collection).
Network monitoring.
Applications supporting these services
Homegrown.
Most run around the clock.
Some applications are mission critical.
Loss of productivity when applications are down.
No formal system of redundancy.
Lack of existing product to buy.
Lack of funds.
Applications continued
Methods of transmission vary widely.
Fiber optic: Reliable, fast.
Satellite: Prone to weather-related outages; inherent 600 ms delay.
Microwave: Even more prone to weather; fog, rain, or even a hot day can cause outages.
Applications continued
Socket connections:
Some applications establish one or more “permanent” socket connections.
Others repeatedly establish multiple “temporary” socket connections.
How to provide redundancy?
V.R.R.P.: Virtual Router Redundancy Protocol
A system of dynamic redundancy.
One router is designated master.
Other routers are backups.
Uses multicasting.
V.R.R.P. and A.R.T.
Establish “Master” and “backup” instances of the application.
The instances are identical except for a configuration file.
The backups loop continuously, listening for status polls from the Master.
If the Master stops sending polls, the backup comes online.
Put the application instances on the same multicast group.
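The backup's decision to come online can be sketched as follows. This is a minimal illustration, not the A.R.T. implementation: the original module is written in Perl, and the class name `BackupMonitor`, its fields, and the choice to stand down when the Master's polls resume are all assumptions made for the example. Times are passed in explicitly so the logic can be exercised without a network.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Tracks Master polls for one backup instance (hypothetical helper)."""

    broadcast_interval: float    # seconds between expected status polls
    allowable_missed_polls: int  # polls that may be lost before failover
    last_poll_at: float          # local time the last Master poll was heard
    online: bool = False         # True once this backup has taken over

    def record_poll(self, now: float) -> None:
        """A Master poll arrived: note the time and stand down (assumed)."""
        self.last_poll_at = now
        self.online = False

    def check(self, now: float) -> bool:
        """Come online once more polls were missed than the limit allows."""
        missed = int((now - self.last_poll_at) // self.broadcast_interval)
        if missed > self.allowable_missed_polls:
            self.online = True
        return self.online


monitor = BackupMonitor(broadcast_interval=5.0,
                        allowable_missed_polls=2,
                        last_poll_at=0.0)
print(monitor.check(10.0))  # two intervals missed: still within the limit
print(monitor.check(15.0))  # three intervals missed: backup comes online
```

In a running instance, `now` would typically come from a local clock such as `time.monotonic()`, and `check` would be called once per pass through the backup's listening loop.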
Requirements
Minimal modifications to existing code.
Configurable for any type of application and transmission protocol.
Reliability:
A backup cannot come online prematurely, nor can it come online too late.
The actual switchover should affect service as little as possible.
Applications should take control of or release sockets accordingly.
Configuration
Priority: The instance with the highest priority becomes the master.
Broadcast Interval: How often the application sends and listens for status polls.
Allowable Missed Polls: How many polls may be missed before a backup comes online. Multicasting uses UDP, which does not guarantee delivery, and some transmission technologies are less reliable.
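A configuration file holding these three settings might be read as sketched below. The slides do not show the real file format, so the INI layout, the section names, the key names, and the `failover_timeout` formula are all assumptions for illustration only.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceConfig:
    """Settings for one application instance (hypothetical layout)."""

    name: str
    priority: int
    broadcast_interval: float    # seconds between status polls
    allowable_missed_polls: int

    def failover_timeout(self) -> float:
        """Assumed rule: wait one interval more than the allowed misses."""
        return self.broadcast_interval * (self.allowable_missed_polls + 1)


def load_instances(text: str) -> list:
    """Parse one InstanceConfig per INI section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [
        InstanceConfig(
            name=section,
            priority=parser.getint(section, "priority"),
            broadcast_interval=parser.getfloat(section, "broadcast_interval"),
            allowable_missed_polls=parser.getint(section, "allowable_missed_polls"),
        )
        for section in parser.sections()
    ]


def elect_master(instances):
    """The instance with the highest priority becomes the master."""
    return max(instances, key=lambda inst: inst.priority)


# Made-up sample values; not taken from any real deployment.
SAMPLE = """
[billing-a]
priority = 200
broadcast_interval = 5
allowable_missed_polls = 2

[billing-b]
priority = 100
broadcast_interval = 5
allowable_missed_polls = 2
"""

instances = load_instances(SAMPLE)
print(elect_master(instances).name)
print(instances[0].failover_timeout())
```

A shorter broadcast interval detects a failed master sooner, while a larger missed-poll allowance guards against a backup coming online just because a few UDP datagrams were lost.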
Status Polls
Application Name: Each application only reads messages pertaining to itself.
Timestamp: Latency is possible. A.R.T. ignores old messages.
Priority
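The filtering described on this slide can be sketched as below. The wire format is not given in the slides, so the JSON encoding, the field names `app`, `priority`, and `sent_at`, and the `max_age` cutoff are assumptions. The age check also assumes sender and receiver clocks are reasonably synchronized.

```python
import json
from typing import Optional


def encode_poll(app_name: str, priority: int, sent_at: float) -> bytes:
    """Serialize a status poll into one datagram payload (assumed format)."""
    poll = {"app": app_name, "priority": priority, "sent_at": sent_at}
    return json.dumps(poll).encode("utf-8")


def accept_poll(raw: bytes, my_app: str, now: float,
                max_age: float) -> Optional[dict]:
    """Return the decoded poll, or None if it should be ignored."""
    try:
        poll = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Covers both undecodable bytes and malformed JSON.
        return None
    if not isinstance(poll, dict) or poll.get("app") != my_app:
        return None  # message pertains to a different application
    sent_at = poll.get("sent_at")
    if not isinstance(sent_at, (int, float)) or now - sent_at > max_age:
        return None  # old message, delayed in transit
    return poll


raw = encode_poll("billing", 200, sent_at=100.0)
print(accept_poll(raw, "billing", now=101.0, max_age=5.0))
print(accept_poll(raw, "provisioning", now=101.0, max_age=5.0))
print(accept_poll(raw, "billing", now=110.0, max_age=5.0))
```

Dropping stale polls matters on high-latency links: a poll that sat in transit should not convince a backup that the master is still alive.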
Putting it all together
Configuration file: Contains instance information.
A.R.T. Perl module: Defines multicast groups. Evaluates status polls. Handles other related overhead.
Modifications to existing code:
Where to poll. Don’t want to poll too often, or not often enough.
Encapsulate socket connections.
Re-factoring opportunity.
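One way to picture "encapsulate socket connections" is a small wrapper that opens every connection when an instance takes control and closes them all when it releases. This is a sketch under assumptions, not the A.R.T. code: the class name `ManagedConnections` and its methods are hypothetical, and the example uses stand-in connection objects instead of real sockets.

```python
class ManagedConnections:
    """Opens and closes an application's connections as a unit (hypothetical)."""

    def __init__(self, openers):
        # Each opener is a zero-argument callable returning an object
        # that has a close() method, such as a connected socket.
        self._openers = list(openers)
        self._open = []

    @property
    def active(self) -> bool:
        return bool(self._open)

    def take_control(self) -> None:
        """Open every connection; undo partial work if one fails."""
        if self._open:
            return
        try:
            for opener in self._openers:
                self._open.append(opener())
        except Exception:
            self.release()
            raise

    def release(self) -> None:
        """Close every open connection so another instance can take over."""
        for conn in self._open:
            conn.close()
        self._open = []


class DemoConnection:
    """Stand-in for a socket, used only in this example."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conns = ManagedConnections([DemoConnection, DemoConnection])
conns.take_control()
print(conns.active)
conns.release()
print(conns.active)
```

With real sockets, an opener could be something like `lambda: socket.create_connection((host, port))`, where `host` and `port` are placeholders for the application's own endpoints. Keeping all socket handling behind one object limits how much existing application code has to change.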
High level data flow
A.R.T. Monitoring
Email Alerts: Notify admins when a switchover occurs.
Web Page: Traffic Light:
Green Light - Master is working.
Yellow Light - Backup is listening.
Red Light - Application is down.
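The traffic-light mapping can be written as a small lookup. The sketch assumes each instance reports a role (`"master"` or `"backup"`) and whether it is running; those inputs, and the names `Light` and `light_for`, are illustrative and not taken from the A.R.T. monitoring page.

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"    # master is working
    YELLOW = "yellow"  # backup is listening
    RED = "red"        # application is down


def light_for(role: str, running: bool) -> Light:
    """Map one instance's reported state to a traffic-light color."""
    if role not in ("master", "backup"):
        raise ValueError(f"unknown role: {role!r}")
    if not running:
        return Light.RED
    return Light.GREEN if role == "master" else Light.YELLOW


print(light_for("master", True).value)
print(light_for("backup", True).value)
print(light_for("master", False).value)
```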
A.R.T. Monitoring Web Page
A.R.T.
Questions?