Achieving self-healing in service delivery software systems by means of case-based reasoning
Stefania Montani, Cosimo Anglano
Presented by Tony Schneider
Introduction
• Background
• CBR Implementation
• Experiment / Cavy
• Results
Autonomic Systems Overview
• Goal: a system that manages itself
• The system needs to exhibit:
‣ Self-Configuration
‣ Self-Optimization
‣ Self-Protection
‣ Self-Healing
Self-Healing
• “Service Delivery Systems” (SDS)
‣ Aimed at delivering services 24/7
• These services are prone to failure
‣ Service failures
‣ Software, hardware, network
‣ Cannot be handled manually
‣ The system must be repaired autonomously
Self-Healing
• Internalization
‣ The self-healing engine is integrated into the software
‣ Not extensible
‣ Tied to the specific application
• Externalization
‣ Well suited to retrofitting existing systems
‣ Allows a general method for SDS self-healing
Self-Healing
• Problems with the current approach
‣ The MAPE (Monitor, Analyze, Plan, Execute) model assumes prior knowledge of the system
‣ The knowledge base is problematic
‣ Large, time-consuming, and laborious to build
‣ Needs to be kept up to date
• Build the knowledge base automatically
‣ How?
Case-based Reasoning
• Case-Based Reasoning (CBR)
‣ Uses previous experience for problem solving
‣ Retrieves cases similar to the current problem
‣ Reuses past successful solutions
‣ Revises the retrieved solution if necessary
‣ Retains the current case
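The four steps can be sketched as one pass of a loop. This is a minimal Python illustration, not the authors' implementation: problems are dicts of feature values, similarity is a simple count of mismatched features (an assumption made here for brevity), and the `works` and `revise` callbacks are hypothetical stand-ins for checking a solution and for adapting it (for example, by asking an operator).

```python
from typing import Callable, Dict, List, Tuple

Problem = Dict[str, object]
Case = Tuple[Problem, str]          # (problem description, solution)


def mismatch(a: Problem, b: Problem) -> int:
    """Number of features on which two problems disagree (missing = disagree)."""
    return sum(a.get(k) != b.get(k) for k in set(a) | set(b))


def solve(problem: Problem,
          case_base: List[Case],
          works: Callable[[Problem, str], bool],
          revise: Callable[[Problem, str], str]) -> str:
    # Retrieve: the single most similar stored case (1-nearest neighbour).
    _, proposed = min(case_base, key=lambda case: mismatch(problem, case[0]))
    # Reuse: adopt its solution as-is.
    solution = proposed
    # Revise: if the reused solution fails, adapt it (here via a callback).
    if not works(problem, solution):
        solution = revise(problem, solution)
    # Retain: store the new experience for future problems.
    case_base.append((problem, solution))
    return solution
```

Retaining every solved problem is what lets the case base grow with experience; a real system would also record whether the final solution succeeded.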
Case-based Reasoning
• The case base plays the role of “knowledge” in the MAPE model
‣ Each case represents a previous problem and its solution
‣ Implicit versus explicit knowledge
‣ Explicit: rules and models
‣ Implicit: unstructured and based on experience
‣ Implicit knowledge tends to be easier to acquire, with only limited human interaction
Case-based Reasoning
• Cases are stored by identifying application features:
‣ The problem
‣ The applied solution
‣ The outcome of the solution
• Avoids a bottleneck present in other learning methods
‣ E.g., online reinforcement learning
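A stored case therefore bundles three things. The sketch below is one possible representation, assuming a hypothetical feature schema (`SCHEMA`) that declares each feature as symbolic or linear; the feature names and the JSON serialization are illustrative, not taken from the paper.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Optional, Union

Value = Optional[Union[str, bool, float]]

# Hypothetical feature schema: each feature is either "symbolic" or "linear".
SCHEMA: Dict[str, str] = {
    "web_tier_reachable": "symbolic",
    "db_tier_reachable": "symbolic",
    "response_time_ms": "linear",
}


@dataclass
class StoredCase:
    problem: Dict[str, Value]   # the problem: feature name -> observed value (None = missing)
    solution: str               # the applied solution
    outcome: str                # the outcome, e.g. "solved" or "failed"

    def validate(self) -> None:
        """Reject features outside the schema and non-numeric linear values."""
        for name, value in self.problem.items():
            kind = SCHEMA.get(name)
            if kind is None:
                raise ValueError(f"unknown feature: {name}")
            if kind == "linear" and value is not None and not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be numeric")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "StoredCase":
        return StoredCase(**json.loads(text))
```

Declaring feature types up front matters later, because the retrieval distance treats symbolic and linear features differently.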
Case-based Reasoning
• CBR relies on a large number of past cases
• Pros:
‣ Performance improves with time and experience
‣ Large systems suffer from recurring problems, so past cases are often reusable
• Cons:
‣ The cases must be stored
‣ The case base must be populated
Case-based Reasoning
To reiterate: here, CBR is used as a methodology to assist in repairing failed systems
Questions so far?
System Overview
• The SDS is treated as a black box
‣ The self-healing CBR engine is entirely external to the SDS
‣ Controls the health of the SDS
‣ The components of CBR are reflected in MAPE:
‣ Analysis <-> Retrieval
‣ Planning <-> Revise
‣ Knowledge <-> Case base
System Overview: MAPE Revised
[Figure: old MAPE model vs. model revised for CBR]
System Overview: MAPE Revised
• Four Additions
‣ Monitoring
‣ Case Preparation
‣ Service Restoration
‣ Repair Module
System Overview: MAPE Revised
• Application-agnostic portion
‣ Does not rely on variables of the specific environment
• Application-specific portion
‣ Relies on data from the application
• Both
‣ Interface between the two layers
• The managed element is completely external to the healing system
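One way to picture the split between the two layers is an abstract interface. In this hedged sketch, `ApplicationAdapter`, `HealingEngine`, and `ToyTwoTierApp` are hypothetical names: only the adapter knows how to observe and repair the SDS, while the engine works purely on feature dicts and action names, so the SDS stays a black box.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ApplicationAdapter(ABC):
    """Application-specific layer: the only code that knows the SDS internals."""

    @abstractmethod
    def observe(self) -> Dict[str, bool]:
        """Return failure descriptors, e.g. {"db_ok": False}."""

    @abstractmethod
    def apply(self, action: str) -> bool:
        """Run a repair action; return True if service is restored."""


class HealingEngine:
    """Application-agnostic layer: sees only feature dicts and action names."""

    def __init__(self, adapter: ApplicationAdapter, case_base: List[tuple]) -> None:
        self.adapter = adapter
        self.case_base = case_base  # list of (features, action)

    def heal(self) -> bool:
        features = self.adapter.observe()
        # Rank stored cases by number of mismatched features, closest first.
        ranked = sorted(self.case_base,
                        key=lambda c: sum(c[0].get(k) != features.get(k)
                                          for k in set(c[0]) | set(features)))
        # Try actions in order; any() stops at the first one that works.
        return any(self.adapter.apply(action) for _, action in ranked)


class ToyTwoTierApp(ApplicationAdapter):
    """A stand-in for a real SDS; the engine never inspects it directly."""

    def __init__(self) -> None:
        self.db_ok = False

    def observe(self) -> Dict[str, bool]:
        return {"db_ok": self.db_ok, "web_ok": True}

    def apply(self, action: str) -> bool:
        if action == "restart_db":
            self.db_ok = True
        return self.db_ok
```

Swapping in a different SDS would only require a new adapter; the engine code stays unchanged.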
System Overview
• Assumptions
‣ Bad solutions have no effect on the SDS state; likewise, good solutions do not produce new faults
‣ There is no fixed deadline for producing a case solution
‣ Every stored case has a unique solution
‣ No transient faults (faults that occur only once)
‣ No intermittent faults (faults that appear, disappear, then reappear)
CBR Cycle: Retrieve - Reuse/Revise - Retain
• Every stored case represents some past failure
• Need to find the stored case that best approximates the current failure
• Distance between two cases = average of the per-feature distances
• d_f(x, y), the distance between values x and y of feature f:
‣ 1 if x or y is missing
‣ overlap(x, y) if f is a symbolic feature (0 if x = y, 1 otherwise)
‣ |x - y| / range_f if f is a linear feature
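The per-feature distance and its average can be written out directly. This is a sketch under assumptions: the linear branch uses the range-normalized difference |x - y| / range_f (as in the Heterogeneous Euclidean-Overlap Metric), and `kinds` and `ranges` are hypothetical inputs describing each feature's type and value range.

```python
from typing import Dict, Optional, Union

Value = Optional[Union[str, bool, float]]


def overlap(x: Value, y: Value) -> float:
    """Overlap metric for symbolic features: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def feature_distance(x: Value, y: Value, kind: str, value_range: float = 1.0) -> float:
    """d_f(x, y); the linear branch assumes |x - y| / range_f."""
    if x is None or y is None:
        return 1.0                      # missing value: maximal distance
    if kind == "symbolic":
        return overlap(x, y)
    return abs(x - y) / value_range     # linear feature, normalised by its range


def case_distance(current: Dict[str, Value], stored: Dict[str, Value],
                  kinds: Dict[str, str], ranges: Dict[str, float]) -> float:
    """Average of the per-feature distances over all features in `kinds`."""
    total = sum(feature_distance(current.get(f), stored.get(f), kind, ranges.get(f, 1.0))
                for f, kind in kinds.items())
    return total / len(kinds)
```

For example, with two symbolic features that differ in one place and a latency feature that differs by 500 ms on a 1000 ms range, the distance is (0 + 1 + 0.5) / 3 = 0.5.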
CBR Cycle: Retrieve - Reuse/Revise - Retain
• Apply the retrieved cases' solutions in order of best (lowest) average distance
‣ Continue through the retrieved cases until the problem is solved
‣ Also covers cases with multiple solutions (just use the best choice)
• What if no solution works?
‣ Ask a human
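The ordering rule and the human fallback can be sketched as follows. Assumptions: retrieval has already produced (distance, solutions) pairs with each case's solutions listed best-first, and `apply` and `ask_human` are hypothetical callbacks for running a repair and escalating to an operator.

```python
from typing import Callable, List, Tuple

# (distance to current failure, [solutions of that case, best first])
Retrieved = List[Tuple[float, List[str]]]


def repair(retrieved: Retrieved,
           apply: Callable[[str], bool],
           ask_human: Callable[[], str]) -> Tuple[str, bool]:
    """Try solutions from the closest case onward; return (solution, automatic?)."""
    for _, solutions in sorted(retrieved, key=lambda r: r[0]):
        for solution in solutions:          # multiple solutions: best choice first
            if apply(solution):
                return solution, True
    return ask_human(), False               # nothing worked: escalate to an operator
```

The Boolean in the result makes it easy to count how many failures needed human intervention.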
CBR Cycle: Retrieve - Reuse/Revise - Retain
• Simply saves the case to the knowledge base:
‣ The problem
‣ The solution
‣ The outcome
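Since the prototype kept its case base in MySQL, a relational table is a natural fit for the retain step. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL; the table layout (`cases` with `problem`, `solution`, `outcome` columns) and the JSON encoding of the problem are assumptions.

```python
import json
import sqlite3


def open_case_base(path: str = ":memory:") -> sqlite3.Connection:
    """SQLite stand-in for the MySQL case base; problems are stored as JSON text."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS cases (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        problem TEXT NOT NULL,
                        solution TEXT NOT NULL,
                        outcome TEXT NOT NULL)""")
    return conn


def retain(conn: sqlite3.Connection, problem: dict, solution: str, outcome: str) -> int:
    """Store the problem, the solution, and the outcome; return the new case id."""
    with conn:  # the connection context manager commits on success
        cur = conn.execute(
            "INSERT INTO cases (problem, solution, outcome) VALUES (?, ?, ?)",
            (json.dumps(problem, sort_keys=True), solution, outcome))
    return cur.lastrowid
```

Storing failed attempts as well as successful ones keeps the outcome available for later retrieval and prototyping.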
Odds and Ends
• System initialization
‣ Bootstrap phase
• Prototyping
‣ Merges several similar cases in the case base into one general case
‣ Addresses the storage space problem
‣ Turns implicit knowledge into explicit knowledge
‣ Used once the case base has grown
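The talk does not say exactly how prototypes are built. One simple possibility, shown here purely as an assumed illustration, is to merge cases that share a solution and keep only the feature values they all agree on; `build_prototypes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Case = Tuple[Dict[str, bool], str]          # (failure descriptors, solution)


def build_prototypes(cases: List[Case]) -> List[Tuple[Dict[str, bool], str, int]]:
    """Merge cases that share a solution into one generalized case.

    A feature survives only if every merged case has the same value for it;
    features that vary are dropped, i.e. treated as irrelevant to that repair.
    Returns (generalized features, solution, number of cases merged).
    """
    by_solution: Dict[str, List[Dict[str, bool]]] = defaultdict(list)
    for features, solution in cases:
        by_solution[solution].append(features)

    prototypes = []
    for solution, group in by_solution.items():
        common = dict(group[0])
        for features in group[1:]:
            common = {k: v for k, v in common.items() if features.get(k) == v}
        prototypes.append((common, solution, len(group)))
    return prototypes
```

Each prototype replaces a whole group of stored cases, which is where the storage saving comes from; the surviving features are an explicit statement of which symptoms that repair depends on.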
CBR questions?
That wraps up the CBR portion.
Any Questions?
Experimental Setup
• Implemented the CBR-based system in Java
‣ MySQL for case base storage
• Used with “Cavy”, an SDS testbed
• Cavy
‣ Configures, deploys, and operates SDS testbeds
‣ Framework that surrounds the healing engine
‣ Injects faults into testbed components
Cavy Components
• Fault managers
• Diagnoser
• Service Monitor
• Integrator
• Repairer
• Injector
Cavy Components
• In short:
‣ The injector breaks the system
‣ The service monitor detects the fault
‣ The diagnoser finds a similar failure/solution (FS) pair
‣ The interrogator receives the solution
‣ The repairer tries each solution until one works
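The sequence above can be simulated end to end with toy components. Everything here is hypothetical (the `ToySystem`, the two fault types, the `FS_PAIRS` table, and the symmetric-difference ranking); it only mirrors the order in which Cavy's components hand work to each other.

```python
import random
from typing import List, Set, Tuple

FS_PAIRS: List[Tuple[Set[str], str]] = [          # hypothetical failure/solution pairs
    ({"db_down"}, "restart_db"),
    ({"web_down"}, "restart_web"),
    ({"db_down", "web_down"}, "restart_both"),
]


class ToySystem:
    def __init__(self) -> None:
        self.broken: Set[str] = set()

    def apply(self, solution: str) -> None:
        fixes = {"restart_db": {"db_down"}, "restart_web": {"web_down"},
                 "restart_both": {"db_down", "web_down"}}
        self.broken -= fixes.get(solution, set())


def inject(system: ToySystem, rng: random.Random) -> None:
    system.broken.add(rng.choice(["db_down", "web_down"]))   # Injector


def monitor(system: ToySystem) -> Set[str]:
    return set(system.broken)                               # Service monitor


def diagnose(symptoms: Set[str]) -> List[str]:
    # Diagnoser: rank FS pairs by size of symmetric difference with the symptoms.
    ranked = sorted(FS_PAIRS, key=lambda fs: len(fs[0] ^ symptoms))
    return [solution for _, solution in ranked]             # handed to the interrogator


def repair(system: ToySystem, solutions: List[str]) -> str:
    for solution in solutions:                              # Repairer
        system.apply(solution)
        if not monitor(system):
            return solution
    return "escalate_to_human"
```

Running `inject`, then `monitor`, `diagnose`, and `repair` in a loop reproduces the testbed cycle in miniature.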
Cavy Components
• Cavy implements pieces of the self-healing architecture
‣ Interrogator: application-agnostic pieces
‣ Fault repairer: application-specific pieces
‣ Service monitor: Monitor
‣ Fault managers: Repair
The Experiment
• RUBiS
‣ An auction site modeled after eBay
‣ Two tiers
‣ Customers interact with the web server on the first tier
‣ The database is stored on the second tier
‣ Several services are tested
‣ Register, Browse, Sell, Home
The Experiment
• Potential RUBiS failures (each can apply to either tier)
‣ Network problems
‣ Configuration problems
‣ System restart
• 10 failure descriptors
‣ Boolean values
‣ Each indicates whether a given part of the system has failed
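Because all 10 descriptors are Boolean, a failure can be packed into a 10-bit integer, and the average overlap distance reduces to counting differing bits. The descriptor names `d0` to `d9` are placeholders (the talk does not list them), and treating an absent descriptor as "not failed" is an assumption.

```python
from typing import Dict, List

# Placeholder names for the 10 Boolean failure descriptors (not given in the talk).
DESCRIPTORS: List[str] = [f"d{i}" for i in range(10)]


def encode(failed: Dict[str, bool]) -> int:
    """Pack the descriptors into a 10-bit integer (bit i = DESCRIPTORS[i]).

    Absent descriptors are treated as False (not failed).
    """
    return sum(1 << i for i, name in enumerate(DESCRIPTORS) if failed.get(name, False))


def distance(a: int, b: int) -> float:
    """Average overlap distance over the Boolean features = differing bits / 10."""
    return bin(a ^ b).count("1") / len(DESCRIPTORS)
```

With every feature symbolic and none missing, this gives the same value as averaging `overlap(x, y)` over the 10 descriptors.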
Initial Case Base (constructed by a human)
Automatically generated case
Initial Case Base (constructed by a human)
Distances between the current failure and the cases in the case base
Second Case
Results
• The experiment continued like this for 3 days
‣ Of 1016 cases, fewer than 11 needed human intervention
• Prototypes functioned correctly
‣ Reduced the size of the database
‣ Handled new faults without human intervention
‣ Narrowed the possible failures down to 9 prototype cases
‣ Showed that “complex” problems were just several simple problems occurring simultaneously
Future Work
• Use in real-world applications
• Relaxing the stated assumptions
• Further use of prototyping/generalization
• Combining CBR with other knowledge sources
‣ E.g., combining CBR with another methodology
Conclusion
‣ CBR is a good approach to self-healing
‣ The repair procedure is triggered by service failures
‣ No structured knowledge is needed
‣ Worked well even with novel faults