self healing operating system

Upload: absheer-khan

Post on 10-Apr-2018


  • 8/8/2019 Self Healing Operating System


    Introduction Terminology Error Detection Error recovery Error signaling Error confinement Error detection and recovery Sola

    Self Healing Operating System

    Neethu T V

    Roll No: 32

    S7 Computer Science and Engineering

    Government Engineering College

    Sreekrishnapuram Palakkad

    November 25, 2010


    OVERVIEW

    1 Introduction

    2 Terminology

    3 Error Detection

    4 Error recovery

    5 Error signaling

    6 Error confinement

    7 Error detection and recovery

    8 Solaris 10 OS

    9 Future scope

    10 Conclusion

    11 Reference


    Introduction

    All applications depend on the OS

    When the OS dies, all running applications are lost

    Resilience to errors is an important requirement of modern operating systems

    Self-healing enables systems to diagnose themselves and react to faults


    Terminology

    Fault - Defect or flaw in hardware or software

    Error - Deviation from correct state

    Failure - Inability to perform expected task


    Error Detection in Existing OSs

    Custom Error Detection Code in OSs

        Linux - Deadlock Detection, Soft Lockup Detection, etc.

        Windows - Deadlock Detection, etc.

    Hardware Memory Protection - MMU

    Watchdog Timers - Linux, Windows, etc.

    Software Memory Protection - SafeDrive, XFI

    Periodic Consistency Checks - EROS


    Error recovery in Existing OSs

    Linux - Recovery by terminating thread

    Restart Failed Component

    Windows Vista - Example: Video Card Driver

    Minix3

    Chorus

    Linux+Nooks

    IBM z/OS

    Hardware Redundancy

    Reboot Entire System


    Error signaling

    C++ exception handling is used for unified error signaling

        Developer-defined exceptions

        Processor exceptions

    Benefits of mapping processor exceptions to language exceptions

        Local error recovery using C++ catch statements

        Generic handlers for all types of exceptions

        Generic handlers that just print an error message and halt the system

    Normal run-time performance overhead is negligible

    Provides developers with a flexible and powerful technique
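The idea of unified error signaling can be illustrated with a minimal sketch. It assumes a hypothetical exception hierarchy (`OSException`, `OutOfBuffersException`, `ProcessorException`) and a stand-in `simulateTrap` function in place of a real trap handler; none of these names come from the slides. It shows a local `catch` for one error type and a generic handler for everything else.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical hierarchy: one base class so a single catch can see every error.
class OSException : public std::runtime_error {
public:
    explicit OSException(const std::string& what) : std::runtime_error(what) {}
};

// Developer-defined exception raised explicitly by kernel code.
class OutOfBuffersException : public OSException {
public:
    OutOfBuffersException() : OSException("out of buffers") {}
};

// Processor exception (e.g. an undefined instruction) mapped to a language exception.
class ProcessorException : public OSException {
public:
    ProcessorException(const std::string& kind, std::uintptr_t faultAddress)
        : OSException(kind), address(faultAddress) {}
    std::uintptr_t address;
};

// Stand-in for the low-level trap path: a real kernel would build this
// exception from the saved processor state rather than from an argument.
void simulateTrap(bool fault) {
    if (fault) throw ProcessorException("undefined instruction", 0x1000);
}

// Local recovery: the caller catches the specific error and continues.
std::string readWithRecovery(bool fault) {
    try {
        simulateTrap(fault);
        return "ok";
    } catch (const ProcessorException& e) {
        return std::string("recovered from ") + e.what();
    }
}

// Generic handler: catches anything derived from OSException. A real system
// would print the message and halt; here the message is only returned.
std::string genericHandler(void (*operation)()) {
    try {
        operation();
        return "ok";
    } catch (const OSException& e) {
        return std::string("halt: ") + e.what();
    }
}

// Example operation that raises a developer-defined exception.
void failingAllocation() { throw OutOfBuffersException(); }
```

Because both kinds of error derive from one base class, the same `catch` syntax serves local recovery and the last-resort handler.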


    Error confinement

    Isolate OS components

    Used by microkernels: L4, Minix3

    Nooks: Device driver isolation in Linux

    Objects in Choices can be placed in separate memory protection domains

    Implemented using wrappers which inherit from target classes

    Example protected objects: Serial Port Driver, File System Inodes, Timer Driver

    Recovery can be targeted toward the affected component
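A wrapper that inherits from its target class might look roughly like the sketch below. The class names (`TimerDriver`, `BrokenTimerDriver`, `ProtectedTimerDriver`) and the `-1` error code are assumptions made for illustration. The sketch only confines exceptions; it does not switch memory protection domains as a real implementation would.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical target class: a timer driver with a virtual interface.
class TimerDriver {
public:
    virtual ~TimerDriver() = default;
    virtual int ticks() { return ++count_; }
private:
    int count_ = 0;
};

// A faulty variant used only to exercise the wrapper.
class BrokenTimerDriver : public TimerDriver {
public:
    int ticks() override { throw std::runtime_error("timer fault"); }
};

// Wrapper that inherits from the target class, so callers can use it wherever
// a TimerDriver is expected. A real system would switch memory protection
// domains around the forwarded call; this sketch only confines exceptions.
class ProtectedTimerDriver : public TimerDriver {
public:
    explicit ProtectedTimerDriver(TimerDriver& target) : target_(target) {}

    int ticks() override {
        try {
            return target_.ticks();   // forwarded call into the protected object
        } catch (const std::exception& e) {
            lastError_ = e.what();    // the error stays inside the wrapper
            ++faults_;
            return -1;                // hypothetical error code for the caller
        }
    }

    int faults() const { return faults_; }
    const std::string& lastError() const { return lastError_; }

private:
    TimerDriver& target_;
    int faults_ = 0;
    std::string lastError_;
};
```

Since the wrapper records which object failed, recovery code can act on that one component instead of the whole system.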


    Choices protected components


    Error detection and Recovery

    Code Reloading

    Component Micro-Reboots

    Automatic Service Restarts

    Watchdog-based Recovery

    Process-level Recovery


    Code reloading

    Fault: Corruption of OS code by software bugs or hardware bit-flips (Single Event Upsets)

    Proactive Recovery: Periodically checksum OS code and reload corrupted pages from stable storage

    Reactive Recovery: If an undefined instruction exception is raised, reload the relevant OS page from stable storage

    Simple fault-injection experiments show 89% recovery

    Example: ARM-based microprocessor for mobile phones includes a Run Time Integrity Checker (RTIC)

    Also used in EROS
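Proactive and reactive code reloading can be sketched as follows. The `CodeImage` class, its in-memory "stable" copy, and the simple rolling checksum are all assumptions chosen for brevity. A real checker would read from actual stable storage and would likely use a CRC or a cryptographic hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Page = std::vector<std::uint8_t>;

// Simple rolling checksum, chosen only to keep the sketch short.
std::uint32_t checksum(const Page& page) {
    std::uint32_t sum = 0;
    for (std::uint8_t byte : page) sum = sum * 31 + byte;
    return sum;
}

class CodeImage {
public:
    // 'stable' stands in for the copy kept on stable storage.
    explicit CodeImage(std::vector<Page> stable)
        : stable_(std::move(stable)), live_(stable_) {
        for (const Page& p : stable_) sums_.push_back(checksum(p));
    }

    // Access to the in-memory (possibly corrupted) copy of a page.
    Page& livePage(std::size_t i) { return live_[i]; }

    // Proactive recovery: verify every page and reload the corrupted ones.
    // Returns the number of pages reloaded.
    std::size_t scrub() {
        std::size_t reloaded = 0;
        for (std::size_t i = 0; i < live_.size(); ++i) {
            if (checksum(live_[i]) != sums_[i]) {
                reload(i);
                ++reloaded;
            }
        }
        return reloaded;
    }

    // Reactive recovery: reload one page, e.g. after an undefined
    // instruction exception reported at an address inside it.
    void reload(std::size_t i) { live_[i] = stable_[i]; }

private:
    std::vector<Page> stable_;
    std::vector<Page> live_;
    std::vector<std::uint32_t> sums_;
};
```

In this sketch `scrub()` would be called periodically, while `reload()` would be called from the exception handler for the page containing the faulting address.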


    Component micro-reboots

    Error: Unhandled exceptions in components

    Recovery: Similar to component restarts in existing systems

    Involves destroying and re-creating the C++ object

    After a micro-reboot, internal state may be error-free

    The request is retried after the micro-reboot
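A micro-reboot of this kind could be sketched as below. The `MicroRebootable` template, the `Counter` component, and the single-retry policy are assumptions for illustration and are not taken from the slides.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical component whose internal state can become corrupted.
class Counter {
public:
    int next() {
        if (corrupted) throw std::runtime_error("corrupted state");
        return ++value_;
    }
    bool corrupted = false;   // test hook that simulates an internal error
private:
    int value_ = 0;
};

// Holds a component plus a factory, so the component can be destroyed and
// re-created with fresh internal state when a request throws.
template <typename Component>
class MicroRebootable {
public:
    explicit MicroRebootable(std::function<std::unique_ptr<Component>()> factory)
        : factory_(std::move(factory)), instance_(factory_()) {}

    // Runs the request; on an unhandled exception, micro-reboots the
    // component and retries the request once.
    template <typename Request>
    auto call(Request request) -> decltype(request(std::declval<Component&>())) {
        try {
            return request(*instance_);
        } catch (const std::exception&) {
            instance_ = factory_();       // old object destroyed, new one created
            ++reboots_;
            return request(*instance_);   // retry; a second failure propagates
        }
    }

    Component& component() { return *instance_; }
    int reboots() const { return reboots_; }

private:
    std::function<std::unique_ptr<Component>()> factory_;
    std::unique_ptr<Component> instance_;
    int reboots_ = 0;
};
```

The re-created object starts from its initial state, so anything the old instance had accumulated is lost. That is the price paid for discarding the corrupted state.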


    Automatic service restarts

    Error: Unhandled exception in a process

    Recovery: Automatically restart the process

    Used when component-level restarts fail or if the error occurs outside components (framework code)

    Fault-injection experiments show 78.9% recovery for the process dispatcher (idle thread)


    Watchdog-based recovery

    Error: Lockups inside the OS

    Recovery: Terminate the locked-up thread or dispatch an exception

    Thread termination explored on Linux

    An OS hardware watchdog works by setting a countdown timer running, which the OS periodically resets ("tickles"). If the computer malfunctions, the tickles stop and the watchdog eventually counts down to zero and automatically reboots the computer.

    Exceptions allow possible local recovery without any information loss (in contrast with thread termination)

    Lockup fault-injection experiments show about 70% recovery
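The countdown-and-tickle behaviour can be modelled in software with a small sketch. The `Watchdog` class and its explicit `tick()` method are assumptions: real hardware decrements on its own clock, and the expiry action would be a reboot or a dispatched exception rather than a callback.

```cpp
#include <functional>
#include <utility>

// Software model of a countdown watchdog. Real hardware decrements on its own
// clock; here tick() is called explicitly so the behaviour is deterministic.
class Watchdog {
public:
    Watchdog(int timeoutTicks, std::function<void()> onExpire)
        : timeout_(timeoutTicks), remaining_(timeoutTicks),
          onExpire_(std::move(onExpire)) {}

    // Called by healthy code to show it is still making progress.
    void tickle() { remaining_ = timeout_; }

    // One timer period elapses. Returns true if the watchdog fired.
    bool tick() {
        if (remaining_ > 0 && --remaining_ == 0) {
            onExpire_();            // e.g. reboot, or dispatch an exception
            remaining_ = timeout_;  // re-arm after the recovery action
            return true;
        }
        return false;
    }

private:
    int timeout_;
    int remaining_;
    std::function<void()> onExpire_;
};
```

As long as `tickle()` is called within every `timeoutTicks` periods, the watchdog never fires. A locked-up thread stops tickling, and the recovery action runs.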


    Process recovery

    What to do when OS error recovery is not possible?

    Last resort

    Ensure minimal working subsystems - disk, recovery code

    Save individual process state

    Restore processes after full reboot

    Explored on Linux

    Re-use code for process checkpointing/migration support

    Can recover from arbitrary OS corruption that does not affect user process state
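The save-then-restore step can be illustrated with a deliberately simplified sketch. The `ProcessState` fields, the one-line-per-process text format, and the use of a string in place of the disk are all assumptions. Real checkpointing must also capture memory, open files and registers.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified process state.
struct ProcessState {
    int pid;
    long programCounter;
    std::string workingDir;   // assumed to contain no whitespace
};

// Serialise one process per line; the returned string stands in for the disk.
std::string checkpoint(const std::map<int, ProcessState>& processes) {
    std::ostringstream out;
    for (const auto& entry : processes) {
        const ProcessState& p = entry.second;
        out << p.pid << ' ' << p.programCounter << ' ' << p.workingDir << '\n';
    }
    return out.str();
}

// Rebuild the process table from the saved text after a full reboot.
std::map<int, ProcessState> restore(const std::string& saved) {
    std::map<int, ProcessState> processes;
    std::istringstream in(saved);
    ProcessState p;
    while (in >> p.pid >> p.programCounter >> p.workingDir) {
        processes[p.pid] = p;
    }
    return processes;
}
```

The restore path depends only on the saved data and not on any in-memory OS structure. This is why the approach can survive OS corruption that leaves user process state intact.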


    Solaris 10 OS

    Introduces a new architecture for building and deploying systems and services capable of predictive self-healing

    Solaris Fault Manager and Solaris Service Manager are the two main components of predictive self-healing

    Fault Manager receives hardware and software error reports and diagnoses them automatically

    Service Manager manages services, permitting automatic self-healing

    Service operations include start, stop, restart
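The division of labour between the two components can be sketched as a toy model. This is not the Solaris implementation or its API: the classes `FaultManager`, `ServiceManager`, and `Service`, the error-count threshold, and the diagnosis labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    MAINTENANCE = "maintenance"


@dataclass
class Service:
    """Hypothetical managed service supporting start, stop, restart."""
    name: str
    status: Status = Status.OFFLINE
    restarts: int = 0

    def start(self) -> None:
        self.status = Status.ONLINE

    def stop(self) -> None:
        self.status = Status.OFFLINE

    def restart(self) -> None:
        self.stop()
        self.start()
        self.restarts += 1


class FaultManager:
    """Toy diagnosis engine: counts error reports per service."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed cut-off, not a Solaris setting
        self.error_counts = {}

    def report(self, service_name: str) -> str:
        count = self.error_counts.get(service_name, 0) + 1
        self.error_counts[service_name] = count
        return "faulty" if count >= self.threshold else "transient"


class ServiceManager:
    """Acts on diagnoses: restart on transient errors, otherwise take offline."""

    def __init__(self, fault_manager: FaultManager) -> None:
        self.fault_manager = fault_manager
        self.services = {}

    def add(self, service: Service) -> None:
        self.services[service.name] = service
        service.start()

    def handle_error(self, service_name: str) -> Status:
        service = self.services[service_name]
        if self.fault_manager.report(service_name) == "faulty":
            service.stop()
            service.status = Status.MAINTENANCE  # wait for an administrator
        else:
            service.restart()
        return service.status
```

In this model a service that keeps failing is eventually parked instead of being restarted forever, which is one plausible reading of "automatic self-healing" with a diagnosis step in front of it.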

    Future scope

    Working on OS restructuring to reduce error propagation and prevent state loss during component micro-reboots

    Framework for developer-specified policies to govern micro-reboots and service restarts
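One possible shape for such a developer-specified policy is sketched below. The slides do not describe the framework, so `RebootPolicy`, its fields, and the `decide` function are purely hypothetical: the policy allows a limited number of micro-reboots inside a time window and escalates to a full service restart once that budget is used up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RebootPolicy:
    """Hypothetical policy a developer might attach to one OS component."""
    max_microreboots: int   # micro-reboots tolerated inside the window
    window_seconds: float   # how far back earlier micro-reboots still count


def decide(policy: RebootPolicy, past_reboots: List[float], now: float) -> str:
    """Choose the recovery action for a new failure at time `now`.

    `past_reboots` holds the timestamps of earlier micro-reboots of the
    same component.
    """
    recent = [t for t in past_reboots if now - t <= policy.window_seconds]
    if len(recent) < policy.max_microreboots:
        return "micro-reboot"
    return "full-restart"
```

For example, with `RebootPolicy(max_microreboots=2, window_seconds=60)`, a component that has already been micro-rebooted twice in the last minute would be escalated to a full restart on its next failure.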

    Conclusion

    Self-healing operating systems may be built by incorporating a variety of recovery techniques to address different fault models

    It is also possible to detect and attempt recovery from system hangs that would otherwise remain undetected

    Reference

    1 ARM Integrator Family. http://www.arm.com/miscPDFs/8877.pdf [visited on November 10]

    2 P. M. Chen, W. T. Ng, S. Chandra, C. Aycock, G. Rajamani, and D. Lowell. The Rio File Cache: Surviving Operating System Crashes. In Architectural Support for Programming Languages and Operating Systems, pages 74-83, 1996

    3 E. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 1974

    4 M. Baker and M. Sullivan. The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment. In USENIX, pages 31-44, Summer 1992

    5 Building a self-healing operating system. http://choices.cs.uiuc.edu/selfhealing.pdf [visited on November 6]

    THANK YOU
