Upload: riyad-parvez

Post on 25-May-2015

Page 1: Microreboot

Microreboot: A Cheap Technique for Recovery

George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, Armando Fox

Presented By Riyad

Page 2: Microreboot

Motivation
● Production software has many transient bugs
● Rebooting can “cure” failures caused by transient bugs
● Rebooting is expensive and causes nontrivial service disruption and downtime
● Microreboot (µRB)!!!

Page 3: Microreboot

Microreboot (µRB)
● Reboot individual fine-grained components
● Similar to an application reboot, but with
  ○ Orders-of-magnitude faster recovery
  ○ Few failed requests during recovery
  ○ Less work lost due to recovery
● Rejuvenate the system without shutting it down
● The system needs to be designed to be microrebootable from the ground up.

Page 4: Microreboot

µRB Goals
● Reduce system recovery time
● Minimize a failure’s disruption to the system and its users
● Preserve in-memory data

Page 5: Microreboot

Crash-Only System Design
● Don’t attempt a complex recovery process
● Upon detecting a failure, crash gracefully
● Keep state in stable storage
● Ensure consistency of state and data before crashing
● Recover from a failure by rebooting the application

Page 6: Microreboot

µRB System Design
● Fine-grained components
  ○ Component-level µRB and fast initialization
  ○ Huge components lower the benefit of µRB
● State segregation
  ○ Prevents reading inconsistent state during recovery
  ○ Separates data recovery from application recovery
● Decoupling
  ○ Lowers disruption across the system during recovery
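State segregation can be illustrated with a minimal sketch. Everything named here (`SessionStore`, `BidComponent`, `microreboot`) is hypothetical and only stands in for the idea on this slide: the component keeps nothing it cannot afford to lose, so discarding the instance and building a fresh one does not lose session data.

```python
class SessionStore:
    """Hypothetical stand-in for an external session-state store."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, key, value):
        self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id, key, default=None):
        return self._data.get(session_id, {}).get(key, default)


class BidComponent:
    """Hypothetical component: durable data lives in the store, not here."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # volatile; deliberately lost on a microreboot

    def place_bid(self, session_id, amount):
        self.store.put(session_id, "last_bid", amount)
        self.cache[session_id] = amount

    def last_bid(self, session_id):
        return self.store.get(session_id, "last_bid")


def microreboot(component):
    """Discard the instance and build a fresh one against the same store."""
    return BidComponent(component.store)


store = SessionStore()
component = BidComponent(store)
component.place_bid("session-1", 42)

component = microreboot(component)
recovered_bid = component.last_bid("session-1")  # read back from the store
```

The fresh instance starts with an empty cache, yet the bid is still readable because it was never held only inside the component.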

Page 7: Microreboot

µRB System Design
● Retryable requests
  ○ Minimize the number of failed requests during recovery
● Leases
  ○ Make cleanup after μRBs more reliable; without leases, resources may leak
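The lease idea can be sketched as follows. This is an illustration under assumptions, not the mechanism used in the presented system: `LeaseTable`, its `ttl`, and the resource names are hypothetical, and the clock is injected so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LeaseTable:
    """Resources are held on a lease; anything not renewed in time is reclaimed."""

    clock: Callable[[], float]  # injectable time source (assumption, for testability)
    ttl: float = 5.0            # hypothetical lease duration in seconds
    _expiry: Dict[str, float] = field(default_factory=dict)

    def acquire(self, resource: str) -> None:
        self._expiry[resource] = self.clock() + self.ttl

    def renew(self, resource: str) -> bool:
        """Extend a live lease; return False if it is unknown or already expired."""
        expiry = self._expiry.get(resource)
        if expiry is None or expiry <= self.clock():
            return False
        self._expiry[resource] = self.clock() + self.ttl
        return True

    def reclaim_expired(self) -> List[str]:
        """Release every resource whose lease has run out."""
        now = self.clock()
        expired = sorted(r for r, t in self._expiry.items() if t <= now)
        for resource in expired:
            del self._expiry[resource]
        return expired


now = [0.0]
leases = LeaseTable(clock=lambda: now[0], ttl=5.0)
leases.acquire("db-conn-1")
leases.acquire("db-conn-2")

now[0] = 3.0
leases.renew("db-conn-1")  # holder is alive and renews

now[0] = 6.0               # holder of db-conn-2 was microrebooted and never renewed
reclaimed = leases.reclaim_expired()
```

Because cleanup depends only on the lease expiring, a component that is microrebooted while holding a resource does not need to run any release code for the resource to be recovered.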

Page 8: Microreboot

Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?

Page 9: Microreboot

Experiment
● J2EE application
● JBoss server (modified to support µRB)
● eBid, a crash-only application based on RUBiS
● MySQL for persistent state
● FastS/SSM for session state

Page 10: Microreboot

Injected Faults
● Deadlocks
● Infinite loops
● Memory leaks
● Transient Java exceptions
● Corrupted data structures
● Out-of-memory errors
● Low-level faults underneath the JVM layer

Page 11: Microreboot

Failure Detection
● Simple detector: a network-level error, an HTTP 4xx or 5xx error, or keywords indicative of failure (e.g., “exception,” “failed,” “error”) in the response
● Comparison detector: each request is submitted in parallel to the fault-injected application and to a known-good application; a discrepancy between the two results counts as a “failure”
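The two detection rules above can be sketched as plain functions. The function names and the convention that `status=None` models a network-level error are assumptions made for this illustration; the keyword list is taken from the slide.

```python
from typing import Optional

FAILURE_KEYWORDS = ("exception", "failed", "error")  # keywords listed on the slide


def simple_detector(status: Optional[int], body: str) -> bool:
    """Return True if a single response looks like a failure.

    status=None models a network-level error (assumption of this sketch).
    """
    if status is None:
        return True
    if 400 <= status <= 599:
        return True
    lowered = body.lower()
    return any(keyword in lowered for keyword in FAILURE_KEYWORDS)


def comparison_detector(faulty_body: str, good_body: str) -> bool:
    """Return True if the fault-injected instance disagrees with the good one."""
    return faulty_body != good_body


checks = [
    simple_detector(None, ""),                              # network-level error
    simple_detector(503, "ok"),                             # HTTP 5xx
    simple_detector(200, "java.lang.NullPointerException"), # keyword in body
    simple_detector(200, "Bid accepted"),                   # looks healthy
    comparison_detector("price: 0", "price: 42"),           # silent wrong answer
]
```

The last case shows why the comparison rule is useful: a response can carry a 200 status and no suspicious keyword and still be wrong.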

Page 12: Microreboot

Recovery Group
● EJBs might maintain references to other EJBs
  ○ Such EJBs cannot be microrebooted individually
● Whenever an EJB is to be microrebooted, microreboot the transitive closure of its inter-EJB dependents as a group.
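Computing a recovery group is a graph-closure problem, which a short sketch can make concrete. The component names and the `references` map are hypothetical, and the edge direction is an assumption: `references[a]` lists the components that `a` holds references to, so the dependents of `x` are the components that reference `x`.

```python
from collections import deque
from typing import Dict, Set


def recovery_group(target: str, references: Dict[str, Set[str]]) -> Set[str]:
    """Return target plus the transitive closure of its dependents."""
    # Invert the graph: dependents[x] = components holding a reference to x.
    dependents: Dict[str, Set[str]] = {}
    for holder, targets in references.items():
        for referenced in targets:
            dependents.setdefault(referenced, set()).add(holder)

    # Breadth-first search over the inverted edges.
    group = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in group:
                group.add(dependent)
                queue.append(dependent)
    return group


# Hypothetical reference graph between components.
references = {
    "ViewItem": {"Item"},
    "PlaceBid": {"Item", "User"},
    "BidHistory": {"PlaceBid"},
    "Item": set(),
    "User": set(),
}

item_group = recovery_group("Item", references)
user_group = recovery_group("User", references)
leaf_group = recovery_group("ViewItem", references)
```

In this example, microrebooting `Item` pulls in `ViewItem`, `PlaceBid`, and (through `PlaceBid`) `BidHistory`, while `ViewItem` has no dependents and forms a group of one.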

Page 13: Microreboot

Recovery Manager
● Can microreboot or restart:
  ○ EJBs (recovery group)
  ○ The WAR
  ○ All of eBid
  ○ The JVM that runs JBoss
  ○ The operating system
● Reboots the components related to the failed URL.
● Tries the cheapest recovery first
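The "cheapest first" policy amounts to an escalation loop, sketched below. The `recover` function, the simulated fault, and the health check are assumptions for illustration; only the ordering of levels comes from the slide.

```python
from typing import Callable, List, Optional, Tuple

RecoveryAction = Tuple[str, Callable[[], None]]

# Recovery levels from the slide, ordered from cheapest to most expensive.
LEVELS = ["EJB recovery group", "WAR", "eBid application", "JVM", "operating system"]


def recover(actions: List[RecoveryAction], is_healthy: Callable[[], bool]) -> Optional[str]:
    """Apply actions in order and stop at the first one that restores health."""
    for name, action in actions:
        action()
        if is_healthy():
            return name
    return None  # nothing helped; a human would have to step in


# Simulated fault (hypothetical): only a JVM restart or anything coarser clears it.
log: List[str] = []
state = {"healthy": False}
cured_by = "JVM"


def make_action(level: str) -> Callable[[], None]:
    def action() -> None:
        log.append(level)
        if LEVELS.index(level) >= LEVELS.index(cured_by):
            state["healthy"] = True
    return action


actions = [(level, make_action(level)) for level in LEVELS]
result = recover(actions, lambda: state["healthy"])
```

The log shows the cost of this policy: three cheaper attempts run before the JVM restart that actually works, which is the delay mentioned later under Limitations.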

Page 14: Microreboot

Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?

Page 15: Microreboot

μRB Failure Recovery

Page 16: Microreboot

Failed Requests

Page 17: Microreboot

Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?

Page 18: Microreboot

Failure + Recovery

Page 19: Microreboot

Client-perceived Availability

Page 20: Microreboot

Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?

Page 21: Microreboot

μRB in Cluster

Page 22: Microreboot

Client-perceived Availability
● When response latency exceeds 8 seconds, users get distracted

Page 23: Microreboot

Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?

Page 24: Microreboot

Performance Impact

Page 25: Microreboot

Limitations
● µRB can leave the system inconsistent if updates aren’t atomic
● µRB can leak resources that aren’t allocated through the application server (e.g., through the Java Native Interface)
● Can delay a full reboot when a full reboot is the only way to recover

Page 26: Microreboot

Limitations
● Recovers only from transient bugs
● Considerable design effort needed
● Not suitable for
  ○ Existing monolithic applications
  ○ Languages (C/C++) that don’t have a Java EE-like framework
● Experiments covered only one recovery group closure, with 5 EJBs.

Page 27: Microreboot

Microreboot
● Cheap alternative to full system recovery
● Restarts components “with a clean state”
● Reduces recovery time, failed requests, and functional disruption
● Only suitable for applications with fine-grained components

Page 28: Microreboot

Discussion
● Do μRBs lead to overengineering (e.g., AbstractAbstractObjectFactory)?
● Are the modifications needed for μRB worthwhile for existing monolithic applications?
● Is a comparable recovery technique possible for CPU-bound applications?