FT-ERF: Fault-Tolerance in an Event Rule Framework for Distributed Systems
Hillary Caituiro-Monge, Graduate Student
Advisor: Javier Arroyo-Figueroa, Ph.D.
Presentation 3
Presentation Objectives
Understand the scalable and fault-tolerant ERF architecture
Relate the challenges of active replication
Analyze the core deficiencies among RUBIES replicas that stand in the way of fault-tolerance:
  Lack of timing synchronization of Rule Evaluation Cycles (REC)
  Lack of consistency of Event Sets (ES)
Distributed agreement protocol
Introduce the new research objective
SCALABLE AND FAULT TOLERANT ERF ARCHITECTURE
[Figure: N × M grid of RUBIES instances, labeled RUBIES (γij, δi) for i = 1..N and j = 1..M, with one axis labeled DISTRIBUTION DIMENSION and the other REPLICATION DIMENSION.]
REPLICATION CLASS DIAGRAM
[Figure: class diagram. Classes shown, with the source package in parentheses: RUBIESImp (Logical View); ReplicationManager, FaultNotifier, FaultDetector, FaultMonitorable, Checkpointable, Updateable, ReplicatedServerHandler (FT); GenericFactory (factory); PropertyManager (propertyManager); EventChannelInterface, RUBIESInt (erf); CORBAEventInterface, ProxyPushConsumer, EventChannel, ProxyPushSupplier, CompilerInt, CompilerImpl (CORBA); ObjectGroupManager, ObjectGroup (objectGroup); RUBIESFactory, RUBIESProxy, ReplicationObjectGroup (no package shown).]
DISTRIBUTION CLASS DIAGRAM
[Figure: class diagram. Classes shown, with the source package in parentheses: RUBIESImp (Logical View); EventChannelInterface, RUBIESInt (erf); CORBAEventInterface, ProxyPushConsumer, EventChannel, ProxyPushSupplier, CompilerImpl, CompilerInt (CORBA); RUBIESFactory, Checkpointable (FT); PropertyManager (propertyManager); GenericFactory (factory); Strategy (strategy); DistributionManager (RUB); Migratable (MIG); ObjectGroup, ObjectGroupManager (objectGroup); StaticDistribution, DistributedServerHandler, PerformanceNotifier, PerformanceDetector, PerformanceAnalyzer, CPUQueue, DinamicDistribuition, DistributionObjectGroup, DistributedRUBIESProxy (no package shown).]
Challenges on Active Replication
Strong replica consistency: all replicas must have the same state after method invocations
Duplicated invocation detection and suppression
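Duplicate suppression can be illustrated with a small sketch. Under active replication, every replica of a caller issues the same invocation, so the receiver should execute it once and return the cached reply to the rest. The class `DuplicateFilter`, the `(client_id, request_id)` key, and the `post_event` handler below are hypothetical names chosen for illustration; they are not part of ERF, RUBIES, or any CORBA API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DuplicateFilter:
    """Runs each (client_id, request_id) at most once and caches the reply."""

    handler: Callable[[Any], Any]
    _replies: Dict[Tuple[str, int], Any] = field(default_factory=dict)

    def invoke(self, client_id: str, request_id: int, payload: Any) -> Any:
        key = (client_id, request_id)
        if key not in self._replies:
            # First copy of this invocation: execute it and remember the reply.
            self._replies[key] = self.handler(payload)
        # Later copies (sent by the other replicas) only get the cached reply.
        return self._replies[key]


calls: List[str] = []


def post_event(name: str) -> str:
    """Hypothetical operation with a side effect we want to happen only once."""
    calls.append(name)
    return f"posted:{name}"


dup_filter = DuplicateFilter(post_event)
first = dup_filter.invoke("group-A", 1, "alarm")   # copy sent by one replica
second = dup_filter.invoke("group-A", 1, "alarm")  # same request, other replica
third = dup_filter.invoke("group-A", 2, "alarm")   # a genuinely new request
```

The sketch keeps every reply forever; a real mechanism would also need a rule for discarding old entries.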
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas
It is a source of non-deterministic behavior among RUBIES replicas
The REC is not triggered in response to a direct or indirect client method invocation
It is always running
Therefore replica consistency cannot be reached by means of interface-based consistency mechanisms
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas (continued)
Each replica in a group runs its own independent REC, in which:
  The starting time differs
  The duration differs
As a result, each group member or replica runs each REC over different events.
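A minimal sketch of this effect, assuming that a REC simply processes the events arriving between its start time and its end time. The function `events_in_cycle` and the timing values are invented for illustration and do not come from RUBIES.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, event name)


def events_in_cycle(events: List[Event], start: float, duration: float) -> List[str]:
    """Return the names of events that arrive in [start, start + duration)."""
    end = start + duration
    return [name for arrival, name in events if start <= arrival < end]


# The same stream of events reaches both replicas.
events = [(0.5, "e1"), (1.2, "e2"), (1.9, "e3"), (2.6, "e4")]

# Replica A and replica B run "the same" REC with different start and duration.
replica_a = events_in_cycle(events, start=0.0, duration=2.0)
replica_b = events_in_cycle(events, start=0.7, duration=1.5)

print(replica_a)  # ['e1', 'e2', 'e3']
print(replica_b)  # ['e2', 'e3']
```

Even with identical input, the two replicas evaluate their rules over different events, so their resulting states can diverge.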
Lack of Consistency of Event Sets (ES) among RUBIES replicas
It is a source of non-deterministic behavior among RUBIES replicas
The content of the ES changes differently on each replica
The content of the ES changes for two reasons:
  Incoming events
  Dead events
Lack of Consistency of Event Sets (ES) among RUBIES replicas (continued)
The content of the ES changes differently on each replica as a consequence of the communication delay in delivering events to each replica.
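The following sketch shows how delivery delay alone can make two event sets differ at the same instant. It assumes, purely for illustration, that an event enters the ES when it is delivered and dies a fixed `lifetime` after delivery; the function `event_set`, the delays, and the lifetime are hypothetical and not taken from ERF.

```python
from typing import Dict, Set


def event_set(sent: Dict[str, float], delays: Dict[str, float],
              lifetime: float, now: float) -> Set[str]:
    """Events already delivered at `now` that have not yet died."""
    alive = set()
    for name, sent_at in sent.items():
        delivered_at = sent_at + delays[name]
        if delivered_at <= now < delivered_at + lifetime:
            alive.add(name)
    return alive


sent = {"e1": 0.0, "e2": 1.0, "e3": 2.0}           # send time of each event
delays_a = {"e1": 0.1, "e2": 0.1, "e3": 0.1}       # fast link to replica A
delays_b = {"e1": 0.1, "e2": 0.4, "e3": 0.9}       # slower link to replica B

es_a = event_set(sent, delays_a, lifetime=2.0, now=2.5)
es_b = event_set(sent, delays_b, lifetime=2.0, now=2.5)

print(sorted(es_a))  # ['e2', 'e3']
print(sorted(es_b))  # ['e2']
```

At time 2.5 both replicas have already dropped e1, but e3 has reached only replica A, so the two event sets disagree.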
What is the problem?
Replicas that belong to the same group include dissimilar events in each consecutive equivalent REC execution.
As a result, each RUBIES replica posts different events at different times and with different state.
Such behavior is a problem for load distribution and/or replication.
What is the issue?
Strong replica consistency:
  Synchronize rule evaluation cycles among RUBIES replicas
  Make event sets consistent among RUBIES replicas
How to do it?
Distributed agreement or consensus protocol (work currently in progress)
RUBIES replicas must start each REC after an agreement:
  Each REC must have a unique ID
  RECs with the same ID must run simultaneously
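One simple way to picture the requirement is a barrier that hands out REC IDs. The sketch below assumes a single trusted coordinator and is not the agreement protocol under development: the coordinator itself is a single point of failure, so this is not a fault-tolerant consensus algorithm. `RecCoordinator` and `report_ready` are hypothetical names.

```python
from typing import Iterable, Optional, Set


class RecCoordinator:
    """Releases REC number k only after every replica has reported ready."""

    def __init__(self, replica_ids: Iterable[str]) -> None:
        self.replicas: Set[str] = set(replica_ids)
        self.ready: Set[str] = set()
        self.next_rec_id = 0

    def report_ready(self, replica_id: str) -> Optional[int]:
        """Record readiness; return the agreed REC ID once all are ready."""
        if replica_id not in self.replicas:
            raise ValueError(f"unknown replica: {replica_id}")
        self.ready.add(replica_id)
        if self.ready != self.replicas:
            return None  # still waiting for the other replicas
        rec_id = self.next_rec_id  # unique ID for the REC about to start
        self.next_rec_id += 1
        self.ready.clear()
        return rec_id


coordinator = RecCoordinator(["r1", "r2", "r3"])
round_0 = [coordinator.report_ready(r) for r in ["r1", "r2", "r3"]]
round_1 = [coordinator.report_ready(r) for r in ["r3", "r1", "r2"]]

print(round_0)  # [None, None, 0]
print(round_1)  # [None, None, 1]
```

In this simulation the last replica to report triggers the release; in a distributed setting the coordinator would then notify all replicas to start the REC with that ID.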
How to do it? (continued)
Distributed agreement or consensus protocol (work currently in progress)
RUBIES replicas must include the same events in RECs with the same ID:
  The agreement must include which events will be considered
  Sliding window
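The slide does not spell out how the sliding window works, so the sketch below shows only one possible reading. For a given REC ID, the replicas agree on the events that every replica already holds and that no earlier REC has consumed; events still missing somewhere slide into a later REC. The function `agree_event_set` and the rule itself are assumptions made for illustration.

```python
from typing import Dict, List, Set


def agree_event_set(proposals: Dict[str, Set[str]], consumed: Set[str]) -> List[str]:
    """Events held by every replica and not consumed by an earlier REC.

    The result is sorted so that all replicas process it in the same order.
    """
    if not proposals:
        return []
    common = set.intersection(*proposals.values())
    return sorted(common - consumed)


consumed: Set[str] = set()

# REC 0: e3 has not reached r2 yet, so it is left out of this cycle.
proposals_rec0 = {
    "r1": {"e1", "e2", "e3"},
    "r2": {"e1", "e2"},
    "r3": {"e1", "e2", "e3"},
}
rec0_events = agree_event_set(proposals_rec0, consumed)
consumed.update(rec0_events)

# REC 1: e3 is now everywhere; e4 is still missing at r2 and waits again.
proposals_rec1 = {
    "r1": {"e1", "e2", "e3", "e4"},
    "r2": {"e1", "e2", "e3"},
    "r3": {"e1", "e2", "e3", "e4"},
}
rec1_events = agree_event_set(proposals_rec1, consumed)

print(rec0_events)  # ['e1', 'e2']
print(rec1_events)  # ['e3']
```

With this rule every replica evaluates REC 0 over e1 and e2 and REC 1 over e3, regardless of how quickly each event reached it.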
New Research Objective
The proposed research will focus on the fault-tolerance problem in ERF.
The main purpose is to design and implement a strong replica consistency mechanism to achieve fault-tolerance.
Procedure
Select active replication software:
  Must be compatible with Fault-Tolerant CORBA
  Must be portable
  Must not be intrusive
  Must not be commercial
Build a distributed agreement protocol as described above
OGS (Object Group Service)
Non-intrusive; service approach.
  Requires no change to the underlying ORB
  Compliant with the CORBA specification
Not proprietary. Designed and implemented as a set of CORBA objects, which makes it interoperable between different ORBs.
Plans to extend OGS and make it compliant with the FT-CORBA specification.
White box.
Eternal Systems FTORB
Non-intrusive; interception approach.
CORBA objects above the ORB support the interfaces of the OMG Fault-Tolerant standard specifications
Replication mechanisms below the ORB provide strong replica consistency
Interceptors achieve independence from the ORB and applications.
Others
GMS (Group Communication Service)
IRL
Isis+Orbix
Electra
AQuA
Comparison among Fault-Tolerant CORBA systems
Carlo Marchetti et al., “Architectural Issues on Fault Tolerance in CORBA”, in Proceedings of the SSGRR 2000 Computer & Business Conference, L'Aquila, Italy, 2000
Conclusion
Fault-tolerance in ERF requires the design and implementation of an agreement protocol in order to achieve strong replica consistency.
Strong replica consistency will enable ERF to be used in distributed scenarios such as replication, load distribution, and load balancing.
Thanks