Fault-Tolerance Techniques for Mobile Agent Systems
Prepared by: Wong Tsz Yeung
Date: 11/5/2001

Introduction
- Mobile agents have been proposed for different application domains:
  - E-commerce
  - Mobile computing
- It is important to have:
  - Fault detection
  - Recovery

Mobile Agent Execution Model
- A mobile agent executes on a sequence of machines.
- A place provides a logical execution environment for the agent.
- Executing an agent at a place is called a stage of the agent execution.
[Figure labels: $P_i$ (place), $S_i$ (stage)]

Mobile Agent Failure Model
- We can classify failures into 3 classes:
  - Agent failure
  - Place failure
  - Machine failure
- We assume that agent failure and place failure will not happen.

Mobile Agent Failure Model
- When a machine failure happens, all agents executing on that machine are terminated.
- When an agent tries to travel to a failed host, an exception is raised.
- We assume that the agent is terminated in this case.

Problems of Failures
- An agent travels through the network.
- It is difficult to estimate the running time of an agent.
- Two problems:
  - The agent owner believes that the agent has been lost when, in fact, it has not.
  - The agent owner waits for the agent to finish its execution, but the agent has actually terminated abnormally.

Concerns in Protocol Design
- Blocking-free
  - Assume that we have a perfect failure detection mechanism.
  - Suppose we checkpoint every agent at every host.
  - If we detect that a host has failed, we restart that host.

Concerns in Protocol Design
- This kind of recovery is prone to blocking.
- While the recovery is taking place, the execution is blocked until the recovery finishes.

Concerns in Protocol Design
- Exactly-once
  - Suppose an agent is trapped inside a very busy network.
  - If the owner launches another agent, there will be 2 instances in the network.
  - This doubles the effect of a single agent if its actions are not idempotent (non-intrusive).
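
The exactly-once concern can be illustrated with a small sketch. The `Account` class, its methods, and the item name below are hypothetical and only contrast a non-idempotent action with an idempotent one when two instances of the same agent run.

```python
# Hypothetical illustration: none of these names come from the slides.

class Account:
    """A server-side resource that visiting agents act on."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = set()

    def withdraw(self, amount):
        # Non-idempotent: every execution changes the state again.
        self.balance -= amount

    def reserve(self, item):
        # Idempotent: repeating the action leaves the state unchanged.
        self.reserved.add(item)


def run_agent(account):
    """The work one agent instance performs at the server."""
    account.withdraw(10)
    account.reserve("ticket-42")


account = Account(balance=100)
run_agent(account)  # the original agent, delayed in a busy network
run_agent(account)  # the replacement agent launched by the owner

print(account.balance)   # 80: the withdrawal was applied twice
print(account.reserved)  # {'ticket-42'}: the reservation was not duplicated
```

Only the non-idempotent withdrawal is affected by the second instance, which is why exactly-once execution matters for such actions.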

Server Failure Detection
- A server fault-tolerance mechanism is twofold:
  - Agents have to stop traveling to failed servers.
  - There should be global daemons detecting failures.
- Once a failure is detected, recovery should take place.

Server Failure Detection
- A simple server fault-tolerance mechanism:
  - When an agent finishes its computation, it checks whether the next server is available.
  - If yes, it travels to that server.
  - If no, it waits at its resident server until the next host is available.
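
The check-then-travel rule can be sketched as a polling loop. This is a minimal sketch under assumptions: `is_available` is a hypothetical probe supplied by the caller, and `max_polls` is added only so the example terminates, whereas the mechanism described above waits indefinitely.

```python
import time


def travel(itinerary, is_available, wait=0.01, max_polls=100):
    """Visit each server in order, waiting while the next one is down.

    is_available(server) is a hypothetical probe; a real platform would
    use its own availability check.
    """
    visited = []
    for server in itinerary:
        polls = 0
        while not is_available(server):
            polls += 1
            if polls >= max_polls:  # assumption: give up eventually
                raise TimeoutError(f"{server} did not recover")
            time.sleep(wait)  # stay at the resident server and retry
        visited.append(server)  # migrate and execute this stage
    return visited


# Usage: server "B" is reported down for the first three probes.
down_for = {"B": 3}


def probe(server):
    if down_for.get(server, 0) > 0:
        down_for[server] -= 1
        return False
    return True


route = travel(["A", "B", "C"], probe, wait=0)
print(route)  # ['A', 'B', 'C']
```

The agent is still exposed while it waits: if its resident server fails in the meantime, the agent is lost.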

Server Failure Detection
- The way to detect a server failure depends on which agent platform is in use, e.g. RMI and RPC.
- We run a daemon that is global to all the servers.
  - This daemon can detect and recover failed servers.
- However, the daemon is a single point of failure.
  - We should introduce multiple instances of this monitor daemon to ease the problem.

Experiment
- We have set up an experiment on server failure detection.
- The network:
[Figure: the experimental network, including the node labelled "Home"]

Experiment
- To introduce failures to the servers, we run a daemon alongside every server.
- The job of the daemon is to kill its server randomly.
- We set the failure probability to 0.1 per 2 minutes.
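
The failure-injection daemon can be approximated by one random trial per 2-minute interval. The sketch below assumes the trials are independent, which the slides do not state; the function names and the `seed` parameter are illustrative.

```python
import random


def simulate_killer(p=0.1, intervals=30, seed=0):
    """Return the index of the 2-minute interval in which the server is
    killed, or None if it survives all intervals."""
    rng = random.Random(seed)
    for i in range(intervals):
        if rng.random() < p:
            return i
    return None


def survival_probability(p, intervals):
    """Probability that a server survives the given number of intervals,
    assuming independent trials."""
    return (1 - p) ** intervals


# With p = 0.1, a server survives 10 minutes (5 intervals) with
# probability 0.9 ** 5, roughly 0.59.
print(round(survival_probability(0.1, 5), 5))  # 0.59049
```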

Experiment
- We have 2 kinds of agents:
  - One can detect the availability of the next server.
  - The other cannot.
- The former waits for recovery.
- The latter travels to failed servers and is terminated.

Experiment
- We have a global daemon.
  - It detects and recovers server failures.
  - It detects server failures by following a cyclic server list:
    $M_1 \to M_2 \to M_3 \to \dots \to M_{n-1} \to M_n \to M_1$
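
A daemon that follows a cyclic server list might look like the sketch below. The `is_alive` and `restart` callbacks are hypothetical stand-ins for the platform's real detection and recovery operations, and `rounds` only bounds the loop for the example.

```python
from itertools import cycle, islice


def monitor(servers, is_alive, restart, rounds=1):
    """Walk the cyclic server list and restart every server found dead."""
    recovered = []
    for server in islice(cycle(servers), rounds * len(servers)):
        if not is_alive(server):
            restart(server)
            recovered.append(server)
    return recovered


# Usage: M2 and M4 are down when the daemon starts.
state = {"M1": True, "M2": False, "M3": True, "M4": False}
recovered = monitor(
    list(state),
    is_alive=lambda s: state[s],
    restart=lambda s: state.__setitem__(s, True),
    rounds=2,
)
print(recovered)  # ['M2', 'M4']
```

Because the daemon visits the servers one after another, a server that fails just after being checked stays down until the daemon has gone around the whole list.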

Experiment
- Estimate of the time between a server failing and its recovery:
  - Let $p$ be the probability that a server fails.
  - Let $\tau$ be the time needed to perform the recovery process.
  - Let $n$ be the number of servers.
  - The worst-case time is $T = n p \tau$.

Experiment
- Result
[Figure: percentage (0 to 100) against number of servers (1, 5, 10, 15, 20, 25), with one curve for agents with server failure detection and one for agents without]

Experiment
- Agents are still lost because their resident servers die while the agents are waiting.
- The time that an agent waits is linearly proportional to the number of servers.
- Therefore, the curve drops in a more or less linear manner.

Agent Failure Detection
- Pull approach
  - Pull information out of the agent periodically.
  - The owner queries the agent.
  - Use an agent proxy.
- Defect:
  - If the agent is in transit to a server, it cannot respond.

Agent Failure Detection
- Push approach
  - The agent pushes information to the owner.
  - The agent sends heartbeat messages to the owner periodically.
- Better than the pull approach
  - The owner does not need to know where the agent is.
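
On the owner's side, the push approach reduces to tracking when each agent was last heard from. This is a minimal sketch; the `HeartbeatMonitor` class, the `timeout` parameter, and the explicit `now` timestamps are assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Owner-side detector for the push approach (names are illustrative)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, agent_id, now):
        # Called whenever a heartbeat message from the agent arrives.
        self.last_seen[agent_id] = now

    def suspected(self, now):
        # Agents whose last heartbeat is older than the timeout.
        return sorted(
            a for a, t in self.last_seen.items() if now - t > self.timeout
        )


detector = HeartbeatMonitor(timeout=5)
detector.heartbeat("agent-1", now=0)
detector.heartbeat("agent-2", now=0)
detector.heartbeat("agent-1", now=4)
print(detector.suspected(now=6))  # ['agent-2']
```

A suspected agent is not necessarily dead. Its heartbeats may only be delayed, in which case the owner wrongly believes the agent has been lost.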

Agent Failure Detection
- Defects of the above 2 approaches
  - Centralized.
  - Dependent on the status of the network.
  - Produce a lot of traffic on the network.

Agent Failure Detection
- Cooperative agent approach
  - 2 agents are sent at one time.
  - One is called the actual agent.
  - The other is called the rear guard.
  - The rear guard always lags behind the actual agent.

Agent Failure Detection
[Figure: positions of the actual agent and the rear guard]

Agent Failure Detection
- How does it work:
  - When the actual agent arrives at a server, it sends a message to the rear guard: "I am in XXX".
  - When the actual agent leaves a server, it sends a message to the rear guard: "I am leaving XXX".
  - The rear guard then travels to XXX.
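
The message exchange can be traced with a small sketch. The `RearGuard` class, the message kinds `"in"` and `"leaving"`, and the starting location `"home"` are assumptions for illustration. Message delivery is modelled as a direct method call.

```python
class RearGuard:
    def __init__(self, start):
        self.location = start   # server where the rear guard resides
        self.actual_at = None   # last server the actual agent reported

    def on_message(self, kind, server):
        if kind == "in":             # "I am in XXX"
            self.actual_at = server
        elif kind == "leaving":      # "I am leaving XXX"
            self.location = server   # travel to the server just left
        else:
            raise ValueError(kind)


def run(itinerary, guard):
    """Record (actual agent location, rear guard location) per stage."""
    trace = []
    for server in itinerary:
        guard.on_message("in", server)
        trace.append((server, guard.location))  # actual agent executes
        guard.on_message("leaving", server)
    return trace


guard = RearGuard(start="home")
trace = run(["A", "B", "C"], guard)
print(trace)  # [('A', 'home'), ('B', 'A'), ('C', 'B')]
```

At every stage the rear guard sits one server behind the actual agent.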

Agent Failure Detection
- How to detect and recover the agent:
  - Assumption (1): a checkpoint of the actual agent is available.
    - It is used to recover the actual agent.
  - Assumption (2): an agent will not be lost while traveling.
    - This eliminates the possibility that the rear guard cannot receive the "I am in XXX" message.

Agent Failure Detection
- Case 1
  - The rear guard does not receive the message "I am leaving XXX" within a timeout period.
  - This implies that the actual agent has crashed.
  - The rear guard can use the checkpointed actual agent to continue execution.

Agent Failure Detection
- Case 2
  - The actual agent cannot send "I am in XXX" to the rear guard.
  - This implies that the rear guard has crashed.
  - The actual agent can transmit a rear guard to its previous server.
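
The two recovery cases can be sketched as two small functions. All names below are hypothetical. A failed send is modelled as a `ConnectionError`, and a checkpoint is modelled as a plain dictionary. As in the cases above, a timeout is treated as proof of a crash.

```python
def rear_guard_check(last_in_time, leaving_received, now, timeout, checkpoint):
    """Case 1: decide whether the rear guard should recover the actual agent.

    Returns a copy of the checkpoint to resume from, or None if no
    recovery is needed yet.
    """
    if leaving_received or now - last_in_time <= timeout:
        return None
    # No "I am leaving XXX" within the timeout: restart from the checkpoint.
    return dict(checkpoint)


def actual_agent_report(send, server, previous_server, spawn_rear_guard):
    """Case 2: if "I am in XXX" cannot be delivered, spawn a new rear guard."""
    try:
        send(f"I am in {server}")
        return "delivered"
    except ConnectionError:
        spawn_rear_guard(previous_server)
        return "rear guard respawned"


# Case 1: the "leaving" message is overdue, so the checkpoint is used.
checkpoint = {"stage": 2, "results": ["r0", "r1"]}
recovered = rear_guard_check(0, False, now=9, timeout=5, checkpoint=checkpoint)


# Case 2: the rear guard is unreachable, so a new one is sent to "B".
def broken_send(msg):
    raise ConnectionError("rear guard unreachable")


spawned = []
status = actual_agent_report(broken_send, "C", "B", spawned.append)
print(recovered, status, spawned)
```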

Agent Failure Detection
- Advantages
  - Decentralized.
  - The probability that both the rear guard and the actual agent die is very small.
  - Few messages compared to periodic messages.

Replicating Servers
[Figure: stage S0 with one agent/place pair a0/p0; stage S1 with three replicated pairs a1/p1; stage S2 with three replicated pairs a2/p2]

Replicating Servers
[Figure: stage S0 with one agent/place pair a0/p0; stage S1 with three replicated pairs a1/p1; stage S2 with two pairs a2/p2; stage S2' with two pairs a2'/p2']

Checkpointing and Rollback
- Not all data can be checkpointed easily.
- Two types of agent data:
  - Strongly reversible objects
  - Weakly reversible objects

Checkpointing and Rollback
- Strongly reversible objects
  - They can be compensated by means of an image of the objects.
  - E.g. an information-retrieving agent.

Checkpointing and Rollback
- Weakly reversible objects
  - They may differ from the original data after compensation.
  - E.g. electronic money.
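
The difference between the two kinds of objects can be sketched as follows. Both classes are hypothetical. The `fee` deducted on compensation is an invented reason why compensated electronic money may not match the original value.

```python
import copy


class StronglyReversible:
    """State that is fully restored from a saved image."""

    def __init__(self, data):
        self.data = data
        self._image = None

    def checkpoint(self):
        self._image = copy.deepcopy(self.data)

    def rollback(self):
        self.data = copy.deepcopy(self._image)


class Wallet:
    """Weakly reversible: a refund compensates a payment but may not
    restore the original balance."""

    def __init__(self, balance):
        self.balance = balance

    def pay(self, amount):
        self.balance -= amount

    def compensate(self, amount, fee):
        # Assumption: the refund is reduced by a fee.
        self.balance += amount - fee


results = StronglyReversible(["page-1"])
results.checkpoint()
results.data.append("page-2")
results.rollback()
print(results.data)  # ['page-1']: identical to the checkpointed state

wallet = Wallet(balance=100)
wallet.pay(30)
wallet.compensate(30, fee=1)
print(wallet.balance)  # 99: close to, but not equal to, the original 100
```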

Conclusion and Future Work
- We will continue to focus on agent failure detection.
- The above failure detection schemes do not satisfy the exactly-once and blocking-free requirements.
  - More work is still needed.
** END **