parallel job deployment and monitoring in a hierarchy of mobile agents

50
5/25/2006 CSS Speaker Series 1 Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents Munehiro Fukuda Computing & Software Systems, University of Washington, Bothell Funded by

Upload: italia

Post on 14-Jan-2016

44 views

Category:

Documents


3 download

DESCRIPTION

Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents. Munehiro Fukuda Computing & Software Systems, University of Washington, Bothell Funded by. Outline. Introduction Execution Model System Design Performance Evaluation Related Work Conclusions. 1. Introduction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 1

Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

Munehiro FukudaComputing & Software Systems, University of Washington, Bothell

Funded by

Page 2: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 2

Outline

1. Introduction

2. Execution Model

3. System Design

4. Performance Evaluation

5. Related Work

6. Conclusions

Page 3: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 3

1. Introduction

Problems in Grid Computing Background of Mobile Agents Objective Project Overview

Page 4: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 4

Quiet Laboratories UW1-320 and UW1-302 at 3pm on a weekday

No more computing resources needed?

Page 5: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 5

Demands for Computing ResourcesIn Teaching I'm in the 320 lab testing my program, and for some reason,

whenever I attempt to use 15 hosts, it asks me for passwords for hosts 31 - 18 and then freezes and does nothing.

I noticed that uw1-320-20 is being bogged down by zombie processes that someone left going on it:

I went around looking on some of the other computers, and its all over the place: user A has almost 40 processes running on uw1-320-30 since April 23rd, user B has about 10 on host 29, and there’s a ton more on almost every host.

I got tired of manually running a bunch of ssh commands to run rmi on many different machines.

I have narrowed the problem down to three machines: 16, 20, and 30. First of all uw1-320-20 is dead. It drops all incoming ssh connections. The other two, uw1-320-16 and uw1-320-30, both have a mysterious problem that I don't know how to solve.

Page 6: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 6

Demands for Computing ResourcesIn Research http://setiathome.berkeley.edu/

http://boinc.bakerlab.org/rosetta/rah_about.php

These are an effective way to collect numerous computing resources from all over the world.

But, here is a question:

Why don’t they use idle machines on their campuses first?

Page 7: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 7

Grid-Computing Brokers Desktops

Buyers: a desktop user Sellers: hardware components Brokers: Windows, Linux

Clusters Buyers: multiple users (e.g., CSS434 students) Sellers: cluster computing nodes Brokers: PBS, LSF

Grid computing Buyers: Seti@home, Rosseta@home, etc. Brokers: Globus, Condor, Legion (Avaki), NetSolve, Ninf,

Entropia, etc. Okay, no need to implement any more?

Page 8: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 8

Problems in Grid Computing

Targeting large business models A central entry point A lot of installation work

http://www.globus.org/toolkit/docs/4.0/

Little system faults Too gigantic

Page 9: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 9

Our Target

Network

Targeting a group of computer users No central entry point

No central managers No programming model restrictions

Easy installation work Easy participation but necessity of fault tolerance

Page 10: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 10

Background of Mobile Agents

Internet

Centralmanger Server Server Server

FTPHTTP

RPC

Cycle CycleCycle

User

An execution model previously highlighted as a prospective infrastructure of distributed systems.

Static job deployment and result collection: No more than an alternative approach to centralized grid middleware implementation

Our goal: Let mobile agents do unique tasks in grid computing

Page 11: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 11

Objective Focus on a group of

independent computers Turned on and off independently

Not controlled by a scheduler such as PBS and LSF

Not managed by a central server

Let mobile agents do unique tasks in grid computing Runtime job migration:

Moving a program from a faulty/busy site to an active/idle site

Seeking for fault tolerance and better load balancing

Negotiation: Negotiating with other agents

about computing resources Seeking for better load balancing

Inherent parallelism: Deploying and monitoring jobs in

parallel Decentralized job management

Page 12: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 12

Project Overview

Funded by: NSF Middleware Initiative Sponsored by:University of Washington In Collaboration of: Ehime University In a Team of: UWB Undergraduates

Page 13: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 13

2. Execution Model

System Overview Execution Layer Programming Environment

Page 14: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 14

System Overview

FTPServer

UserA

UserB

UserB

snapshotsnapshot

snapshots snapshots

User program wrapper

SnapshotMethods

GridTCP

User program wrapper

SnapshotMethods

GridTCP

User program wrapper

SnapshotMethods

GridTCP

snapshot

User A’sProcess

User A’sProcess

User B’sProcess

TCPCommunication

Commander Agent

Commander Agent

Sentinel Agent

Sentinel Agent

Resource Agent

Sentinel Agent

Resource Agent

Bookkeeper Agent

BookkeeperAgent

ResultsResults

Page 15: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 15

Execution Layer

Operating systems

UWAgents mobile agent execution platform

Commander, resource, sentinel, and bookkeeper agents

User program wrapper

GridTcpJava socket

mpiJava-AmpiJava-S

mpiJava API

Java user applications

Page 16: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 16

MPI Java Programmingpublic class MyApplication { public GridIpEntry ipEntry[]; // used by the GridTcp socket library public int funcId; // used by the user program wrapper public GridTcp tcp; // the GridTcp error-recoverable socket public int nprocess; // #processors public int myRank; // processor id ( or mpi rank) public int func_0( String args[] ) { // constructor MPJ.Init( args, ipEntry, tcp ); // invoke mpiJava-A .....; // more statements to be inserted return 1; // calls func_1( ) } public int func_1( ) { // called from func_0 if ( MPJ.COMM_WORLD.Rank( ) == 0 ) MPJ.COMM_WORLD.Send( ... ); else MPJ.COMM_WORLD.Recv( ... ); .....; // more statements to be inserted return 2; // calls func_2( ) } public int func_2( ) { // called from func_2, the last function .....; // more statements to be inserted MPJ.finalize( ); // stops mpiJava-A return -2; // application terminated }}

Page 17: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 17

3. System Design Mobile Agents Job Coordination

Distribution Resource allocation and monitoring Resumption and migration

Programming Support Language preprocessing Communication check-pointing

Inter-Cluster Job Deployment (Current Research Topic) Over-gateway agent migration Over-gateway communication Job distribution

Page 18: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 18

id 0

Agent domain (time=3:31pm, 8/25/05 ip = perseus.uwb.edu name = fukuda)

id 0

UWInject: submits a new agent from shell.

Agent domain (time=3:30pm, 8/25/05 ip = medusa.uwb.edu name = fukuda)

UWAgents – Concept of Agent Domain

User

id 1 id 2 id 3

id 7id 6id 5id 4 id 11id 10id 9id 8 id 12

-m 4

id 1 id 2

-m 3

UWPlace

A user job

Page 19: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 19

Job DistributionUser

Commanderid 0

Sentinelid 2

rank 0

Bookkeeperid 3

rank 0

Resourceid 1eXist

Sentinelid 8

rank 1

Sentinelid 11

rank 4

Sentinelid 10

rank 3

Sentinelid 9

rank 2

Bookkeeper

id 12rank 1

Bookkeeper

id 15rank 4

Bookkeeper

id 14rank 3

Bookkeeper

id 13rank 2

Sentinelid 32

rank 5

Sentinelid 34

rank 7

Sentinelid 33

rank 6

Bookkeeper

id 48rank 5

Bookkeeper

id 50rank 7

Bookkeeper

id 49rank 6

Job Submission

XML QuerySpawn

id: agent idrank: MPI Rank

snapshot

snapshot

Sensorid 4

Sensorid 5

Page 20: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 20

Resource Allocation and Monitoring

Node 1Node 0 Node 2

User

Commanderid 0

Resourceid 1eXist

Job submission

An XML query

CPU ArchitectureOSMemoryDiskTotal nodesMultiplier

total nodes x multiplier

A list of available nodes

Spawn

Sentinelid 2

rank 0

Bookkeeperid 2

rank 0

Node5Node 4Node 3

Sentinelid 8

rank 1

Bookkeeperid 12

rank 5

Sentinelid 2

rank 0

Sentinelid 8

rank 1

Bookkeeperid 2

rank 0

Bookkeeperid 12

rank 5

Case 1:Total nodes = 2Multiplier = 1.5

Case 2:Total nodes = 2Multiplier = 3

Future use Future use Future use

Sensorid 4

Sensorid 5

Sensorid 16

Sensorid 18

Sensorid 17

Sensorid 19

Sensorid 20

Sensorid 22

Sensorid 21

Sensorid 23

ttcp

Performance data

ttcp

ttcp

Our ownXML DB

Page 21: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 21

Job Resumption by a Parent SentinelSentinel

id 2rank 0

Sentinelid 8

rank 1

Sentinelid 11

rank 4

Sentinelid 10

rank 3

Sentinelid 9

rank 2

Bookkeeperid 15

rank 4

(0) Send a new snapshot periodically

MPI connections

(2) Search for the latest snapshot

(1) Detect a ping error

Sentinelid 11

rank 4

New

(4) Send a new agent

(5) Restart a user program

(3) Retrieve the snapshot

Page 22: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 22

Job Resumption by a Child Sentinel

Commanderid 0

Sentinelid 2

rank 0

Bookkeeperid 3

rank 0

Sentinelid 8

rank 1

Bookkeeper

id 12rank 1

Resourceid 1

(1) No pings for 8 * 5 (= 40sec)

No pings for 12 * 5 (= 60sec)

(2) Search for the latest snapshot

(3) Search for the latest snapshot

(4) Retrieve the snapshot

NewSentinel

id 2rank 0

(5) Send a new agent

(7) Search for the latest snapshot

(8) Search for the latest snapshot

(9) Retrieve the snapshot

(11) Detect a ping error (13) Detect a ping error and follow the samechild resumption procedure as in p9.

Commanderid 0

(10) Send a new agent

(6) No pings for 2 * 5 (= 10sec)

(12) Restart a new resource agent from its beginning

Resourceid 1

New

Page 23: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 23

User Program Wrapper

statement_1;statement_2;statement_3;check_point( );statement_4;statement_5;statement_6;check_point( );statement_7;statement_8;statement_9;

check_point( )

;

int fid = 1;while( fid == -2) { switch( func_id ) { case 0: fid = func_0( ); case 1: fid = func_1( ); case 2: fid = func_2( ); }}check_point( ) { // save this object // including func_id // into a file}

func_0( ) { statement_1; statement_2; statement_3; return 1;}func_1( ) { statement_4; statement_5; statement_6; return 2;}func_2( ) { statement_7; statement_8; statement_9; return -2;}

User Program WrapperSource Code

Preprocessed

Cryptography

Page 24: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 24

Pre-proccesser and Drawback

No recursions Useless source line numbers indicated upon errors Still need of explicit snapshot points.

statement_1;statement_2;statement_3;check_point( );while (…) { statement_4; if (…) { statement_5; check_point( ); statement_6; } else statement_7; statement_8;}check_point( );

int func_0( ) { statement_1; statement_2; statement_3; return 1;}int func_1( ) { while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement_8; }}

int func_2( ) { statement_6; statement_8; while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement8; }}

Source Code Preprocessed Code

Before check_point( ) in if-clause

After check_point( ) in if-clause

Preprocessed

Page 25: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 25

GridTcp – Check-Pointed Connection

n1.uwb.edu

n3.uwb.edu

n2.uwb.edu

TCPuser

program

rank ip

1 n1.uwb.edu

2 n2.uwb.edu

outgoing

backup

incoming

User ProgramWrapper

Snapshotmaintenance

TCP

userprogram

n2.uwb.edu2

n1.uwb.edu1

iprank

incoming

ougoing

backup

User ProgramWrapper

n3.uwb.edu

userprogram

n3.uwb.edu2

n1.uwb.edu1

iprank

incoming

ougoing

backup

User ProgramWrapper

TCP

Outgoing packets saved in a backup queue All packets serialized in a backup file every check

pointing Upon a migration

Packets de-serialized from a backup file Backup packets restored in outgoing queue IP table updated

Page 26: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 26

Inter-Cluster Job DeploymentCurrent Research Topic

Over-gateway agent deployment Over-gateway TCP communication Over-gateway agent tree creatioin

10.0.0.3

medusa.uwb.edu

uw1-320-00.uwb.edu uw1-320-01.uwb.edu

10.0.0.4 10.0.0.7

Internet

Private domain

Commanderid 0

Sentinelid 2

Sentinelid 8

Sentinelid 9

How?

Page 27: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 27

mnode0

medusa.uwb.edu

uw1-320-00.uwb.edu uw1-320-01.uwb.edu

mnode1 mnode4

Internet

Private domain

id 0id 1

UWAgents – Over Gateway Migration

id 1 id 1

id 1

spawnChild( )hop( )

hop( )hop( )

talk( )

Parent and children keep track of a route to each other’s current position.

A daemon maintains where a gateway is.

Page 28: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 28

mnode0

medusa.uwb.edu

uw1-320-00.uwb.edu uw1-320-01.uwb.edu

mnode1 mnode4

Internet

Private domain

GridTcp – Over-Gateway Connection

Commanderid 0

Sentinelid 2

rank 0

Sentinelid 8

rank 1

Sentinelid 9

rank 2

userprogram

User Program Wrapper

-medusa2

-mnode01

medusauw1-320-000

gatewaydestrank

userprogram

User Program Wrapper

-medusa2

medusamnode01

-uw1-320-000

gatewaydestrank

userprogram

User Program Wrapper

-medusa2

-mnode01

-uw1-320-000

gatewaydestrank

Page 29: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 29

Partition 2

Over-Gateway Agent Tree CreationPossible Solutions

User

Commanderid 0

Sentinelid 2

rank 0

Sentinelid 8

rank 1

Sentinelid 11

rank 4

Sentinelid 10

rank 3

Sentinelid 9

rank 2

Sentinelid 32

rank 5

Sentinelid 34

rank 7

Sentinelid 33

rank 6

Sentinelid 35

rank 8

Sentinelid 46

rank 19

Sentinelid 47

rank 20

Bookkeeperid 3

rank 0

Resourceid 1

Cluster 0

Cluster 1

Cluster 2

Partition 1

Page 30: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 30

Sentinelid 531

rank 10

Sentinelid 131rank 4

Over-Gateway Agent Tree CreationFinal Solution

User

Commanderid 0

Sentinelid 2

Sentinelid 8

rank -8

Sentinelid 33

rank -33

Sentinelid 32

rank 0

Sentinelid 9

rank X

Sentinelid 130rank 3

Sentinelid 129rank 2

Sentinelid 132rank 6

Sentinelid 34

rank -34

Bookkeeperid 3

rank 0

Resourceid 1

Sentinelid 512rank 5

Sentinelid 530rank 9

Sentinelid 529rank 8

Sentinelid 35

rank -35

Sentinelid 39

rank X+4

Sentinelid 128rank 1

Sentinelid 38

rank X+3

Sentinelid 37

rank X+2

Sentinelid 36

rank X+1

Sentinelid 528rank 7

Cluster 0

Cluster 1

Cluster 2

Cluster 3

Cluster gateway 0

Cluster gateways 1, 2, and 3

Desktop computers

Page 31: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 31

4. Performance Evaluation

Evaluation Environment: A 8-node Myrinet-2000 cluster: 2.8GHz pentium4-Xeon w/ 512MB A 24-node Giga-Ethernet cluster: 3.4GHz Pentium4-Xeon

w/512MB

Computation Granularity Java Grande MPJ Benchmark Process Resumption Overhead File Transfer

Page 32: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 32

Computational Granularity 1Master-slave computation

0

1

10

100

1000

10,0

00/1

,000

10,0

00/1

0,00

0

10,0

00/1

00,0

00

20,0

00/1

,000

20,0

00/1

0,00

0

20,0

00/1

00,0

00

40,0

00/1

,000

40,0

00/1

0,00

0

40,0

00/1

00,0

00

Size (doubles) / # floating-point divides

Tim

e (sec)

1 CPU

8 CPUs

16 CPUs

24 CPUsMaster

Slave Slave Slave Slave Slave

Communication

Master-slave computation

Page 33: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 33

Computational Granularity 2Heartbeat

0

1

10

100

1000

10,0

00/1

,000

10,0

00/1

0,00

0

10,0

00/1

00,0

00

20,0

00/1

,000

20,0

00/1

0,00

0

20,0

00/1

00,0

00

40,0

00/1

,000

40,0

00/1

0,00

0

40,0

00/1

00,0

00

Size (doubles) / # floating-point divisions

Tim

e (sec)

1 CPU

8 CPUs

16 CPUs

24 CPUs

Process Process Process Process Process

Communication

Heartbeat communication

Page 34: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 34

Computational Granularity 3Broadcast

0

1

10

100

1000

10,0

00/1

,000

10,0

00/1

0,00

0

10,0

00/1

00,0

00

20,0

00/1

,000

20,0

00/1

0,00

0

20,0

00/1

00,0

00

40,0

00/1

,000

40,0

00/1

0,00

0

40,0

00/1

00,0

00

Size (doubles) / # floating-point divides

Tim

e (sec)

1 CPU

8 CPUs

16 CPUs

24 CPUs

Process Process Process Process Process

Communication

All to all broadcast

Page 35: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 35

Performance Evaluation - Series

0

50

100

150

200

250

300

350

1 4 8 12 16 24

# CPUs

Tim

e (sec)

Agent deployment

Disk operations

Snapshot

ApplicationMaster-slave computation

Page 36: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 36

Performance Evaluation - RayTracer

0

50

100

150

200

250

300

350

1 4 8 12 16 24

# CPUs

Tim

e (s

ec)

Agent deployment

Disk operations

Snapshot

Application

All reduce communication but few data to send

Page 37: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 37

Performance Evaluation – MolDyn

0

50

100

150

200

250

300

350

1 2 4 8

# CPUs

Tim

e (s

ec)

Agent deployment

Snapshot

Disk operations

GridTcp overhead

Java application

All to all broadcast

Page 38: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 38

Overhead of Job Resumption

Page 39: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 39

User

Commanderid 0

Sentinelid 2

rank 0

Sentinelid 8

rank 1

Sentinelid 11

rank 4

Sentinelid 10

rank 3

Sentinelid 9

rank 2

Sentinelid 32

rank 5

Sentinelid 34

rank 7

Sentinelid 33

rank 6

Sentinelid 35

rank 8

Sentinelid 46

rank 19

Sentinelid 47

rank 20

Bookkeeperid 3

rank 0

Resourceid 1

AgentTeamwork vs NFS Pipelined Transfer in AgentTeamwork

File Transfer

Page 40: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 40

5. Related Work

From the viewpoints of: System Architecture Fault Tolerance Job Deployment and Monitoring

Page 41: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 41

System Architecture

Systems Architectural basis

Globus A toolkit

Condor Process migration

Ninf, NetSolve RPC

Legion (Avaki) OO

Catalina, J-SEAL2, AgentTeamwork Mobile agents

Difference from Catalina/J-SEAL2 They are not fully implemented. They are based on a master-slave model

Page 42: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 42

Fault Tolerance

Systems Libraries Data recovery Communication recovery

Legion (Avaki) FT-MPI Variables passed to MPI_FT_save( )

Links recovered

Condor MW Library All master data Master-worker communication

Dome Dome_env Objects declared as dXXX <type>

N/A

AgentTeamwork GridTcp All serializable class data

All in-transit messages

Page 43: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 43

Job Deployment and Monitoring

Systems Co-Allocation Module

Deployment Scheme

Globus DUROC Master slave

Condor Grid Manager Mater slave

Legion Scheduler and Enactor

Master slave

AgenTeamwork Sentinel agents Hierarchical

Page 44: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 44

6. Conclusions

Project Summary Next Two Years

Page 45: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 45

Project summary

Applications Computation granularity: 40,000 doubles x 10,000 floating-point

operations Message transfer: Any types except all-to-all communication Entire application size: 3+ times larger than computation

granularity

Current status UWAgent: completed Agent behavioral design: basic job deployment/resumption

implemented User program wrapper: completed including security features GridTcp/mpiJava: in testing Preprocessor: almost completed

Page 46: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 46

Next Two Years

Application support Fault tolerance in file transfer GUI improvement

Agent algorithms Over-gateway application deployment Dynamic resource allocation and monitoring Priority-based agent migration

Performance evaluation Dissemination

Page 47: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 47

Can AgentTeamwork Become Their Competitor?

AgentTeamwork

Nimrod

Page 48: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 48

Questions?

Page 49: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 49

MPJ.Send and Recv Performance

Page 50: Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents

5/25/2006 CSS Speaker Series 50

Mobile Agents

Mobile agents

Naming Cascading termination

Job scheduling

Security

IBM Aglets AgeltFinder traces all agents

Needs to retract one by one

Schedules jobs with Baglets.

Java byte-code verification

Voyager RPC-based system-unique agent IDs

Needs to be implemented at a user level

Launches an independent user process.

CORBA security service

D’Agent Unpredictable agent IDs

Needs to be implemented at a user level

Launches an independent user process.

A currency-based model

Ara

(Obsolete)

Unpredictable agent IDs

Calls ara_kill to kill all agents

Launches an independent user process.

An allowance model

UWAgent Agent domain Waits for all descendants’ termination

Schedules jobs with Java thread functions.

Agent-to-agent security w/ Agent domain