
Learning agents

Project Mid-semester Report

October 22nd, 2002

Group participants: Huayan Gao ([email protected]), Thibaut Jahan ([email protected]), David Keil ([email protected]), Jian Lian ([email protected])

Students in CSE 333 Distributed Component Systems

Professor Steven Demurjian

Department of Computer Science & Engineering

The University of Connecticut


CONTENTS

1. Objectives and goals
2. Topic summary
   2.1 Definition and classification of agents and intelligent agents
   2.2 Learning
   2.3 Platform
3. Topic breakdown
   3.1 Machine learning (David)
   3.2 A maze problem (David)
   3.3 Agent platform (Jian)
   3.4 Agent computing (Huayan)
   3.5 Distributed computing (Jian)
   3.6 Implementation using Together, UML, and Java (Thibaut)
   3.7 Extension to UML needed for multi-agent systems (Huayan)
4. Progress on project, changes in direction and focus
5. Planned activities
   5.1 Oct. 23 – Oct. 29
   5.2 Oct. 30 – Nov. 5
   5.3 Nov. 6 – Nov. 12
   5.4 Nov. 13 – Nov. 19
   5.5 Nov. 20 – Nov. 26
   5.6 Nov. 27 – Dec. 2
6. References
Appendix A: Risks
Appendix B: Categories of agent computing
Appendix C: Q-learning Algorithm


1. Objectives and goals

Our ambition is to build a general-architecture model of components for learning agents. The project will investigate current research on software learning agents and will implement a simple system of such agents. We will demonstrate our work with a distributed learning-agent system that interactively finds a policy for navigating a maze. Our implementation will be component-based, using UML and Java.

We will begin with the notion of an intelligent agent, seeking to implement it in a distributed agent environment on a pre-existing agent platform. Agents in the sense of mobile or distributed agents as implemented, we will refer to as “deployed agents.”

We will implement the different “generic” components so that they can be assembled easily into an agent. The project may also include investigation of the scalability, robustness, and adaptability of the system. Four candidate components of a distributed learning agent are perception, action, communication, and learning.

Our design and implementation effort will focus narrowly on an artifact of realistic, limited scope that solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will relate the features of our implementation to recent research in the same narrow area and to broader concepts encountered in the sources.

We have selected JADE (Java Agent Development Framework) as our software development framework; it is aimed at developing multi-agent systems and applications conforming to FIPA (Foundation for Intelligent Physical Agents) standards.


2. Topic summary

In this section we will discuss the following questions in detail: What is an agent? What is learning? How are learning and agents combined? What agent platform will we use?

2.1 Definition and classification of agents and intelligent agents

Researchers in the agent field have offered a variety of definitions. Some general features that characterize agents are autonomy, goal-orientedness, collaboration, flexibility, the ability to be self-starting, temporal continuity, character, adaptiveness, mobility, and the capacity to learn.

According to a definition from IBM, “Intelligent agents are software entities that carry out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employ some knowledge or representation of the user's goals or desires.”

“An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future” [fra-gra96]. The latter, broad definition is close to the notion of intelligent agent used in the artificial-intelligence field, replacing the logic-programming, knowledge-base-oriented paradigm.

2.2 Learning

Machine learning is a branch of artificial intelligence concerned with enabling intelligent agents to improve their behavior. Among many categories of learning, we will focus on reinforcement learning and its special case, Q-learning.


Reinforcement learning is online rational-policy search; it uses ideas associated with adaptive systems and related to optimal control and dynamic programming [sut-bar98]. It is distinguished from traditional machine-learning approaches, which assumed offline learning, separated from the application of the learned knowledge, during a distinct training phase.

In the broader definition of intelligent agents, the agent responds to its environment under a policy, which maps from a perceived state of the environment (determined by agent percepts) to actions. An agent’s actions are a series of responses to previously unknown, dynamically generated percepts. A rational agent is one that acts to maximize its expected future reward or performance measure. Because its actions may affect the environment, such an agent must incorporate thinking or planning ahead into its computations. Because it obtains information from its environment only through percepts, it may have incomplete knowledge of the environment. The agent must conduct a trial-and-error search for a policy that obtains a high performance measure. Reinforcement by means of rewards is part of that search.

For intelligent agents that use reinforcement learning, unlike systems that learn from training examples, the issue arises of exploitation of obtained knowledge versus exploration to obtain new information. Exploration gains no immediate reward and is useful only if it can improve utility by improving future expected reward. Failing to explore, however, means sacrificing any benefit of learning.
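To make this tradeoff concrete, the following minimal sketch in Java (our implementation language) shows epsilon-greedy action selection, a standard way to balance the two concerns; the class and field names are our own illustration, not part of any design decided so far. With probability epsilon the agent explores a random action; otherwise it exploits its current estimates.

    import java.util.Random;

    /** Epsilon-greedy action selection over tabular Q-values.
     *  Illustrative sketch only; names are hypothetical. */
    public class EpsilonGreedy {
        private final Random rng = new Random();
        private final double epsilon;  // probability of exploring

        public EpsilonGreedy(double epsilon) { this.epsilon = epsilon; }

        /** Chooses an action index for the given state, where
         *  qValues[state][action] holds the current estimates. */
        public int selectAction(double[][] qValues, int state) {
            int numActions = qValues[state].length;
            if (rng.nextDouble() < epsilon) {
                return rng.nextInt(numActions);  // explore: random action
            }
            int best = 0;                        // exploit: greedy action
            for (int a = 1; a < numActions; a++) {
                if (qValues[state][a] > qValues[state][best]) best = a;
            }
            return best;
        }
    }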

2.3 Platform

JADE (Java Agent Development Framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through a middleware platform that claims to comply with the FIPA specifications and through a set of tools that support the debugging and deployment phases. The agent platform can be distributed across machines, and the configuration can be controlled via a remote GUI.

According to the FIPA specification, agents communicate via asynchronous message passing, where objects of the ACLMessage class are the exchanged payloads. JADE creates and manages a queue of incoming ACL messages; agents can access their queue via a combination of several modes: blocking, polling, timeout, and pattern-matching based. As for the transport protocol, Java RMI, event notification, and IIOP are currently used.
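As a brief illustration, a JADE agent might poll its queue with pattern matching roughly as in the following sketch (it uses the standard JADE classes, but the agent itself is a hypothetical example, not one of our components):

    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;
    import jade.lang.acl.MessageTemplate;

    /** Sketch: an agent that reads its ACL queue, keeping only
     *  INFORM messages and suspending when no message matches. */
    public class ReceiverAgent extends Agent {
        protected void setup() {
            final MessageTemplate template =
                MessageTemplate.MatchPerformative(ACLMessage.INFORM);
            addBehaviour(new CyclicBehaviour(this) {
                public void action() {
                    ACLMessage msg = myAgent.receive(template); // poll
                    if (msg != null) {
                        System.out.println("Received: " + msg.getContent());
                    } else {
                        block(); // wait until a new message arrives
                    }
                }
            });
        }
    }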

The standard model of an agent platform is represented in the following figure.

Fig. 2.3.1 The standard model of an agent platform

JADE is a FIPA-compliant agent platform that includes the AMS (Agent Management System), the DF (Directory Facilitator), and the ACC (Agent Communication Channel). All three components are automatically activated at agent-platform start-up. The AMS provides white-pages and life-cycle services, maintaining a directory of agent identifiers (AIDs) and agent states. Each agent must register with an AMS in order to get a valid AID. The DF is the agent that provides the default yellow-pages service in the platform. The Message Transport System, also called the Agent Communication Channel (ACC), is the software component controlling all exchange of messages within the platform, including messages to/from remote platforms.
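For example, an agent might advertise a service in the DF’s yellow pages roughly as in the sketch below (the service type “maze-learner” is a hypothetical name used only for illustration):

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    /** Sketch: registering a service description with the DF. */
    public class AdvertisingAgent extends Agent {
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID()); // the AID obtained from the AMS
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze-learner");           // hypothetical type
            sd.setName(getLocalName() + "-service");
            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);    // publish in yellow pages
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }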

3. Topic breakdown

Our project will focus on grid-based problems for learning agents. Its aim is similar to the one expounded in [rus-nor95], but we extend that simple example further: our realization will be more interesting, using multiple learning agents and possibly varying rewards and walls. We will use JADE (Java Agent Development Framework) as our main agent platform to develop and implement the maze.

3.1 Machine learning (David)

Part of this project will consist of investigating the literature on machine learning, particularly reinforcement learning. David will lead this work.

The problem of learning in interaction with the agent’s environment is that of reinforcement learning (RL). The learner executes a policy search, in some solutions using a critic to interpret the reward inputs as guides to improving the policy (see figure below).

Fig. 3.1.1 Learning agent


Within reinforcement learning we will address Q-learning, a variant in which the agent incrementally computes, from its interaction with its environment, a table of expected aggregate future rewards, with values discounted as they extend into the future. As it proceeds, the agent modifies the values in the table to refine its estimates. The Q function gives the expected value of taking a given action in a given state; the optimal action in a state is one that maximizes Q. The evolving table of estimated Q values is called Q̂.
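A minimal sketch of such a table in Java follows; the learning rate alpha and discount factor gamma are the standard Q-learning parameters [sut-bar98], and the names and structure are our own illustration rather than the final component design.

    /** Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
     *  Illustrative sketch; not the project's final design. */
    public class QTable {
        private final double[][] q;  // q[state][action] = estimate
        private final double alpha;  // learning rate
        private final double gamma;  // discount factor

        public QTable(int states, int actions, double alpha, double gamma) {
            this.q = new double[states][actions];
            this.alpha = alpha;
            this.gamma = gamma;
        }

        /** Refines the estimate for (state, action) after the agent
         *  observes a reward and lands in nextState. */
        public void update(int state, int action, double reward, int nextState) {
            double best = q[nextState][0];
            for (double v : q[nextState]) best = Math.max(best, v);
            q[state][action] += alpha * (reward + gamma * best - q[state][action]);
        }
    }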

3.2 A maze problem (David)

The concrete problem described below will help to define how the project breaks down into components. Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching, or not reaching, a goal location in the maze.

Fig. 3.2.1 A maze problem

We propose to modify the original problem definition by permitting multiple distributed agents that communicate, either directly or via the environment. Either the multi-agent system as a whole, or each agent, will use Q-learning. The mazes can be made arbitrarily simple or complex to fit the speed, computational power, and effectiveness of the system we are able to develop in the time available.


A further interesting variant of the problem would be to allow the maze to change dynamically, either autonomously or in response to the learning agents. Robust reinforcement learners will adapt successfully to such changes.

3.3 Agent platform (Jian)

There are many agent platforms from which we might choose (see http://www.ece.arizona.edu/~rinda/compareagents.html for a comparison). We chose JADE (Java Agent Development Framework) as our deployed-agent platform.

JADE is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through middleware and a set of tools that support the debugging and deployment phases. The agent platform can be distributed across machines (which do not even need to share the same OS), and the configuration can be controlled via a remote GUI. The configuration can even be changed at run time by moving agents from one machine to another, as and when required.

3.4 Agent computing (Huayan)

We will survey the agent paradigm of computing, focusing on rational agents, as described in section 2 above. We will apply these concepts to the problem of machine learning, as is done in much reinforcement-learning research.

We have defined an intelligent agent as a software entity that can monitor its environment and act autonomously on behalf of a user or creator. To do this, an agent must perceive relevant aspects of its environment, plan and carry out appropriate actions, and communicate its knowledge to other agents and users. Learning agents will help us to achieve these goals. Learning agents are adaptive: in difficult, changing environments they may change their behavior based on their previous experience.

A real problem with any intelligent-agent system is the amount of trust placed in the agent's ability to cope with the information provided by its sensors in its environment. Sometimes the agent’s learning capability is not good enough to achieve the anticipated goal. This will be an emphasis of our study of agents.

Advantages of learning agents are their ability to adapt to environmental change, their customizability, and their flexibility within anticipated bounds. Disadvantages are the time needed to learn or relearn, their ability only to automate preexisting patterns, and thereby their lack of common sense.

3.5 Distributed computing (Jian)

In multi-agent learning in the strong sense, a common learning goal is pursued; in the weaker sense, agents pursue separate goals but share information. Distributed agents may identify or execute distinct learning subtasks [weiss99]. We will survey the literature on distributed computing, looking for connections to learning agents, and will apply what we find in an attempt to build a distributed system of cooperating learning agents.

3.6 Implementation using Together, UML, and Java (Thibaut)

The maze described above could be represented as a bitmap or a two-dimensional array of squares. Starting with a simple example is useful in order to concentrate on good component design and successful implementation.
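A minimal sketch of the array representation in Java follows; the cell encoding and the small negative step reward (echoing the grid world of [rus-nor95]) are illustrative assumptions, not settled design choices.

    /** Sketch: a maze as a two-dimensional array of squares. */
    public class Maze {
        public static final int FREE = 0, WALL = 1, GOAL = 2;
        private final int[][] grid;  // grid[row][col] holds a cell code

        public Maze(int[][] grid) { this.grid = grid; }

        /** Reward for entering (row, col): the goal pays off; every
         *  other step carries a small cost to favor short paths. */
        public double reward(int row, int col) {
            return grid[row][col] == GOAL ? 1.0 : -0.04;
        }

        /** True if (row, col) is outside the maze or a wall. */
        public boolean isBlocked(int row, int col) {
            return row < 0 || col < 0 || row >= grid.length
                || col >= grid[0].length || grid[row][col] == WALL;
        }
    }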

We used the Together CC software to reverse-engineer existing code of examples of learning agents. We used two examples, the cat-and-mouse example and the dog-and-cat example, explained below. We are using these examples to extract from the class diagrams a possible design for our agents.

Because multi-agent systems are both actors and software, their design does not follow typical UML design practice. The paper [fla-gei01], by Flake, Geiger, and Kuster, suggests that UML does not offer the full range of constructs needed for designing these agents.

We plan to use the Together CC software to implement these agents, starting from their UML design. We have so far identified several distinct components that we think will be used in these learning agents.

These Java-implemented agents would then be executed through the JADE environment. The communication component will have to be specific to the Agent Communication Language (ACL) used in JADE. This should be the only environment-dependent component. We will try to make the other components (learning, perception, action) as “generic” as possible.

Besides the design and implementation of the agents, we also have to design the environment (maze, …).

3.7 Extension to UML needed for multi-agent systems (Huayan)

The Unified Modeling Language (UML) is now widely used in software engineering, so it is natural to think of applying UML to the design of agent systems. But many UML applications focus on macro aspects of agent systems, such as agent interaction and communication, while the design of micro aspects of such agents, such as goals, complex strategies, and knowledge, has often been left out. Standard UML therefore cannot provide complete solutions for multi-agent systems. A detailed description of how to use extended UML to implement multi-agent systems can be found in [fla-gei01]. A Dog-Cat use-case diagram is given as follows:

Fig. 3.7.1 Dog-Cat Use-Case Diagram

In the above diagram, agents are modeled as actors with square heads, and elements of the environment are modeled as clouds. A goal case serves as a means of capturing the high-level goals of an agent. Reaction cases are used to model how the environment directly influences agents. An arc between an actor and a reactive use case expresses that the actor is the source of events triggering that use case. Fig. 3.7.1 illustrates the Dog-Cat use case: the dog triggers the reactive use case DogDetected in the cat agent, and in the environment, the tree triggers the TreeDetected use case in the cat.

In the following, we give similar use-case diagrams for Cat-Mouse and for the maze. The rules of the cat-and-mouse game are: the cat catches the mouse and the mouse escapes the cat; the mouse catches the cheese; and the game is over when the cat catches the mouse. The Cat-Mouse use-case diagram is as follows:


Fig. 3.7.2 Cat-Mouse Use-Case Diagram

For the well-known maze problem, as mentioned in section 3.2, we give the following use-case diagram:

Fig. 3.7.3 The Maze Problem Use-Case Diagram


4. Progress on project, changes in direction and focus

We meet at least every Tuesday after class. Our main change of focus has been the identification of an existing Q-learning package, “Cat and Mouse” (URL: http://www.cse.unsw.edu.au/~aek/catmouse/followup.html), implemented in Java, and of an existing agent platform, JADE.

Thibaut generated a class diagram of the Cat and Mouse Java code using Together. Jian installed the Java code into the JADE platform to create a distributed environment for the learner. Our goal is to implement agents that learn to pursue moving or stationary goals (cat pursues mouse, mouse pursues cheese) or to avoid negative rewards (mouse flees cat).

Huayan found a similar example, “Dog and Cat,” described with use cases, and located other sources related to agent-based systems.

The source for “Dog and Cat” [fla-gei01] raised the issue of the limitations of standard UML use-case diagramming for the purpose of depicting multi-agent systems. The cat, for example, has the use case Escape, while the dog has Chase; but these two use cases denote the same set of events, seen from opposite perspectives.

David coded a simple maze reinforcement learner based on [rus-nor95] in C++, writing classes for the maze, the individual states in the maze, and the learning agent. At a later stage this code could easily be ported to Java.

David also wrote C++ code for a system based on (Michie and Chambers, 1968) that uses reinforcement learning to solve the classic problem of pole balancing, in which a controller nudges a cart that sits on a track, with a pole balanced on it, trying to avoid letting the pole fall. In this problem, physical states lie on a continuum in four dimensions, but they may be quantized into a tractable number of discrete states from the standpoint of the learner, leading to a solution.

The two directions taken so far by group members are somewhat complementary. The group may have to choose between them, however. Use of the existing Cat-and-Mouse system will certainly allow us to address harder problems, in which the learner’s environment changes in response to the agent (e.g., the cat flees the dog). Using JADE gives us the best chance of attaining our goal of implementing distributed learning agents that communicate. We may then seek to extend the existing solution by adding to its Java code.

The approach of coding known solutions from scratch, on the other hand, guarantees that at least one member of the group will understand the code; all members will understand it if all participate in the coding or if the coder explains the code to the others. We noticed that the Java code for Cat-and-Mouse is quite lengthy.


5. Planned activities

5.1 Oct. 23 – Oct. 29:

Consultation with the instructor on the platform and problem choices to be made. Decision on the role in this project of the UML extension for multi-agent systems.

5.2 Oct. 30 – Nov. 5:

Java implementation of the learning aspect of the agents and enhancement of communication efficiency. Each participant will code the components decided on and described in the design part. Once these components are tested, they will be integrated and the resulting system tested.

5.3 Nov. 6 – Nov. 12:

Extensions to code. Circulation of draft report.

5.4 Nov. 13 – Nov. 19:

Preliminary preparation of slides.

5.5 Nov. 20 – Nov. 26:

Preparation of the final report and last adjustments of the learning agents.

5.6 Nov. 27 – Dec. 2:

Polishing of report and slides.


6. References

[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.

[anders02] Chuck Anderson. Robust Reinforcement Learning with Static and Dynamic Stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.

[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 121ff.

[d’Inverno01] Mark d’Inverno, Michael Luck. Understanding Agent Systems. [PUB?] 2001.

[fla-gei01] Stephan Flake, Christian Geiger, Jochen M. Kuster. Towards UML-based analysis and design of multi-agent systems. International NAISO Symposium on Information Science Innovations in Engineering of Natural and Artificial Intelligent Systems (ENAIS’2001), Dubai, March 2001.

[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996. www.msci.memphis.edu/~franklin/AgentProg.html

[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 79-120.

[jac-byl] Ivar Jacobson and Stefan Bylund. A multi-agent system assisting software developers. Downloaded.

[Knapik98] Michael Knapik and Jay Johnson. Developing Intelligent Agents for Distributed Systems. 1998.

[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B, MIT Press, 1990, pp. 1158-1199.

[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.

[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.

[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence. IEEE Expert, December 1996.

[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.

[SAG97] Software Agents Group, MIT Media Laboratory. “CHI97 Software Agents Tutorial”, http://pattie.www.media.mit.edu/people/pattie/CHI97/.


[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 201-258.

[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 259-298.

[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science Press, 1994.

[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.

[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, Keith Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.

[venners97] Bill Venners. The architecture of aglets. Java World, April, 1997.

[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A note on distributed computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.

[weiss99] Gerhard Weiss, Ed. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.

[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp. 27-77.

Reference to get title, author:

[xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.


Appendix A: Risks

Our objectives include avoiding several possible risks, including (1) the construction of “toy worlds,” i.e., problem specifications tailored to the envisioned solution; (2) complexity of design without performance gain; (3) overfitting the generalizable components to the specific problem at hand, putting reusability at risk; and (4) premature commitment to a specific solution (Q-learning) as opposed to exploration of various alternatives.

Appendix B: Categories of agent computing

A wide range of agent types exists.

Interface agents are computer programs that employ artificial-intelligence techniques to provide active assistance to a user with computer-based tasks.

Mobile agents are software processes capable of moving around networks such as the World Wide Web, interacting with hosts, gathering information on behalf of their owner, and returning with requested information that is found.

Co-operative agents can communicate with, and react to, other agents in a multi-agent system within a common environment. Such an agent’s view of its environment might be very narrow due to its limited sensory capacity. Co-operation exists when the actions of an agent achieve not only the agent’s own goals but also the goals of agents other than itself.

Reactive agents do not possess internal symbolic models of their environment. Instead, a reactive agent “reacts” to a stimulus or input that is governed by some state or event in its environment. This environmental event triggers a reaction or response from the agent.

The application field of agent computing includes economics, business (commercial databases), management, telecommunications (network management), and e-societies (as in e-commerce). Techniques from databases, statistics, and machine learning are widely used in agent applications. In the telecommunications field, agent technology is used to support efficient (in terms of both cost and performance) service provision to fixed and mobile users in competitive telecommunications environments.

Appendix C: Q-learning Algorithm

With a known model (M below) of the learner’s transition probabilities given a state and an action, the following constraint equation holds for Q-values, where a is an action, i and j are states, and R is a reward:

Q(a, i) = R(i) + \sum_{j} M^{a}_{ij} \max_{a'} Q(a', j)

Using the temporal-difference learning approach, which does not require a model, we have the following update formula, calculated after the learner goes from state i to state j, where \alpha is the learning rate:

Q(a, i) \leftarrow Q(a, i) + \alpha \left( R(i) + \max_{a'} Q(a', j) - Q(a, i) \right)
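As an illustrative calculation with assumed values: if the current estimate Q(a, i) is 0, the reward R(i) is -0.04, the best successor estimate \max_{a'} Q(a', j) is 0.8, and \alpha = 0.5, then the updated estimate is 0 + 0.5(-0.04 + 0.8 - 0) = 0.38.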

Within the objective of a simple implementation, we will aim to provide an analysis of the time complexity, adaptability to dynamic environments, and scalability of Q-learning agents as compared to more primitive reinforcement learners.