
Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination

Peter Stone, U. of Texas at Austin ([email protected])

Gal A. Kaminka, Bar-Ilan U. ([email protected])

Sarit Kraus, Bar-Ilan U. & U. of Maryland ([email protected])

Jeffrey S. Rosenschein, Hebrew U. ([email protected])

Abstract

As autonomous agents proliferate in the real world, both in software and robotic settings, they will increasingly need to band together for cooperative activities with previously unfamiliar teammates. In such ad hoc team settings, team strategies cannot be developed a priori. Rather, an agent must be prepared to cooperate with many types of teammates: it must collaborate without pre-coordination. This paper challenges the AI community to develop theory and to implement prototypes of ad hoc team agents. It defines the concept of ad hoc team agents, specifies an evaluation paradigm, and provides examples of possible theoretical and empirical approaches to the challenge. The goal is to encourage progress towards this ambitious, newly realistic, and increasingly important research goal.

1 Introduction

Imagine that you are in a foreign country where you do not speak the language, walking alone through a park. You see somebody fall off of his bicycle and injure himself badly; there are a few other people in the area, and all of you rush to help the victim. There are several things that need to be done. Somebody should call an ambulance, someone should check that the victim is still breathing, and someone should try to find a nearby doctor or policeman. However, none of you know one another, and thus you do not know who has a mobile phone, who is trained in first aid, who can run fast, and so forth. Furthermore, not all of you speak the same language. Nonetheless, it is essential that you quickly coordinate towards your common goal of maximizing the victim's chances of timely treatment and survival.

This scenario is an example of what we call an ad hoc team setting. Multiple agents (in this case humans) with different knowledge and capabilities find themselves in a situation such that their goals and utilities are perfectly aligned (effectively, everyone's sole interest is to help the victim), yet they have had no prior opportunity to coordinate. In addition to the emergency setting described above, ad hoc teams may arise among robots or software agents that have been programmed by different groups and/or at different times such that it was not known at development time that they would need to coordinate. For example, rescue robots may be brought to an earthquake site from different parts of the world, or an e-commerce agent may need to coordinate with other legacy agents that can no longer be altered. Note that in this latter example, the agents may need to coordinate repeatedly on the same or similar tasks.

Copyright © 2010, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

In order to be a good "ad hoc team player," an agent must be adept at assessing the capabilities of other agents, especially in relation to its own capabilities. If you are trained in first aid, you may be the best person to examine the fallen bicyclist. But if one of the other people is a doctor, you should take on a different role. Similarly, a good team player must also be adept at assessing the other agents' knowledge states (does anybody know the right phone number?). Furthermore, it must be proficient at estimating the effects of its actions on the other agents. How will the others react if you immediately run away, if you pull out a mobile phone, if you start screaming, or if you calmly measure the victim's pulse?

Ad Hoc Human Teams

The concept of ad hoc human teams has arisen recently in military and industrial settings, especially with the rise of outsourcing. There have also been autonomous agents developed to help support human ad hoc team formation (Just, Cornwell, & Huhns 2004; Kildare 2004). But ad hoc autonomous agent teams have not been relevant in the past because autonomous agents, especially robots, have tended to be deployed only for short times, and teams have been developed by cohesive development groups. As a result, it has typically been possible (and usually necessary) to adjust and tune the agents' behaviors so that they interact well with one another.

Ad Hoc Autonomous Agent Teams

As the field progresses towards long-lasting autonomy, however, reasoning on the fly about interactions with other agents will become increasingly essential. Unlike most team settings considered so far (e.g., (Stone & Veloso 1999; Rosenfeld et al. 2008)), agents in ad hoc team settings are not all programmed by the same people, and may not all have the same communication protocols or world models. Furthermore, they are likely to have heterogeneous sensing and acting capabilities that may not be fully known to one another. As a result, team strategies cannot be developed a priori. Rather, an agent that is to succeed in such an ad hoc team setting must be prepared to adjust its behavior to interact as well as possible with many types of teammates: those with which it can communicate and those with which it cannot; those that are more mobile and those that are less mobile; those with better sensing capabilities and those with worse capabilities. A good team player's best actions are likely to differ significantly depending on the characteristics of its teammates. The fact that humans are routinely called upon to coordinate in an ad hoc fashion strongly motivates the challenge of constructing autonomous agents of similar flexibility.

The Challenge

Our challenge to the community identifies a specific, novel, high-risk, but high-payoff research area. Specifically, we call for theoretical treatments and concrete implementations of robust, autonomous agents that are good ad hoc team players.

That is, we challenge the community:

To create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.

The remainder of this paper is organized as follows. First, Section 2 gives further insight into the details of the challenge via a specification of how potential solutions can be evaluated. Then, Sections 3 and 4 discuss possible theoretical and empirical approaches to the challenge, respectively. Section 5 expands on the ways in which the challenge can be decomposed and/or the scope of the challenge can be gradually increased, Section 6 touches on prior research most related to the challenge, and Section 7 concludes.

2 Evaluation

Though there is plenty of room for theoretical treatment of the problem (see Section 3), the challenge introduced in Section 1 is ultimately an empirical challenge. Additionally, though it pertains to teamwork, it is fundamentally a challenge pertaining to building a single autonomous agent. In this section, we shed further light on the intention of the challenge by specifying a way in which potential solutions can be evaluated.

Let a0 and a1 be two ad hoc teammates whose performance is to be compared in domain D, from which tasks d can be sampled. For example, D may be a multiagent planning domain with each task having different initial conditions and/or goals; or D may be robot soccer, with each task being a match against a particular opponent team. It may also be the case that there is only one task in the domain that is repeatedly "sampled."

Assume that there is some quantitative performance measure, or "score," s(B, d), that results when a set of agents B performs task d once, such as time to goal achievement, joint discounted reward over a fixed time, number of goals scored, or any other objective measure. Note that s may be a stochastic function.

Let there be a pool of potential teammates A = {a2, ..., an}, each of which has some competency in domain D. Specifically, we say that agent a ∈ A is competent if there is some subset B ⊂ A such that a ∈ B, and the team comprised of the agents in B is able to achieve a minimal threshold expected performance, s_min, on all tasks in D:¹

∀a ∈ A, ∀d ∈ D, ∃B ⊂ A s.t. a ∈ B ∧ E[s(B, d)] ≥ s_min

Note that it may be possible for an individual agent to achieve s_min, for instance in a domain such as foraging where having teammates is helpful but not essential; or D may be a fundamentally multiagent domain in which agent teams are needed to perform the task at all, such as pushing heavy boxes. Note further that the agents in A need not be themselves aware that they are acting as teammates; indeed when they are "team aware" the interactions among the agents can become considerably more complex.

We propose comparing agents a0 and a1 as potential ad hoc teammates of the agents in set A in domain D according to the following procedure.

Evaluate(a0, a1, A, D)

• Initialize performance (reward) counters r0 and r1 for agents a0 and a1, respectively, to r0 = r1 = 0.

• Repeat:

  – Sample a task d from D.

  – Randomly draw a subset of agents B, |B| ≥ 2, from A such that E[s(B, d)] ≥ s_min.

  – Randomly select one agent b ∈ B to remove from the team, creating the team B−.

  – Increment r0 by s({a0} ∪ B−, d).

  – Increment r1 by s({a1} ∪ B−, d).

• If r0 > r1, then we conclude that a0 is a better ad hoc team player than a1 in domain D over the set of possible teammates A.
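This evaluation procedure can be sketched directly in code. The sketch below is a minimal, domain-agnostic rendering: the score function s, the expected-score oracle, the task sampler, and the toy "skill level" agents are all invented stand-ins for illustration; a real domain would supply its own.

```python
import random

def evaluate(a0, a1, A, sample_task, s, expected_s, s_min, iterations=1000, seed=0):
    """Compare candidate ad hoc agents a0 and a1 over the teammate pool A.

    sample_task(rng) draws a task d from the domain D; s(team, d, rng) returns
    a (possibly stochastic) score; expected_s(team, d) stands in for E[s(B, d)]
    in the competency check E[s(B, d)] >= s_min.
    """
    rng = random.Random(seed)
    r0 = r1 = 0.0
    for _ in range(iterations):
        d = sample_task(rng)
        # Randomly draw a subset B of A, |B| >= 2, with E[s(B, d)] >= s_min.
        while True:
            B = rng.sample(A, rng.randint(2, len(A)))
            if expected_s(B, d) >= s_min:
                break
        # Randomly remove one agent b from B to create the team B-.
        b = rng.choice(B)
        B_minus = [x for x in B if x is not b]
        # Score each candidate with the same depleted team B-.
        r0 += s([a0] + B_minus, d, rng)
        r1 += s([a1] + B_minus, d, rng)
    # The larger accumulated reward identifies the better ad hoc team player.
    return "a0" if r0 > r1 else "a1"

# Toy domain: an agent is a skill level, a task is a difficulty, and a team's
# score is its summed skill minus the difficulty, plus observation noise.
pool = [0.5, 0.6, 0.7, 0.8]
result = evaluate(
    a0=0.9, a1=0.1, A=pool,
    sample_task=lambda rng: rng.uniform(0.0, 0.5),
    s=lambda team, d, rng: sum(team) - d + rng.gauss(0.0, 0.1),
    expected_s=lambda team, d: sum(team) - d,
    s_min=1.0,
)
```

With these toy inputs the procedure reliably identifies the higher-skill candidate; in any real instantiation, the interesting work lies in the domain-specific score function and agent behaviors.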

Note that there is a lot of potential variability both in the breadth of the domain D (how different the tasks can be) and especially in the breadth of teammate capabilities in A. We address these issues in more detail in Section 5. Note that we assume that agents a0 and a1 are aware of the domain D and the set of potential teammates A. But A may have infinite cardinality, in effect just placing bounds on teammate characteristics, such as that a teammate will not be able to move faster than 2 m/s. In addition, even if A is finite, on each iteration the set B− is initially unknown to the ad hoc team agents being evaluated.

This evaluation paradigm serves to emphasize that an ad hoc team agent is fundamentally an individual agent. To be successful, it must perform well with any set of teammates with which it is presented.

¹For notational convenience, we assume that larger performance values indicate better task performance.


3 Example Theoretical Approach

Although our challenge is ultimately empirical, there is also ample room for theoretical analysis of components of the full problem. For example, aspects of ad hoc teamwork can be usefully studied within the framework of game theory (Leyton-Brown & Shoham 2008). Specifically, a good ad hoc team agent should be able to learn to interact with a previously unknown teammate in a fully cooperative (common payoff) iterative normal form game. If the teammate plays a fixed (possibly stochastic) strategy, the ad hoc team agent should simply learn what that strategy is and play the best response. But even in this simplest of scenarios, the problem can become quite intricate to analyze if the teammate may itself be adaptive.

Collaborative Multi-Armed Bandits

As an initial theoretical analysis of an aspect of ad hoc teamwork, Stone and Kraus (2010) consider a situation in which the ad hoc team player interacts repeatedly in a stochastic environment with a teammate that is both less capable and less knowledgeable than itself. Specifically, the teammate can only execute a subset of the actions that the ad hoc team agent can execute, and, unlike the ad hoc team agent, it is unaware of the relative utilities of the various actions. It is also unaware that it is acting as a part of a team. They formalize this situation as an instance of the well-studied k-armed bandit problem (Robbins 1952).

The basic setting of the k-armed bandit problem is as follows. At each time step, a learning agent selects one of the k arms to pull. The arm returns a payoff according to a fixed, but generally unknown, distribution. The agent's goal is to maximize the sum of the payoffs it receives over time. The setting is well-suited for studying exploration vs. exploitation: at any given time, the agent could greedily select the arm that has paid off the best so far, or it could select a different arm in order to gather more information about its distribution. Though k-armed bandits are often used for this purpose, the authors were the first to consider a multiagent cooperative setting in which the agents have different knowledge states and action capabilities.

In order to study the ad hoc team problem, the authors extend the standard setting to include two distinct agents, known as the teacher and the learner, who select arms alternately, starting with the teacher. They initially consider a bandit with just three arms such that the teacher is able to select from any of the three arms, while the learner is only able to select from among the two arms with the lower expected payoffs. The authors consider the fully cooperative case such that the teacher's goal is to maximize the expected sum of the payoffs received by the two agents over time (the teacher is risk neutral). Specifically, the authors make the following assumptions:

• The payoff distributions of all arms are fully known to the teacher, but unknown to the learner.

• The learner can only select from among the two arms with the lower expected payoffs.

• The results of all actions are fully observable (to both agents).

• The number of rounds (actions per agent) remaining is finite and known to the teacher.

• The learner's behavior is fixed and known: it acts greedily, always selecting the arm with the highest observed sample average so far. If there are any previously unseen arms, the learner selects one of them randomly (optimistic initialization).

The teacher must then decide whether to do what is best in the short term, namely pull the arm with the highest expected payoff; or whether to increase the information available to its teammate, the learner, by pulling a different arm. Note that if the teacher were acting alone, trivially its optimal action would be to always pull the arm with the highest expected payoff.

By these assumptions, the learner is both less capable and less knowledgeable than the teacher, and it does not understand direct communication from the teacher. It is tempting to think that we should begin by improving the learner. But in the ad hoc team setting, that is not an option. The learner "is what it is," either because it is a legacy agent or because it has been programmed by others. Our task is to determine the teacher's best actions given such learner behavior.
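To make the teacher's dilemma concrete, the following minimal simulation pits a purely greedy teacher against one that spends its first few pulls "teaching." The arm means, the noise model, and the fixed teach-for-three-rounds policy are illustrative assumptions only; the paper's contribution is the optimal teaching policy, which this sketch does not implement.

```python
import random
from statistics import mean

# Illustrative three-armed bandit: only the teacher may pull arm 0.
MU = {0: 5.5, 1: 5.0, 2: 2.5}       # expected payoffs (assumed)
SIGMA = {0: 0.0, 1: 3.0, 2: 0.0}    # arm 1 is noisy; the others deterministic
T = 20                              # rounds: one teacher pull, then one learner pull

def pull(arm, rng):
    return MU[arm] + rng.gauss(0.0, SIGMA[arm])

def run_episode(teach_rounds, rng):
    """Teacher pulls arm 1 for the first teach_rounds rounds, else arm 0.

    The learner is restricted to arms {1, 2} and acts greedily on observed
    sample averages, trying any previously unseen arm first (optimistic
    initialization). All pulls are observable to both agents.
    """
    total, sums, counts = 0.0, {1: 0.0, 2: 0.0}, {1: 0, 2: 0}
    for t in range(T):
        # Teacher's turn: teach (arm 1) or exploit (arm 0).
        arm = 1 if t < teach_rounds else 0
        payoff = pull(arm, rng)
        total += payoff
        if arm in counts:               # the learner observes this sample too
            sums[arm] += payoff
            counts[arm] += 1
        # Learner's turn: unseen arms first, then greedy on sample averages.
        unseen = [a for a in (1, 2) if counts[a] == 0]
        arm = rng.choice(unseen) if unseen else max((1, 2), key=lambda a: sums[a] / counts[a])
        payoff = pull(arm, rng)
        total += payoff
        sums[arm] += payoff
        counts[arm] += 1
    return total

rng = random.Random(7)
greedy = mean(run_episode(0, rng) for _ in range(4000))
teaching = mean(run_episode(3, rng) for _ in range(4000))
```

In this configuration a single unlucky sample can lock the untaught learner onto the inferior arm 2 for the rest of the episode, so the teacher's small short-term sacrifice pays for itself on average; shrinking the noise or the horizon tips the balance the other way.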

This setting is only a limited representation of the full ad hoc team setting from Section 1. However, it retains the most essential property, namely that a single agent in our control must interact with a teammate without advance coordination. Nonetheless, even this initial scenario presents some interesting mathematical challenges. Specifically, Stone and Kraus prove several theoretical results pertaining to which arms the teacher should consider pulling, and under what conditions (including for the natural generalization with more than three arms). Furthermore, when the payoffs from the arms are discrete, they present a polynomial algorithm for the teacher to find the optimal arm to pull (Stone & Kraus 2010).

The study of collaborative k-armed bandits described in this section serves as a starting point for the theoretical analysis of ad hoc teams. However, it leaves open many directions for extensions: situations in which the teacher does not have full knowledge, the learner is "team aware," the number of iterations is not known, and/or the learner's behavior is not known a priori to the teacher, among others. We hope that this ad hoc teamwork challenge will inspire many new and challenging problems in the theoretical analysis of optimal control and optimal teamwork.

4 Example Empirical Approach

While theoretical analyses are likely to be able to identify some situations in which optimal decision-making is possible, they are also likely to identify aspects of ad hoc teamwork that are not tractably solvable. There are also likely to be aspects of ad hoc teamwork that are not easily analyzable at all. In such aspects of the problem, there will be plenty of room for useful empirical analyses. In this section, we illustrate a possible empirical approach using the domain of robot soccer.

Human Soccer

Human soccer is a perfect example of the ad hoc team setting. When given the opportunity, teams of human soccer players train together and hone their interactions so as to refine their ability to cooperate with one another. However, individual soccer players are also able to join "pick-up" games with players that they've never met before, let alone played with. They can even do so in foreign countries where they don't know the language, leaving no way to communicate with their ad hoc teammates other than mutual observation and experimentation. A talented human player is able to make quick judgments about how she will best fit into her ad hoc team. When playing with worse players, she is likely to play in the center of the field; when playing with better players, she may look for a supporting role that limits responsibility. Furthermore, at the beginning of the game, it may be useful in this situation to take actions that highlight one's particular strengths, such as kicking with the left foot, or passing the ball precisely, so as to teach one's teammates how best to incorporate the newcomer into the team.

Robot Soccer

Similarly, robot soccer teams at the international RoboCup competitions are typically developed as cohesive units with communication protocols and sophisticated methods for distributing the players on the field into complementary positions (Stone & Veloso 1999). However, it is also possible to consider a "pick-up" game in which the players are not able to pre-coordinate.

As an instantiation of the bandit example in Section 3, one could consider a center midfielder, who can pass to any of three forwards, teaching an outside midfielder, who can only pass to two of them. If the outside midfielder is new to the team and still learning about the forwards' capabilities, we find ourselves in an instantiation of exactly the abstract k-armed bandit scenario. Just as in the abstract setting, it is naturally extensible to model sophisticated learners, partial teacher knowledge, unknown learner behavior, and so on.

Eventually, one can imagine staging field tests of ad hoc team agents at the annual RoboCup competitions. Several participants with robots of varying strengths could be invited to participate in a robot soccer "pick-up game." That is, the robots would be placed on the field as a team without any prior coordination among the human programmers.

A successful ad hoc team player will be able to quickly evaluate whether it is playing with forwards or defenders, whether it is playing with more skillful players or with less skillful players, etc., and adjust its play accordingly. If it is placed on a team with no goalie, then it should notice and adopt that role; if it is placed on a team with worse players, it should actively go to the ball more often, and so on.

The essential aspect is that the ad hoc team player should be able to deal with whatever teammates it might come across, and without any foreknowledge of the teammates' actual behaviors on the part of the agents or the programmers.

5 Controlling the Scope: Task and Teammate Breadth

An ad hoc team player must be prepared to collaborate with all types of teammates. Thus one possible view of the process of creating a fully capable ad hoc team player is that it is akin to equipping it with a toolbox, each tool being useful for interacting with a class of possible teammates, as well as with a method for identifying to which class the current teammates belong.

From this perspective, in order to create an ad hoc team player, one will need to address three high-level technical challenges:

1. Identify the full range of possible teamwork situations that a complete ad hoc team player needs to be capable of addressing.

2. For each such situation, find theoretically optimal and/or empirically effective algorithms for behavior.

3. Develop methods for identifying and classifying which type of teamwork situation the agent is currently in, in an online fashion.
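As a small illustration of what challenge 3 might look like, an agent can maintain a belief over a library of candidate teammate types and update it online from observed actions. The soccer-flavored types and their action distributions below are invented purely for illustration:

```python
import random

# Hypothetical teammate types, each an action distribution (assumed, not
# drawn from any real system).
TYPES = {
    "defensive":  {"defend": 0.7, "pass": 0.2, "shoot": 0.1},
    "aggressive": {"defend": 0.1, "pass": 0.3, "shoot": 0.6},
    "supportive": {"defend": 0.2, "pass": 0.7, "shoot": 0.1},
}

def classify(observed_actions, prior=None):
    """Bayesian belief update over teammate types from observed actions."""
    belief = dict(prior) if prior else {t: 1.0 / len(TYPES) for t in TYPES}
    for action in observed_actions:
        for t in belief:
            # Multiply in the likelihood of this action under each type.
            belief[t] *= TYPES[t].get(action, 1e-9)
        z = sum(belief.values())
        belief = {t: p / z for t, p in belief.items()}  # renormalize
    return belief

# Simulate observing a teammate that is actually "supportive."
rng = random.Random(0)
true_type = TYPES["supportive"]
actions = rng.choices(list(true_type), weights=list(true_type.values()), k=60)
belief = classify(actions)
best = max(belief, key=belief.get)
```

The same skeleton extends to richer observation models; the hard part of challenge 3 is designing the type library and likelihoods so that the classes in the agent's "toolbox" are actually distinguishable from behavior.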

Challenges 2 and 3 are the core technical aspects of the challenge. But the first can also be seen as a sort of knob, which can be used to incrementally increase the difficulty of the challenge.

For this purpose, we start from the literature on ad hoc human team formation (Kildare 2004) to organize teamwork situations along three dimensions:

Teammate characteristics: features of the individual teammates, such as action capabilities, sensing capabilities, decision-making and learning capabilities, whether they can communicate directly, and prior knowledge.

Team characteristics: features of the collection of team members, such as whether they are homogeneous or heterogeneous, how many teammates are on the team, and whether they can observe each other's actions.

Task characteristics: features of the cooperative task to be performed, such as the goal, the time horizon, whether it is turn-taking, and how closely coordinated the agents need to be in their actions. Can they divide the task at a high level and then act independently, or do they need to coordinate low-level actions?

By initially limiting teammate, team, and task characteristics (A and D from Section 2), we can render the challenge approachable even though the full-blown version is quite ambitious. For example, Stone and Kraus's k-armed bandit scenario summarized in Section 3 is appropriate for situations in which the teammates have limited action capabilities, perfect sensing, greedy decision making, no direct communication, and prior knowledge limited to their own observations; the team is heterogeneous, consists of two agents, and can fully observe each other's actions; and the task's goal is to maximize the sum of discrete action utilities over a finite horizon where the agents act individually in a turn-taking fashion. In that work, the authors found the theoretically optimal action for the ad hoc team player, thus taking a first step towards research challenge 2.

We expect that the initial responses to this challenge will address subproblems by similarly limiting the scopes of A and D. Indeed, most of the examples given in this paper consider sets A such that the potential teammates are not even necessarily aware that they are a part of an ad hoc team; this case is the simplest to consider. However, it will be an important aspect of the challenge to consider sets A that include other agents that are aware that they are on an ad hoc team. Eventually, there will be opportunities to generalize and/or combine these subproblems into a more complete ad hoc team agent.

6 Discussion and Related Work

This challenge is predicated on the assumption that software agents and robots will soon be able to be deployed for extended periods of time in the real world. That is, their usefulness will outlive our ability to easily change their behaviors. Such a phenomenon has already occurred with conventional computer programs, as became apparent when the world was worried about the "Y2K bug." COBOL programmers were called out of retirement to try to reprogram computers that were essential to business processes and vital infrastructure, but that were black boxes to everyone who used them.

Were autonomous agents to become similarly long-lived, it would be a huge landmark in their robustness and reliability. However, it would also expose us to the problem that this challenge addresses. Namely, a new agent may need to collaborate with older agents whose behavior is already fixed and not easily changeable.

Ad hoc teams are also already needed for environments where agents with diverse capabilities and no common framework must quickly work as a team. As presented in Section 1, one example is when robots from different developers come together on a common rescue mission. A second example arises when software agents, programmed in isolation, must act within a team setting. These agents might need to analyze scheduling data from different people to help coordinate meetings on their behalf, or they might need to coordinate with legacy agents that can no longer be altered.

The main focus of this research challenge is ad hoc teams in which teammates need to work together without any prior coordination. This perspective is at odds with most prior treatments of teamwork, such as SharedPlans (Grosz & Kraus 1996), STEAM (Tambe 1997), and GPGP (Decker & Lesser 1995), which define explicit coordination protocols, languages, and/or shared assumptions about which the agents are mutually aware. In applications such as the annual RoboCup robot soccer competitions, entire teams of agents are designed in unison, enabling explicit pre-coordination via structures such as "locker room agreements" (Stone & Veloso 1999).

Other than the multi-armed bandit work previously described, the work that we are aware of that takes a perspective most similar to ad hoc teams is that of Brafman and Tennenholtz (1996), in which they consider a teacher agent and a learner agent repeatedly engaging in a joint activity. While the learner has no prior knowledge of this activity, the teacher understands its dynamics. However, they mainly consider a situation in which teaching is not costly: the goal of their teacher is to maximize the number of times that the learner chooses the "right" action. Thus in some sense, the teacher is not "embedded" in the environment as a real teammate.

Although the complete challenge put forth in this paper is very ambitious and likely to take many years to meet in a fully satisfactory way, there are numerous existing techniques that may be useful starting points for certain aspects of the challenge (e.g., for certain properties of A and D). The remainder of this section provides a small sampling of such existing techniques.

• As mentioned in Section 3, game theory (Leyton-Brown & Shoham 2008) provides a useful theoretical foundation for multiagent interaction. Though originally intended as a model for human encounters (or those of human institutions), it has become much more broadly applied over the last several decades.

• A good ad hoc team player may need to make an explicit assumption that its teammates are observing and reacting to its actions (that they are "team aware"). In doing so, the agent is actually planning its actions intending for them to be observed and interpreted. Intended plan recognition (in contrast to keyhole recognition) is the term used when the observed agent knows that it is being observed, and is acting under the constraints imposed by this knowledge (Carberry 2001). Much of the work on planning for intended recognition settings has focused on natural language dialogue systems (Sidner 1985; Lochbaum 1991).

• An important aspect of the ad hoc team challenge is recognizing, or "modeling," the capabilities of one's teammates. For this purpose, work from the opponent modeling literature (e.g., (Carmel & Markovitch 1995; Stone, Riley, & Veloso 2000; Oshrat, Lin, & Kraus 2009)) may be readily adaptable to similarly model teammates. In addition, a good ad hoc team agent must recognize the possibility that, while it is attempting to model its teammates, the teammates may be simultaneously modeling it. In such a case, the agent is engaged in a recursive modeling setting (Vidal & Durfee 1995).

• Claus and Boutilier (1998) show how reinforcement learning can provide a robust method for agents to learn how to coordinate their actions. This work is one of many approaches for cooperative multiagent learning (see surveys at (Stone & Veloso 2000; Panait & Luke 2005)).

In addition to existing AI methods, as mentioned in Section 1, previous work has examined the use of agents to support the formation of human ad hoc teams (Just, Cornwell, & Huhns 2004; Kildare 2004). This work relies on an analysis of the sources of team variability, including member characteristics, team characteristics, and task characteristics (Kildare 2004), which we borrow as a structure for classifying types of autonomous teammates in Section 5. In addition, software agents have been used to support the operation of human teams (Chalupsky et al. 2001), and for distributed information gathering from distinct, otherwise independent information sources (Sycara et al. 1996). But we are not aware of any such work that enables an autonomous agent to itself act as an ad hoc teammate with previously unknown teammates.

7 Conclusion

This paper presents the concept of ad hoc autonomous agent teams and challenges the community to develop novel theoretical and empirical approaches to creating effective ad hoc teammates. Though today most agent teams are developed as a unit, we believe that it will not be long before autonomous agents, both in software and robotic settings, will very often need to band together on the fly (possibly with humans on their teams as well!). We hope that this challenge will encourage the research necessary to ensure that agents will be able to do so effectively when that time arrives.

Acknowledgments

Thanks to the UT Austin Learning Agents Research Group for useful comments and suggestions. This work was partially supported by grants from NSF (IIS-0917122, IIS-0705587), DARPA (FA8650-08-C-7812), ONR (N00014-09-1-0658), FHWA (DTFH61-07-H-00030), Army Research Lab (W911NF-08-1-0144), ISF (1357/07, 898/05), and the Fulbright and Guggenheim Foundations.

References

Brafman, R. I., and Tennenholtz, M. 1996. On partially controlled multi-agent systems. JAIR 4:477–507.

Carmel, D., and Markovitch, S. 1995. Opponent modeling in a multi-agent system. In Sen, S., ed., IJCAI-95 Workshop on Adaptation and Learning in Multiagent Systems, 8–13.

Carberry, S. 2001. Techniques for plan recognition. User Modeling and User-Adapted Interaction 11:31–48.

Chalupsky, H.; Gil, Y.; Knoblock, C.; Lerman, K.; Oh, J.; Pynadath, D.; Russ, T.; and Tambe, M. 2001. Electric elves: Applying agent technology to support human organizations. In IAAI-01.

Claus, C., and Boutilier, C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI-98, 746–752. AAAI Press.

Decker, K. S., and Lesser, V. R. 1995. Designing a family of coordination algorithms. In ICMAS-95, 73–80. Menlo Park, California: AAAI Press.

Grosz, B. J., and Kraus, S. 1996. Collaborative plans for complex group actions. AIJ 86:269–358.

Just, J.; Cornwell, M.; and Huhns, M. 2004. Agents for establishing ad hoc cross-organizational teams. In IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 526–530.

Kildare, R. 2004. Ad-hoc online teams as complex systems: agents that cater for team interaction rules. In Proceedings of the 7th Asia-Pacific Conference on Complex Systems.

Leyton-Brown, K., and Shoham, Y. 2008. Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers.

Lochbaum, K. E. 1991. An algorithm for plan recognition in collaborative discourse. In ACL, 33–38.

Oshrat, Y.; Lin, R.; and Kraus, S. 2009. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS-09, 377–384.

Panait, L., and Luke, S. 2005. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11:387–434.

Robbins, H. 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 55:527–535.

Rosenfeld, A.; Kaminka, G. A.; Kraus, S.; and Shehory, O. 2008. A study of mechanisms for improving robotic group performance. AIJ 172(6–7):633–655.

Sidner, C. L. 1985. Plan parsing for intended response recognition in discourse. Computational Intelligence 1(1).

Stone, P., and Kraus, S. 2010. To teach or not to teach? Decision making under uncertainty in ad hoc teams. In AAMAS-10. International Foundation for Autonomous Agents and Multiagent Systems.

Stone, P., and Veloso, M. 1999. Task decomposition, dynamic role assignment, and low-bandwidth communication for real-time strategic teamwork. AIJ 110(2):241–273.

Stone, P., and Veloso, M. 2000. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots 8(3):345–383.

Stone, P.; Riley, P.; and Veloso, M. 2000. Defining and using ideal teammate and opponent models. In IAAI-00.

Sycara, K.; Decker, K.; Pannu, A.; Williamson, M.; and Zeng, D. 1996. Distributed intelligent agents. IEEE Expert 11(6).

Tambe, M. 1997. Towards flexible teamwork. JAIR 7:81–124.

Vidal, J. M., and Durfee, E. H. 1995. Recursive agent modeling using limited rationality. In ICMAS-95, 125–132. AAAI/MIT Press.