a decision-theoretic approach to designing proactive communication in multi-agent teamwork

A Decision-Theoretic Approach to Designing Proactive Communication in Multi-Agent Teamwork

Thomas R. Ioerger, Yu Zhang,

Richard Volz, John Yen (PSU-IST)

Dept. of Computer Science

Texas A&M University

Motivation

[Figure: Agent; Multi-Agent Team]

Agents share a large amount of knowledge about the teamwork.

Hard-coded interactions among participants.

High-frequency message exchange.

Communication risk.

Challenging Issues in Designing Communication Protocols

Each agent has incomplete information from which uncertainties arise.

Each agent has different problem solving capabilities.

Data are decentralized and the system lacks global control.

Excessive or unrestricted communication leads to a lack of scalability.

Our Approach and Its Contributions

Proactive Communication

OBPC: Reduction of communication load through OBservations.

DIP: Dynamic estimation of the probability distribution of Information Production and need.

DTPC: Decision-Theoretic determination of communication strategies.

Background

CAST (Collaborative Agents for Simulating Teamwork)

MALLET (Multi-Agent Logic-based Language for Encoding Teamwork)

(team-plan killwumpus (?w)
  (process
    (seq
      (agent-bind ?ca (constraint (play-role ?ca scout)))
      (DO ?ca (findwumpus ?w))
      (agent-bind ?fi (constraint ((play-role ?fi fighter)
                                   (closest-to-wumpus ?fi ?w))))
      (DO ?fi (movetowumpus ?w))
      (DO ?fi (shootwumpus ?w)))))

(ioper shootwumpus (?w)
  (pre-cond (wumpus ?w) (location ?w ?x ?y) (dead ?w false))
  (effect (dead ?w true)))
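The pre-condition/effect reading of an individual operator such as shootwumpus can be illustrated with a minimal Python sketch. This is not CAST or MALLET code: the fact-tuple representation, the function signature, and the choice to retract (dead ?w false) when asserting (dead ?w true) are assumptions made for illustration.

```python
# Hypothetical, simplified model of a MALLET-style individual operator.
# Facts are ground tuples; the real CAST knowledge base is not shown here.

def shootwumpus(facts, w, x, y):
    """Apply the operator if its pre-conditions hold; return the new fact set."""
    pre = {("wumpus", w), ("location", w, x, y), ("dead", w, False)}
    if not pre <= facts:
        # At least one pre-condition is missing: operator not applicable.
        return facts
    # Effect (dead ?w true). Retracting (dead ?w false) is an assumption.
    return (facts - {("dead", w, False)}) | {("dead", w, True)}


kb = {("wumpus", "w1"), ("location", "w1", 3, 4), ("dead", "w1", False)}
kb_after = shootwumpus(kb, "w1", 3, 4)
```

Applying the operator a second time leaves the fact set unchanged, because (dead w1 false) no longer holds.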

Overview

[Figure: overview of the approach. Labels: CAST; KB (one per agent); Proactive Communication; OBPC; DIP; DTPC; Optimal Communication Strategy; Team Structure & Teamwork Procedure.]

Agent Execution Cycle

[Figure: execution cycle. Labels: Observe (Sense); Predict (Info. need and production); Decide (Strategy); Communicate (Information); Act (Effect).]

Syntax of Observability

<observability> ::= (CanSee <viewing>)* (BelieveCanSee <believer> <viewing>)*
<viewing>       ::= <observer> <observable> <cond>
<believer>      ::= <agent>
<observer>      ::= <agent>
<observable>    ::= <property> | <action>
<cond>          ::= (<property>)*
<property>      ::= (<property-name> <object> <args>)
<action>        ::= (DO <doer> (<operator-name> <args>))
<object>        ::= <agent> | <non-agent>
<doer>          ::= <agent>

Example Observability Rules

(CanSee ca (location ?o ?x ?y)
  (location ca ?xc ?yc) (location ?o ?x ?y)
  (inradius ?x ?y ?xc ?yc rca))
// The carrier can see the location property of an object.

(CanSee ca (DO ?fi (shootwumpus ?w))
  (play-role fighter ?fi)
  (location ca ?xc ?yc) (location ?fi ?x ?y)
  (adjacent ?xc ?yc ?x ?y))
// The carrier can see the shootwumpus action of a fighter.

(BelieveCanSee ca fi (location ?o ?x ?y)
  (location fi ?xi ?yi) (location ?o ?x ?y)
  (inradius ?x ?y ?xi ?yi rfi))
// The carrier believes the fighter is able to see the location property of an object.

(BelieveCanSee ca fi (DO ?f (shootwumpus ?w))
  (play-role fighter ?f) (≠ ?f fi)
  (location ca ?xc ?yc) (location fi ?xi ?yi) (location ?f ?x ?y)
  (inradius ?xi ?yi ?xc ?yc rca) (inradius ?x ?y ?xc ?yc rca)
  (adjacent ?x ?y ?xi ?yi))
// The carrier believes the fighter is able to see the shootwumpus action of another fighter.
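The location rules can be read as a simple geometric test. The sketch below is one possible interpretation in Python, not the CAST implementation: it assumes that inradius means "Euclidean distance at most r", and the function names are hypothetical.

```python
import math


def inradius(x, y, xc, yc, r):
    # Assumption: inradius holds when (x, y) lies within Euclidean
    # distance r of (xc, yc). The slides do not define the predicate.
    return math.hypot(x - xc, y - yc) <= r


def can_see_location(observer_pos, obj_pos, radius):
    """CanSee-style check: the observer sees an object's location."""
    xc, yc = observer_pos
    x, y = obj_pos
    return inradius(x, y, xc, yc, radius)


def believe_can_see_location(other_pos, obj_pos, other_radius):
    """BelieveCanSee-style check: the believer evaluates the same
    condition from the other agent's position and sensing radius."""
    return can_see_location(other_pos, obj_pos, other_radius)
```

For example, a carrier at (0, 0) with radius 5 would see an object at (3, 4), while a fighter at the same position with radius 4 would not be believed to see it.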

Proactive Communication Based on Observation

ProactiveTell
– A provider reasons about what information it will have.
– A provider reasons about whether to deliver a piece of information when having the information.

ActiveAsk
– A needer reasons about what information it will need.
– A needer reasons about whether to ask for a piece of information when needing the information.
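One way observability could reduce message traffic is sketched below in Python. The two decision rules are assumptions made for illustration, not rules stated on the slides: a provider stays silent when it believes the needer can observe the information itself, and a needer asks only when it cannot observe the information but believes the provider can.

```python
def should_proactive_tell(needer_needs_info, believes_needer_can_see):
    """Provider side (hypothetical rule): tell only if the needer needs
    the information and is not believed to be able to observe it."""
    return needer_needs_info and not believes_needer_can_see


def should_active_ask(i_need_info, i_can_see, believes_provider_can_see):
    """Needer side (hypothetical rule): ask only if the information is
    needed, cannot be observed directly, and the provider is believed
    to be able to observe it."""
    return i_need_info and not i_can_see and believes_provider_can_see
```

Under these rules no message is sent when both agents can see the same property.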

Evaluation

20 wumpuses, 8 pits, and 20 piles of gold per world.

1 carrier and 3 fighters compose a team.

The team goal is to kill wumpuses and get the gold without being killed.

5 randomly generated worlds with 20×20 cells.

Multi-Agent Wumpus World

Decision-Theoretic Proactive Communication

Strategies

Utility Function

Cost Function

Value Function

Decision-Making

Decision-Making on Situation PA

Situation PA: Provider produces a new piece of information.

[Figure: decision tree for Situation PA with states 0, 1, 2 and end states e. Provider moves (a-b): ProactiveTell, Silence. Needer moves (b-a): Accept, Wait, Silence, ActiveAsk.]

a: provider; b: needer; e: end

Decision-Making on Situation PB

Situation PB: Provider receives a request for a piece of information.

[Figure: decision tree for Situation PB with state 0 and end states e. Provider moves (a-b): Reply, WaitUntilNext.]

Decision-Making on Situation NA

Situation NA: Needer needs a piece of information.

[Figure: decision tree for Situation NA with states 0, 1, transfer states t, and end states e. Needer moves (b-a): ActiveAsk, Silence, Wait. Provider moves (a-b): Reply, WaitUntilNext, Silence, ProactiveTell.]

t: transfer

Decision-Making on Situation NB

Situation NB: Needer receives a piece of information.

[Figure: decision tree for Situation NB with transfer state t, state 0, and end state e. Needer move (b-a): Accept.]

Utility Function

Parameters in utility function:

– I: information about which communication occurs

– t: time of decision-making

– t1: time at which I is needed

– t2: time at which the value of I that is used is produced

– SU: situation at t

– S: strategy available at SU

– M: the set of messages involved in obtaining I

– E: environment state at t

U(I, t, t1, t2, SU, S, M, E) = V(I, t, t1, t2, SU, S) - C(M)

Value Function

V(I, t, t1, t2, SU, S) = T(I, t, t1, t2, SU, S) + R(I, t, t1, t2, SU, S)

where T is the timeliness and R is the relevance.

Timeliness Function

Timeliness
– Whether agents use a value that can be produced in time when they need I.

d(I, t, t1, t2, SU, S) = max(0, t2 - t1)

T(I, t, t1, t2, SU, S) = ft(d(I, t, t1, t2, SU, S))

where ft is decreasing: ft(x) < ft(y) if y < x.
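The slides constrain ft only to be decreasing in the delay d. The Python sketch below uses an exponential decay as one possible choice; the functional form and the parameters scale and decay are assumptions, not values from the presentation.

```python
import math


def delay(t1, t2):
    # d = max(0, t2 - t1): how far the production time t2 falls
    # after the time t1 at which the information is needed.
    return max(0, t2 - t1)


def timeliness(t1, t2, scale=10.0, decay=0.5):
    # Hypothetical f_t: strictly decreasing in the delay, so a larger
    # delay always yields a smaller timeliness value.
    return scale * math.exp(-decay * delay(t1, t2))
```

A value produced before it is needed (t2 ≤ t1) has zero delay and receives the maximum timeliness, here equal to scale.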

Relevance Function

Relevance
– Unprocessed, most recent, important.

P(I, t, t1, t2, SU, S) = Pr(I ∧ t ∧ t1 ∧ t2 ∧ no other value for I was produced within Int[t1, t2] | S ∧ SU)

R(I, t, t1, t2, SU, S) = frI(P(I, t, t1, t2, SU, S))

where frI is increasing: frI(x) < frI(y) if x < y.
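As an illustration of the relevance term, the Python sketch below estimates the probability that no other value of I is produced in the interval and maps it through an increasing function. Both modelling choices are assumptions: productions are treated as a Poisson process with a known rate, and frI is taken to be linear with a hypothetical weight.

```python
import math


def prob_still_current(t1, t2, rate):
    # Assumption: new values of I arrive as a Poisson process, so the
    # probability of no production in an interval of length |t2 - t1|
    # is exp(-rate * length).
    return math.exp(-rate * abs(t2 - t1))


def relevance(p, weight=10.0):
    # Hypothetical f_rI: strictly increasing in the probability p.
    return weight * p
```

The longer the interval or the higher the production rate, the less likely the value is still the most recent one, and the lower its relevance.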

Cost Function

C(Mi) = 0, if Mi = ∅

C(Mi) = k1 + k2 × len(Mi), otherwise
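The cost function is a fixed overhead plus a per-length charge, with no cost when no message is sent. A minimal Python sketch follows; the default values of k1 and k2, the use of None or an empty string for "no message", and the summation over a set of messages are assumptions made for illustration.

```python
def message_cost(message, k1=1.0, k2=0.1):
    """C(Mi): zero for no message, otherwise k1 + k2 * len(Mi)."""
    if not message:
        # None or an empty string stands for "no message sent".
        return 0.0
    return k1 + k2 * len(message)


def total_cost(messages, k1=1.0, k2=0.1):
    # Assumption: the cost C(M) of a set of messages is the sum of the
    # costs of its members; the slides do not spell this out.
    return sum(message_cost(m, k1, k2) for m in messages)
```

With these defaults, a Silence strategy costs nothing, while an ask followed by a reply pays the fixed overhead k1 twice.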

Expected Utility

$$E(U) = \sum_{t_1 \in s_1} \sum_{t_2 \in s_2} P_r(t_1, t_2)\, U(t_1, t_2)$$

[Table: time ranges of t1 and t2 for each strategy. Rows: P.ProactiveTell; P.Silence; P.Reply; P.WaitUntilNext; N.ActiveAsk (if a Reply; if a WaitUntilNext); N.Silence; N.Wait (if a ProactiveTell; if a Silence); N.Accept.]
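Choosing a strategy by expected utility can be sketched as follows in Python. The sketch assumes a discrete joint distribution over (t1, t2) for each strategy; the probabilities, utility functions, and numbers are hypothetical and serve only to show the computation.

```python
def expected_utility(joint_prob, utility):
    """Sum p(t1, t2) * U(t1, t2) over a discrete joint distribution.

    joint_prob maps (t1, t2) pairs to probabilities;
    utility is a function of (t1, t2).
    """
    return sum(p * utility(t1, t2) for (t1, t2), p in joint_prob.items())


def best_strategy(strategies):
    """strategies maps a name to (joint_prob, utility); return the name
    with the highest expected utility."""
    return max(strategies, key=lambda s: expected_utility(*strategies[s]))


# Hypothetical numbers for a provider in Situation PA.
joint = {(5, 3): 0.5, (5, 7): 0.5}
tell_u = lambda t1, t2: 10.0 - 1.5                       # value minus message cost
silence_u = lambda t1, t2: 10.0 - 2.0 * max(0, t2 - t1)  # no cost, delay penalty

choice = best_strategy({
    "ProactiveTell": (joint, tell_u),
    "Silence": (joint, silence_u),
})
```

In this toy setting ProactiveTell has expected utility 8.5 and Silence 8.0, so the provider would choose to tell.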

Strategies

Situation PA: provider produces I

ProactiveTell? Silence?

[Figure: timeline around the current time t, divided into known and unknown times. Marked time points: next production; last sent; last not sent; last need aware of; unfulfilled need.]

Situation PB: provider receives a request for I

Reply? WaitUntilNext?

[Figure: timeline around the current time t, divided into known and unknown times. Marked time points: next production; last production.]

Situation NA: needer needs I

ActiveAsk? Wait? Silence?

[Figure: timeline around the current time t, divided into known and unknown times. Marked time points: next production; last I received; most recent production.]

Strategies

Situation NB: needer receives I

Accept

Summary

• Advantages of approach: allows agents to make intelligent choices of communication policy based on:
– frequencies: of needs, of sensing, of information change
– costs: of messages, plus penalties for delays in action or for acting with incorrect information

Criteria for Applicable Domains

There are information needs among the team.

Agents can communicate.

There is uncertainty in the environment.
– Stochastic properties of the teamwork process.
– Agents have incomplete/disjoint knowledge about the world.

The team acts under critical time constraints, so proactive assistance becomes important.
