Mental Models for intelligent agents

Nikolaos Mavridis, MIT Media Lab

Early motivation…

How are people able to think about things that are not directly accessible to their senses at the moment?

What is required for a machine to be able to talk about things that are out of sight, that happened in the past, or to view the world through somebody else’s eyes (and mind)?

What is the machinery required for the comprehension of a sentence like “Give me the green beanbag that was on my left”?

Overview

Why mental models?
Architecture
W: The descriptive language
Reusable models
Property description structures
S: Sensory structures
F: Instantiators / Predictors / Reconciliators
Closing comments…

Why mental models?

Goal: Provide an intermediate representation, mediating between perception and language

In essence:
  an internalized representation of the state of the world as best known so far, in a form convenient for “hooking up” language
  a way of updating this representation given further relevant sensory data, and predicting future states in the absence of such data

Why mental models?

But also:
  A useful decomposition of a complex problem, suggesting a practical engineering methodology with reusable components, as well as a theoretical framework
  A unified platform for the instantiation of hypothetical scenarios (useful for planning, instantiation of situations communicated through language etc.)
  A starting point for experimental simulations of:
    Multi-agent systems with differing partial world knowledge or model structure
    Primitive versions of theory of mind by incorporating the estimated world models of other agents
    Learning parameters or structures of the architectures, and experimenting with learned vs. innate (predesigned) tradeoffs (for example, learning predictive dynamics, senses-to-model maps, language-to-model maps etc.)

Notation & Formalities: D = {W, S, F}

D = {W, S, F}: a dynamical mental model

W: Mental model state
  W[t]: state of the mental model at time t
  W: the structure of the state (the chosen descriptive language for the world, ontology). Decompositions might be hierarchical.
  W = {O1, O2, …} into Objects/Relations (creations/deletions crucial)
  Oi = {P1, P2, …} into Properties (updates of contents but usually no creations/deletions)

S: Sensory input
  S[t], S, S = {I1, I2, …} (Modalities/Sensors)

F: Update / prediction function
  W[t+1] = F(W[t], S[t]) as a dynamical system
  F is a two-argument update / prediction function
  A decomposition (…also Wh[t]: hypotheticals):
    (W[t], S[t]) -> Ws[t] (sensory-driven changes in W form)
    W[t] -> Wp[t] (prediction-driven changes in W form)
    W[t+1] = R(Ws[t], Wp[t]) (the “reconciliation” function)
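The decomposition of F into an instantiation step, a prediction step and a reconciliation step can be sketched as plain function composition. This is only a minimal illustration: the map-based WorldState, the property names "pos_x" and "vel_x", the constant-velocity predictor and the "senses override prediction" rule are all assumptions made for the example, not the structures used in the actual system.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: W maps property names to values; S holds only
// the properties that were actually sensed at this step.
typedef std::map<std::string, double> WorldState;
typedef std::map<std::string, double> SensoryInput;

// (W[t], S[t]) -> Ws[t]: sensory-driven changes, expressed in W form.
// This toy version copies the sensed values and ignores the previous state.
WorldState instantiate(const WorldState& /*w*/, const SensoryInput& s) {
    return s;
}

// W[t] -> Wp[t]: toy constant-velocity prediction (an assumption).
WorldState predict(const WorldState& w, double dt) {
    assert(dt >= 0.0);
    WorldState wp = w;
    WorldState::iterator pos = wp.find("pos_x");
    WorldState::const_iterator vel = w.find("vel_x");
    if (pos != wp.end() && vel != w.end()) pos->second += vel->second * dt;
    return wp;
}

// W[t+1] = R(Ws[t], Wp[t]): here sensed values simply override predicted ones.
WorldState reconcile(const WorldState& ws, const WorldState& wp) {
    WorldState next = wp;
    for (WorldState::const_iterator it = ws.begin(); it != ws.end(); ++it)
        next[it->first] = it->second;
    return next;
}

// F as the composition of the three parts.
WorldState step(const WorldState& w, const SensoryInput& s, double dt) {
    return reconcile(instantiate(w, s), predict(w, dt));
}
```

Calling step() with an empty SensoryInput advances the state by prediction alone; supplying a sensed "pos_x" replaces the predicted value while the unsensed "vel_x" is carried over.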

Block diagram (& sync issues!)

[Figure: MENTAL MODELS: Ripley's case. Preliminary block diagram, Sept '03, Nikolaos Mavridis, MIT Media Lab. Blocks shown:
  MENTAL MODEL & RECONCILIATOR (mental_model.exe): W[t] and F
  MODALITY-SPECIFIC INSTANTIATORS (visor.exe etc.): (W[t],S[t])->Ws[t]
  VIRTUAL OBJECT INSTANTIATOR (imaginer.exe): (W[t],H[t])->Wh[t]
  DYNAMICS PREDICTOR (predictor.exe): Wp[t]
  VISUALISER (visualiser.exe)
  Other labels: SENSES S[t], HYPOTHESIS GENERATION, VISUAL FEATURE ANALYSIS, LANGUAGE UNDERSTANDING, (bishop) viewpoint selection]

W: the descriptive language

W in conversational setting: include me, you, others
Indexing: Internal & External IDs, continuity, signatures
Bottom-up:
  Simple_object (e.g. a cylinder)
  Object_relation (binary) (e.g. hinge joint)
  Compound_object = SimpleObjectMap U ObjectRelationMap
  Agent = Compound Object U Viewpoint U Gripper U Mover?
  Agent_relation (e.g. inter-agent joints, visibility?)
  Compound_agent = AgentMap U AgentRelationMap = World
Basic properties: in simple_object, object_relation
Property description structures (fixed, with confidences, with stochastic model, observational history, categorical form) - later

Simple_object

class simple_object : public Packable {
    long ID;

    //OBSERVER-INDEPENDENT PROPERTIES
    //Gradations of existence
    int exists;      //exists=1 means sensory object, exists=0 means virtual
    int body_exists; //for ODE newtonian dynamics
    int geom_exists; //for ODE collision-detection
    int draw_exists; //should it be visible in the visualiser?

    //Position, rotation and velocity (second-order state space for rigid body)
    double pos[3];
    double R[12]; //remember to set quaternion, too!
    double lvel[3], avel[3];
    double facc[3], tacc[3]; //force and torque accumulators...are they required?

    //Shape
    int shape;
#define SOBJECT_SHAPE_BOX 1 //value 1 assumed; it is missing on the slide
#define SOBJECT_SHAPE_SPHERE 2
#define SOBJECT_SHAPE_CYLINDER 3
#define SOBJECT_SHAPE_CAPPEDCYLINDER 4
    double shapeparam[3]; //[0] is also radius, [1] is also length

    //Mass etc. WHICH SHOULD BE CHOSEN AND WHICH DERIVED?
    double density;
    double mass;   //this is just density*volume, i.e. density*f(shape)
    double weight; //should this also be here? just mass*gravity

    //Color & texture
    double color[3];
    int texture;

    //Relations with other objects; attachment, visibility (inview_rip is OBSERVER-DEPENDENT in a sense...)
    //THESE MIGHT ALSO BE PART OF RIPLEY'S STRUCTURES
    int attached;
    int inview_rip;
    int inview_rip_x2D, inview_rip_y2D;
};
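The comments above note that mass is just density*f(shape) and weight is mass*gravity, so both could be derived rather than stored. A minimal sketch of that derivation follows. The constant names, the convention that a box uses shapeparam[0..2] as side lengths, the convention that a capped cylinder's length excludes its two hemispherical caps, and the default gravity value are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical shape codes mirroring the SOBJECT_SHAPE_* values (BOX = 1 assumed).
const int kShapeBox = 1;
const int kShapeSphere = 2;
const int kShapeCylinder = 3;
const int kShapeCappedCylinder = 4;

const double kPi = 3.14159265358979323846;

// f(shape): volume from the shape code and its parameters.
// Assumed conventions: a box uses shapeparam[0..2] as side lengths; the other
// shapes use shapeparam[0] as radius and shapeparam[1] as length; the length
// of a capped cylinder excludes its two hemispherical caps.
double shape_volume(int shape, const double shapeparam[3]) {
    const double r = shapeparam[0];
    const double len = shapeparam[1];
    switch (shape) {
        case kShapeBox:
            return shapeparam[0] * shapeparam[1] * shapeparam[2];
        case kShapeSphere:
            return 4.0 / 3.0 * kPi * r * r * r;
        case kShapeCylinder:
            return kPi * r * r * len;
        case kShapeCappedCylinder:
            return kPi * r * r * len + 4.0 / 3.0 * kPi * r * r * r;
        default:
            return 0.0; // unknown shape: no volume
    }
}

// mass = density * f(shape)
double derived_mass(double density, int shape, const double shapeparam[3]) {
    assert(density >= 0.0);
    return density * shape_volume(shape, shapeparam);
}

// weight = mass * gravity (the default gravity value is an assumption)
double derived_weight(double mass, double gravity = 9.81) {
    return mass * gravity;
}
```

Deriving mass and weight on demand keeps density as the single stored quantity, which is one possible answer to the "which should be chosen and which derived" question raised in the comments.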

Object_relation

class binary_object_relation : public Packable {
    friend ostream& operator<<(ostream& os, binary_object_relation &bor);

public:
    long ID;
    long obj1ID;
    long obj2ID;
    //vector<double> params; //will be made more specific later

    int type;
#define BOR_TYPE_HINGE 1
#define BOR_TYPE_HINGE2 2
#define BOR_TYPE_BALL 3
#define BOR_TYPE_SLIDER 4
    double axis[3];
#define BOR_DIRECTION_X 1
#define BOR_DIRECTION_Y 2
#define BOR_DIRECTION_Z 3
    double anchor[3];
    double param[10];
#define BOR_PARAM_ANGLE 0
#define BOR_PARAM_ANGLERATE 1
#define BOR_PARAM_HISTOP 2
#define BOR_PARAM_LOSTOP 3
#define BOR_PARAM_VEL 4
#define BOR_PARAM_FMAX 5
#define BOR_PARAM_FUDGEFACTOR 6
#define BOR_PARAM_BOUNCE 7
#define BOR_PARAM_STOPERP 8
#define BOR_PARAM_STOPCFM 9
};

Compound_object

typedef map<long, simple_object> SimpleObjectMap;
typedef map<long, binary_object_relation> BinaryObjectRelationMap;

static long compound_object_ID_counter = 0; //Object 0 is not allowed!

class compound_object {
    friend ostream& operator<<(ostream& os, compound_object &cobj);

public:
    long ID;
    long signature; //signature of IDs of component objects and relations
    int exists;     //existence flag....
    //should we allow existence of subobjects even if globally it doesn't exist?
    SimpleObjectMap objects;
    long internal_object_ID_counter;
    BinaryObjectRelationMap relations;
    long internal_relation_ID_counter;

    compound_object();
    compound_object(long ID_);
    void clear();
    void set_exists();
    void set_notexists();
    void clear_objects();
    void clear_relations();
    long add_object(simple_object &object_in); //returns new outerID
    long add_relation(binary_object_relation &relation_in);
    void add_objects(SimpleObjectMap &somap_in);
    void add_relations(BinaryObjectRelationMap &bormap_in);
    void add_objects_and_relations(compound_object &cobj_in);
    void add_SimpleObjectMap(SimpleObjectMap &somap_in);
    void add_BinaryObjectRelationMap(BinaryObjectRelationMap &bormap_in);
    void add_compound_object(compound_object &cobj_in);
    int delete_object_innerID(long ID_);
    … etc….

Agent, Compound Agent

class agent : public compound_object, public Packable {
public:
    viewpoint viewpt;
    gripper grip;
    //mover mov;
public:
    void pack(int initsend=1);
    void unpack();
};

typedef map<long, agent_ODE> AgentODEMap;
typedef map<long, binary_agent_relation_ODE> BinaryAgentRelationODEMap;

class compound_agent_ODE : public Packable {
    long ID;
    long signature; //signature of IDs of component objects and relations
    int exists;     //existence flag....
    AgentODEMap agents;
    long internal_agent_ID_counter;
    BinaryAgentRelationODEMap relations;
    long internal_relation_ID_counter;
    + member functions…

More on the structures…

myObjects:
  Packaging: a dozen .h/.cpp files made into libmyobjects.a (+ some utils), include “myobjects.h”
  Expand/rethink types of relations!
  Think about joint (and relation!) recognition (Mann…)

myModels:
  Ready-made models for specific agents (ripley, human, environment)… packaged in libmymodels… Expand!!!
  These include parameter sets for customization, creation and deletion functions (as well as sensory update functions?).
  OuterIDs and body parts?

myObjectsODE:
  ODE-supplemented version for predictor

A model example: ripley_model.h

EASILY PARAMETRISABLE FOR OTHER n-dof ARMS…

#define SC 6. //Scaling factor, for some dimensions (rip etc.), not all!

const double ripley_start_pos[3] = {23.2/SC, 0, 5.2/SC};

//Simple objects comprising ripley
#define NUM 6 //Number of links comprising ripley
const double ripley_head_length = 5/SC;
const double ripley_head_radius1 = 3.5/SC;
const double ripley_head_radius2 = 2.5/SC;
const double ripley_head_color[3] = {.5, .5, .5};
const int ripley_head_texture = DS_WOOD;
const double ripley_link_length[NUM] = {2.4/SC, 22.8/SC, 9/SC, 1.8/SC, 2.3/SC,
    ripley_head_length /*0.1/SC*//*5.2/SC*/}; // last one - length of camera
const double ripley_color[3] = {.61, .61, .61};
const int ripley_texture = DS_WOOD;
const double ripley_link_radius[NUM] = {2/SC, 2/SC, 1.5/SC, 1.5/SC, 1.1/SC, ripley_head_radius2};
Etc…

//*************
//* FUNCTIONS *
//*************
void create_ripley_part_i(simple_object &part, int i, double* midpt, double* R_); //parts
void create_ripley_bor_i(binary_object_relation &bor_in, SimpleObjectMap &somap_in, int i, double* pos); //joints
void create_ripley(agent &agent_in, const double* pos); //Initial creation, called in main
void calc_ripley_viewpoint_hpr(agent &agent_in);
void calc_ripley_viewpoint_hpr_nofilter(agent &agent_in);
void calc_ripley_viewpoint_mat(agent &agent_in);
void calc_ripley_viewpoint_nofilter(agent &agent_in);
void calc_ripley_viewpoint(agent &agent_in);

Property description structures

The near future: class property_conf {string name; double value; double confidence;}
  How to deal with ints/doubles and vectors?
  How to update conf? Decrease with time?

4-tier structure: class property_4 {
    categorical_descr c; //variable granularity, context-sensitive boundaries
    property_conf ml;
    stoch_descr distrib;
    relevant_sensory_history senspointers;
}

Advantages:
  Confidences vital for incomplete knowledge / information-driven sensing
  Homogenisation very useful for later experiments in feature selection etc.

Suggestions/ideas?
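One possible answer to the "decrease with time?" question is sketched below: a property_conf whose confidence decays exponentially while no fresh observation arrives and is reset when one does. The half-life rule, the [0, 1] range for confidence and the function names decay_confidence and observe are assumptions made for illustration; the slides leave the update policy open.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Fields as proposed on the slide.
struct property_conf {
    std::string name;
    double value;
    double confidence; // assumed to lie in [0, 1]
};

// Hypothetical decay rule: confidence halves every half_life time units
// that pass without a fresh observation of the property.
void decay_confidence(property_conf& p, double dt, double half_life) {
    assert(dt >= 0.0 && half_life > 0.0);
    p.confidence *= std::pow(0.5, dt / half_life);
}

// Hypothetical refresh rule: a new observation replaces the value and
// sets the confidence to whatever the sensing module reports.
void observe(property_conf& p, double value, double confidence) {
    assert(confidence >= 0.0 && confidence <= 1.0);
    p.value = value;
    p.confidence = confidence;
}
```

A module doing information-driven sensing could then look for the properties with the lowest confidence and direct attention (for example a lookat()) towards them.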

S: the sensory structures

Vision:
  Objectworld from 2D Objecter
  Extensions for 3D? Shape models? Partial view integration and the instantiator?

Proprioception:
  JointAnglesPacket from Ripley’s control
  Weight measurements
  Direct access to force feedback?

Switching from continuous to on-demand feeding of new information (i.e. lookat() etc.)

F: update/prediction function

I. Instantiators

Modality-specific instantiators (& updaters/destructors):
  Send create/update/delete packets to mental_model
  They SHOULD know previous world state
  Are they modality- or agent-specific?
  Should the generic agent models include specific sensory update functions?

Virtual object instantiator:
  Sometimes also used for creation of sensory-updated agents (i.e. self) – boundaries?

What would the clients need? Let’s choose an API
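As a starting point for that API discussion, here is a minimal sketch of what create/update/delete packets and their handling on the mental model side might look like. ObjectPacket, PacketType and MentalModelStore are hypothetical names invented for this example, and carrying only a position is a simplification; the real packets would carry full object descriptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

enum class PacketType { Create, Update, Delete };

// Hypothetical packet sent by an instantiator to the mental model.
struct ObjectPacket {
    PacketType type;
    long id;       // target object ID; ignored for Create
    double pos[3]; // payload reduced to a position for brevity
};

// Hypothetical stand-in for the object store inside mental_model.
class MentalModelStore {
public:
    // Returns the ID of the affected object, or 0 if the packet was
    // rejected (0 is never a valid object ID, as in compound_object).
    long apply(const ObjectPacket& p) {
        switch (p.type) {
            case PacketType::Create: {
                const long id = ++id_counter_;
                store(id, p);
                return id;
            }
            case PacketType::Update:
                if (objects_.count(p.id) == 0) return 0; // unknown object
                store(p.id, p);
                return p.id;
            case PacketType::Delete:
                return objects_.erase(p.id) ? p.id : 0;
        }
        return 0;
    }

    std::size_t size() const { return objects_.size(); }
    double x(long id) const { return objects_.at(id).pos[0]; }

private:
    struct Entry { double pos[3]; };

    void store(long id, const ObjectPacket& p) {
        assert(id != 0);
        Entry e;
        for (int i = 0; i < 3; ++i) e.pos[i] = p.pos[i];
        objects_[id] = e;
    }

    std::map<long, Entry> objects_;
    long id_counter_ = 0;
};
```

Returning the assigned ID from a Create packet gives the instantiator a handle for later Update and Delete packets, which is one way to address the continuity of internal and external IDs.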

F: update/prediction function

II. Predictor & Reconciliator

Prediction rules:
  Collision detection (collisions as obj_relat)
  Dynamics (reconciliation with senses, inference of internal forces… where to store?)
  Out of bounds deletions & object stabilisation

Reconciliation:
  How to resolve conflicts between sensed, predicted and requested? (think: multiple sensors in car)
  Simplistic: When no other info, use prediction. Else, blend senses with prediction?
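The simplistic rule above can be written down for a single scalar property as follows. The Estimate struct, the function name reconcile_value and the fixed blending weight are assumptions for the sketch; a weight derived from per-property confidences would be a natural refinement, and requested values are not handled here.

```cpp
#include <cassert>

// Hypothetical container for a sensed value that may be missing.
struct Estimate {
    double value;
    bool available; // false when no sensor reported this property
};

// Simplistic reconciliation for one scalar property:
// with no sensory information, fall back on the prediction;
// otherwise blend the two with a fixed weight (an assumption).
double reconcile_value(const Estimate& sensed, double predicted,
                       double sensor_weight) {
    assert(sensor_weight >= 0.0 && sensor_weight <= 1.0);
    if (!sensed.available) return predicted;
    return sensor_weight * sensed.value + (1.0 - sensor_weight) * predicted;
}
```

With sensor_weight set to 1 the senses always win, and with 0 the model ignores them, so the weight is the single knob that decides how much the prediction is trusted.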

Closing comments

Many open questions / lots of work on the horizon!
  How do you achieve localisation of information and actions in these modules?
  Who should know what and how should things be synced? What about global signals sent from outside?
  Let’s design for easy customisation/reusability.
  Significant parallelisation achieved.

Some landmarks for the future:
  Confidences in property descriptions
  Virtual object instantiator connected to 3D world creation tool for simulated external worlds
  Better shape description capabilities & vision
  Connection to a different robot
  Two virtual agents in simulated world each with its own mental model and the estimate of the other’s – simple demos
  4-tier property descriptors
  Hypothetical scenarios and planning

Parallel work:
  Extend linguistic modules for more functionality given the richness of the structures
  Given confidences, better shape and extended bishop, do action (and speech) selection by maximum expected information return in general framework

Our ultimate goal…

Let’s make ripley and his brothers more fun to talk to!

And let’s learn more about us on the way…

THANX 4 yr attn!
