


Emotion as an integrative process between non-symbolic and symbolic systems in intelligent agents

Ruth Aylett
MACS, Heriot-Watt University

Riccarton, Edinburgh EH10 4ET
[email protected]

Abstract

This paper briefly considers the story so far in AI on agent control architectures and the later equivalent debate between symbolic and situated cognition in cognitive science. It argues against the adoption of a reductionist position on symbolically-represented cognition but in favour of an account consistent with embodiment. Emotion is put forward as a possible integrative mechanism via its role in the management of interaction between processes, and a number of views of emotion are considered. A sketch of how this interaction might be modelled is discussed.

1. Embodied cognition

Within AI, the problem of relating cognition and action has led to a well-known division of opinion since about the mid 1980s between the older symbolic computing position, which classically saw action as a one-many decomposition of abstract planned actions from a symbolically-represented control level, and the situated agent view, as in Brooks (1986), which saw action as a tight stimulus-response coupling that did not require any symbolic representation. This can be – and originally was – posed as an engineering question of how to produce systems that were able to act competently in the real world; hence the origin of the argument in robotics, where the real world cannot be wished away and where robot sensing systems do not deliver symbols.

At this engineering level, the 1990s saw a pragmatic reconciliation of these divergent positions in hybrid architectures, usually with three levels (Gat 1997, Arkin and Balch 1997, Barnes et al 1997), in which the relationship between symbolic planning systems and non-symbolic reactive systems was resolved by giving the reactive systems ultimate control of execution but either allowing planning to be invoked as a resource when needed (for example as a conflict resolution mechanism, or to provide sequencing capabilities) or giving planning systems the ability to constrain and contextualise – but not determine – the reactive system, in what is sometimes known as supervisory control.

However the argument that was carried on from an engineering perspective in robotics, and which to some extent is now a done deal, has subsequently continued at a more scientific level in cognitive science, a discipline within psychology that arguably owed its existence to ideas from classical AI and was heavily influenced by the symbolic world-view, as seen in large-scale computational cognitive models such as SOAR (Rosenbloom et al 1993) and ACT-R (Anderson et al 2004). Just as in pre-Brooksian robotics, these models can be criticised for not providing any adequate account of the role of sensing or motor action, which are implicitly seen as peripheral to a cognitive model much as I-O capabilities are peripheral to a computer processor.

A deeper criticism arises from a view of agency which sees embodiment as a key starting point rather than an optional extra (Clark 1998) and starts from a body in a specific environment that needs a mind to control it, rather than a mind considering abstract problems. In the world-view of embodied cognition (Wilson 2002), sensori-motor engagement is the ground from which cognitive abilities are constructed (as in Piaget's developmental psychology); thus neither cognitive abilities nor specific environments and interactions can be detached in the way that had been previously assumed, and a dynamic and process-based view supersedes a declarative and logical-inferencing based view. The Cartesian separation between mind and body which still seems to exist in multi-level architectures is abandoned in favour of brain-body integration, in which processes such as the endocrine system play a vital coordinating role.

This does not mean however that a symbolic account of cognition is a pointless activity, either from a scientific or an engineering perspective (so we do not actually have to abandon the whole of cognitive science up to now, as well as a large chunk of psychology). Just as computational neuro-physiology does not operate at the explanatory level of physics, there is no reason why cognition based on symbolic reference must be reduced to neuro-physiology, even though what is known about the way in which the brain works suggests that symbol manipulation is an emergent property of the dynamic system formed by interaction between neurons. Indeed, the very concept of emergence dictates that an emergent phenomenon is modelled at its own level of representation, since it cannot be decomposed into any one of the components whose interaction produces it. Thus an account at the symbolic level may be a valid one as long as it does not incorporate incorrect assumptions about the relationship between this activity and sensori-motor engagement.

The power of symbolic reference to abstract out of the current sensory context, to project into imagined contexts, to discretise continuous experiences into conceptual aggregates, to communicate through language and to use the environment to scaffold engagement with the world and other humans (as through writing) has a substantial impact on both cognition and interaction for humans. Thus current work on social agents that aims to produce more human-like systems, whether via robots or graphical characters, must incorporate symbol-manipulation systems as well as the sensor-driven systems that allow them to act with some degree of competence in their physical or virtual world.

However, an important characteristic of these human abilities that has not been replicated in computer-based systems is the unification of symbol-manipulation abilities with non-symbolic behaviours driven more directly by sensing that we observe in human activity. In the human case, the hypothesis that symbol manipulation is an emergent property of interaction between brain components suggests that the ability to move smoothly between cognition and reactive engagement with the world is probably a matter of adjusting the interaction between these components, and does not therefore require explicit conversion between multiple representations in the way this is typically carried out in current hybrid agent architectures. Unfortunately the computational neuro-physiology account of specific brain subsystems is still fragmentary, and there is no short-term (or even medium-term) likelihood of producing the principled interactional account that this hypothesis requires. Arguably, solving the problem of emergent symbolic reference would not only deal with the origin of language but possibly also with consciousness. It is thus a non-trivial enterprise.

How then are we to proceed with an integrated account that is not merely a pragmatic engineering kludge needed to produce competent social agents? The argument of this paper is that one should see process regulation as the key to the enterprise, since even if the detailed mechanisms adopted may be invalidated by more extensive models of the brain, the basic approach is likely to be compatible with such models. The specific hypothesis of this paper is that affective systems should be considered as a key component of such regulation because of their role in attentional focus, in relation both to perception and cognition, as well as in the management of goals, the allocation of internal resources, and the balance between thinking and acting.

2. Accounts of emotion

Just as accounts of action split into two camps in AI from the mid 1980s, there are two corresponding accounts of emotion and its role with respect to agency: one more related to models of the brain and nervous system, and process-oriented, and one related to symbolic models of cognition dealing with goal management and inferencing.

The first of these views, which aims at neuro-physiological plausibility, models emotion as part of a homeostatic control mechanism. Often incorporating a model of the endocrine system (Canamero 1998), it suggests that emotion should be viewed as the set of brain-body changes resulting from the movement of the current active point of a brain-body process outside of an organism-specific 'comfort zone'. It does not therefore require a single meter-like component in an agent architecture to represent an emotion, but offers a distributed representation interpretable as an emotion in terms of the internal process states and external expressive behaviour.

As well as an independent system state, one can also regard emotion in this framework as modifying the impact of an incoming stimulus. Thus emotion can be incorporated into action selection both indirectly and directly, in much the same way as perception, and indeed can be thought of as functioning rather like an internal sensing process concerned with all the other running processes.
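To make the homeostatic view concrete, the following minimal Python sketch shows emotion as the pattern of deviations of brain-body variables from their organism-specific comfort zones – a distributed representation rather than a single meter. The variable names, values and zone boundaries are invented for illustration; nothing here is drawn from Canamero (1998) beyond the general idea.

from dataclasses import dataclass

@dataclass
class HomeostaticVariable:
    name: str
    value: float
    low: float   # lower bound of the comfort zone
    high: float  # upper bound of the comfort zone

    def deviation(self) -> float:
        # Signed distance outside the comfort zone (0.0 when inside).
        if self.value < self.low:
            return self.value - self.low
        if self.value > self.high:
            return self.value - self.high
        return 0.0

def emotional_state(variables):
    # Emotion as a distributed readout over all running processes,
    # not a single meter-like component.
    return {v.name: v.deviation() for v in variables}

body = [
    HomeostaticVariable("energy", 0.35, 0.4, 0.8),      # below zone: hunger-like
    HomeostaticVariable("arousal", 0.95, 0.2, 0.7),     # above zone: fear/excitement-like
    HomeostaticVariable("temperature", 0.5, 0.4, 0.6),  # inside zone: no contribution
]
print(emotional_state(body))
# e.g. {'energy': -0.05, 'arousal': 0.25, 'temperature': 0.0}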

The second view of emotion is usually known as appraisal theory, since it assumes a cognitive appraisal associated with perception that assesses the relationship between symbolic categories established via a model hierarchy using perceptual input and the cognitive-level goals of the agent.

Specific appraisal-based theories that have proved highly influential in the construction of graphical characters are those of Ortony et al (1988), which was based on a taxonomy of 22 emotion types, each with an associated appraisal rule, and that of Lazarus and Folkman (1984), which links appraisal to action via the concept of coping behaviour. Scherer (2001) decomposes appraisal into a sequence made up of Relevance Detection, Implication Assessment, Coping Potential Determination and Normative Significance Evaluation, but remains tied to a top-down view of emotion in which cognitive processing results in later physiological changes.

Interestingly, however, one can interpret appraisal as an abstraction on a process of the same type as the first view, in which a goal takes the place of a comfort zone. This does not mean that goals could never be independently determined at the cognitive level, but it offers the possibility of propagating the state of the homeostatically-regulated non-symbolic systems into the more abstract representational space of symbolic cognition.

In contrast to both of the views discussed above, Izard (1993) takes a heterogeneous approach which does not rule out appraisal but argues that emotion can also be generated directly by the nervous system, as in the first account, by empathy, and by highly intense states of physiological drives such as hunger or lust. Such an approach is consistent with integrating both views of emotion as part of an integration between different types of process within an agent.

3. A sketch of interaction

It is very tempting to view the integrative functions we are seeking as a way of linking different levels within a multi-level hierarchy. Within robotics this is exactly how these issues are discussed: symbolic cognition is a high-level system while non-symbolic reactive systems are low-level. We have avoided this terminology so far because it is highly ambiguous in other fields. Within psychology and cognitive science, high-level could also mean evolutionarily more recent, or conscious as distinct from subconscious.

However, we have argued above that it does make sense to think of a representational level for symbolic processing, even if the way in which we implement it in a computer is very different from the way it is implemented in the brain. Once time is taken into account, it is also clear that the processes on which symbol manipulation depends run with fewer real-time constraints in terms of delivering motor commands, are able conceptually to stay at what we would call a higher level of abstraction, and as a result provide discrete categories covering what at a non-symbolic level would be seen as dynamic processes.

It is however misleading to think of reflection as an activity that runs wholly as symbol manipulation and reaction as an activity that does not use symbol manipulation at all. As Wilson (2002) argues, reflection is grounded in the mechanisms of sensory processing and motor control that evolved for interaction with the environment, even when it is being applied for purposes that do not require immediate activity in the specific environment of the current moment; what she describes as offline cognition. At the same time, appraisal is an example of online cognition which may be quite reactive.

In fact, emotion seems to be closely tied to the distribution of internal resource between the processes producing symbol manipulation and others with tighter connections to motor action, as witness emotional flooding, in which cognitive activity seems to substantially shut down. We can think of this as a type of internal attentional focus, which may be more closely tied to the attentional focus proposed by perception, for a tight loop with the current environment, or less tightly coupled to it when more offline cognition is taking place.

As a sketch of interaction, we finally consider a possible relationship from symbolic to non-symbolic (what might be called top-down processing if we adopt the levels vocabulary) and then from non-symbolic to symbolic.

3.1 From symbolic to non-symbolic

As mentioned in section 1 above, this is an issue that has been relatively extensively discussed in robotics, though in practice the difficulties of creating a robot that is able to carry out more than a very narrow repertoire of actions have made most implementations either highly reactive or partially scripted.

Here, the issue is how to avoid the one-many expansion of discrete categories – for example planned actions – from symbolic form into non-symbolic form in a rigid mapping independent of sensing capabilities, as in the classic Shakey-like approach. A view of reflection as a set of constraints on reactive systems allows this deterministic mapping to be avoided and replaced by contextual activation of groups of reactive processes. Thus in Barnes et al (1997) a planner mapped planning operator preconditions into sensory preconditions that could be detected by reactive processes, and named the set of processes to be activated or deactivated on such preconditions being perceived.
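A minimal Python sketch of this constraint-based coupling follows. It is loosely modelled on the Barnes et al (1997) scheme as described here; the operator format, percept names and process names are invented for illustration.

# Each symbolic operator names a sensory precondition that reactive
# processes can detect, plus the reactive processes to (de)activate
# when it is perceived - a contextual activation, not a rigid expansion.
OPERATORS = {
    "pass_through_door": {
        "sensory_precondition": "doorway_detected",
        "activate": {"align_to_gap", "move_forward"},
        "deactivate": {"wall_follow"},
    },
}

def on_percept(percept, current_operator, active_processes):
    # Return the new set of active reactive processes.
    op = OPERATORS[current_operator]
    if percept == op["sensory_precondition"]:
        return (active_processes - op["deactivate"]) | op["activate"]
    return active_processes

active = {"wall_follow", "avoid_obstacle"}
print(on_percept("doorway_detected", "pass_through_door", active))
# {'avoid_obstacle', 'align_to_gap', 'move_forward'}

Note that the planner never issues motor commands itself: it only constrains and contextualises which reactive processes may run, leaving execution under reactive control.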

This approach has the advantage that it allows reflection to overrule specific actions by an agent as well as to enable them. This supports a model of 'double appraisal' in situations where, for example, hitting someone who has been offensive is overruled because of the way it will make the agent look to other people in a social group.

This approach does not require the initiative for reflection to come from an external task – it is equally compatible with the invocation of planning when reactive systems need it, whether to take advantage of sequencing capabilities or to deal with a situation in which reactive systems are not succeeding. However, in this case it depends on the integration in the opposite direction, which is much more problematic and much less discussed.

3.2 From non-symbolic to symbolic

If constraints allow symbolic systems to impact non-symbolic ones without wholly determining them, pattern recognition is an obvious mechanism through which symbolic systems can discretise the dynamic variation of non-symbolic systems. Interpreting sub-symbolic configurations as either drives or emotions allows them to act as motivations within the symbolic systems and thus to initiate symbolic system activities such as planning.

This can be invoked from the symbolic process – 'how do I feel?', 'what do I want to do?' – but clearly can also, as just mentioned, allow the non-symbolic processes to do the invoking, largely by associating high levels of emotion with specific motivations. Here we think of a motivation as distinguished from goals by temporal scope and generality – thus dealing with hunger by eating is a motivation, while buying sandwiches or looking in the fridge would be examples of goals arising from this motivation.

Within the symbolic systems, then, emotion can be thought of as an integral part of goal management, and also as a heuristic weighting mechanism for large search spaces, creating a search-oriented attentional focus. Internal attentional focus is a further example of the use of constraints as a modelling device: in this case non-symbolic systems may constrain what goals are considered by the symbolic systems, the extent or type of memory retrieval that can be carried out, or the extent or type of actions that can be considered in planning. One could also allow the non-symbolic systems to exercise control over the pattern-recognition mechanism required to deliver motivations to the symbolic systems, so that a greater or lesser number of motivations are handled.

These are aspects of the regulation of resources between symbolic and non-symbolic processes, and given that emotion is also heavily involved in sensory-motor coupling in non-symbolic systems, a very high level of emotion may truncate the symbolic process search space to the point where cognition almost halts, allowing us to model emotional flooding.
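A toy Python sketch of this truncation effect is given below; the linear scaling rule is an invented illustration of the idea, not a published mechanism.

def options_to_consider(options, emotion_intensity, max_branch=10):
    # Return the subset of options the symbolic planner may expand.
    # As emotion_intensity (0..1) rises, the search budget shrinks;
    # near 1.0 almost nothing is considered - a crude model of
    # emotional flooding, in which cognition almost halts.
    budget = max(1, round(max_branch * (1.0 - emotion_intensity)))
    # options are assumed pre-ranked by relevance to the active motivation
    return options[:budget]

plans = ["plan_%d" % i for i in range(10)]
print(len(options_to_consider(plans, 0.1)))   # 9: calm, broad search
print(len(options_to_consider(plans, 0.95)))  # 1: flooded, search truncated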

4. What cognitive model?

This approach to integrating non-symbolic and symbolic systems requires a model of a different type from ACT-R or SOAR, though these both include mechanisms which can be applied within the symbolic systems. Meanwhile, neuro-physiological models of the brain are still too fragmentary and small-scale to be useful for this purpose.

One interesting and possibly more useful approach is offered by the PSI model of Dorner (Dorner and Hille 1995). This focuses on emotional modulation of perception, action-selection, planning and memory access. Emotions are not defined as explicit states but rather emerge from modulation of information processing and action selection. These modulators include arousal level (speed of information processing), resolution level (carefulness and attentiveness of behaviour) and selection threshold (how easy it is for another motive to take over), and thus provide the type of interface to non-symbolic systems discussed in the last section. The model also applies two built-in motivators – level of competence and level of uncertainty – which are thought of as the degree of capability of coping with differing perspectives and the degree of predictability of the environment.
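The following Python fragment sketches how such modulators might parameterise a symbolic system. The parameter names follow the description above, but the value ranges and the switching rule are invented for illustration rather than taken from Dorner and Hille (1995).

from dataclasses import dataclass

@dataclass
class PsiModulators:
    arousal: float              # speed of information processing
    resolution: float           # carefulness/attentiveness of behaviour
    selection_threshold: float  # how easily another motive can take over

def motive_switch(current_strength, challenger_strength, m):
    # A challenger motive takes over only if it beats the current
    # motive by more than the selection threshold.
    return challenger_strength > current_strength + m.selection_threshold

m = PsiModulators(arousal=0.8, resolution=0.3, selection_threshold=0.25)
print(motive_switch(0.5, 0.6, m))  # False: difference below threshold
print(motive_switch(0.5, 0.9, m))  # True: challenger wins decisively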

It would be an overstatement to suggest that this model can be applied without alteration to the discussion of this paper – not least because aspects are specified too broadly for straightforward implementation (Bach 2003) – but the role of emotion it puts forward does correspond in part to the suggestions made here.

Acknowledgements

The EU Humaine Network of Excellence on affective processing (http://emotion-research.net) provided an environment within which extended discussion has taken place on the topic of this paper; however, it remains the view of the author, not of the network as a whole. The European Commission partly funds Humaine but has no responsibility for the views advanced.

References

Anderson, J.R., Bothell, D., Byrne, M.D., Douglas, S., Lebiere, C. and Qin, Y. (2004) An integrated theory of the mind. Psychological Review, 111(4):1036-1060.

Arkin, R.A. and Balch, T. (1997) AuRA: Principles and Practice in Review. Journal of Experimental and Theoretical AI, 9(2).

Bach, J. (2003) The MicroPsi Agent Architecture. Proceedings of ICCM-5, International Conference on Cognitive Modeling, Bamberg, Germany, pp. 15-20.

Barnes, D.P., Ghanea-Hercock, R.A., Aylett, R.S. and Coddington, A.M. (1997) "Many hands make light work? An investigation into behaviourally controlled co-operant autonomous mobile robots". Proceedings, 1st International Conference on Autonomous Agents, Marina del Rey, Feb 1997, pp. 413-420.

Brooks, R.A. (1986) A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2(1):14-23.

Canamero, D. (1998) Modeling Motivations and Emotions as a Basis for Intelligent Behaviour. Proceedings of the First International Conference on Autonomous Agents, eds. Johnson, W.L. and Hayes-Roth, B., ACM Press, pp. 148-155.

Clark, A. (1998) Embodied, situated, and distributed cognition. In W. Bechtel & G. Graham (eds.), A Companion to Cognitive Science, pp. 506-517. Malden, MA: Blackwell.

Dorner, D. and Hille, K. (1995) Artificial souls: Motivated emotional robots. Proceedings of the International Conference on Systems, Man and Cybernetics, pp. 3828-3832.

Gat, E. (1997) On Three-Layer Architectures. In Kortenkamp, D., Bonasso, R.P. and Murphy, R. (eds.), Artificial Intelligence and Mobile Robots, MIT/AAAI Press.

Izard, C.E. (1993) Four systems for emotion activation: Cognitive and noncognitive processes. Psychological Review, 100:68-90.

Lazarus, R.S. and Folkman, S. (1984) Stress, Appraisal and Coping. New York: Springer.

Ortony, A., Clore, G.L. and Collins, A. (1988) The Cognitive Structure of Emotions. Cambridge University Press.

Rosenbloom, P., Laird, J. and Newell, A. (eds.) (1993) The Soar Papers: Research on Integrated Intelligence. MIT Press, Cambridge, Massachusetts.

Scherer, K.R. (2001) Appraisal considered as a process of multi-level sequential checking. In K.R. Scherer, A. Schorr & T. Johnstone (eds.), Appraisal Processes in Emotion. New York: Oxford University Press.

Velasquez, J.D. (1997) Modeling Emotions and Other Motivations in Synthetic Agents. Proceedings, 14th National Conference on AI, AAAI Press, pp. 10-16.

Wilson, M. (2002) Six views of embodied cognition. Psychonomic Bulletin & Review, 9(4):625-636.


Embodiment vs. Memetics: Is Building a Human getting Easier?

Joanna Bryson
Artificial models of natural Intelligence

University of Bath, United Kingdom
[email protected]

Abstract

This heretical article suggests that while embodiment was key to evolving human culture, and clearly affects our thinking and word choice now (as do many things in our environment), our culture may have evolved to such a point that a purely memetic AI beast could pass the Turing test. Though making something just like a human would clearly require both embodiment and memetics, if we were forced to choose one or the other, memetics might actually be easier. This short paper argues this point, and discusses what it would take to move beyond current semantic priming results to a human-like agent.

1 Embodiment

There is no doubt that embodiment is a key part of human and animal intelligence. Many of the behaviours attributed to intelligence are in fact a simple physical consequence of an animal's skeletal and muscular constraints (Port and van Gelder, 1995; Paul, 2004). Taking a learning or planning perspective, the body can be considered as bias, constraint or (in Bayesian terms) a prior for both perception and action which facilitates an animal's search for appropriate behaviour (Bryson, 2001).

This influence continues, arguably through all stages of reasoning (Chrisley and Ziemke, 2002; Lakoff and Johnson, 1999), but certainly at least sometimes to the level of semantics. For example, Glenberg and Kaschak (2002) have demonstrated the action-sentence compatibility effect: subjects take longer to signal comprehension of a sentence with a gesture in the opposite direction to the motion indicated in the sentence than if the motion and sentence are compatible. For example, given a joystick to signal an understanding of 'open the drawer', it is easier to signal comprehension by pulling the joystick towards you than pushing it away. Boroditsky and Ramscar (2002) have shown that comprehension of ambiguous temporal events is strongly influenced by the hearer's physical situation with respect to current or imagined tasks and journeys.

These sorts of advances have led some to suggest that the reason for the to-date rather unimpressive state of natural language comprehension and production in Artificially Intelligent (AI) systems is a consequence of their lack of embodiment (Harnad, 1990; Brooks and Stein, 2004; Roy and Reiter, 2005). The suggestion is that, in order to be meaningful, concepts must be grounded in the elements of intelligence that produce either action or perception salient to action.

The pursuit of embodied AI has led us to understand resource-bounded reasoning, which explains apparently suboptimal or inconsistent decision-making in humans (Chapman, 1987). It has also helped us to understand the extent to which agents can rely on the external world as a resource for cognition — that perception can replace or at least supplement long-term memory, reasoning and model building (Brooks, 1991; Clark, 1997; Ballard et al., 1997; Clark and Chalmers, 1998). However, despite impressive advances in the state of artificial embodiment (e.g. Chernova and Veloso, 2004; Schaal et al., 2003; Kortenkamp et al., 1998), there have been no clear examples of artificial natural language systems improved by embodiment.

I believe this is because embodiment, while necessary, is not a sufficient explanation of semantics. We have seen neat examples of the embodied acquisition of limited semantic systems (e.g. Steels and Vogt, 1997; Steels and Kaplan, 1999; Roy, 1999; Billard and Dautenhahn, 2000; Sidner et al., 2005). These systems show not only that semantics can be established between embodied agents, but also the relation between the developed lexicon and the agents' physical plants and perception. However, such examples give us little idea of how words like INFINITY, SOCIAL or REPRESENT might be represented. Further, they do not show the necessity of physical embodiment for a human-like level of comprehension of natural language semantics. On the other hand, it is possible that the semantic system underlying abstract words such as 'justice' may also be sufficient for terms originally referencing physical reality.

I do not contest the importance of understanding embodiment to understanding human intelligence as a whole. I do contest one of the prominent claims of the embodied intelligence movement — that embodiment is the only means of grounding semantics (Brooks and Stein, 2004). Roy and Reiter (2005) in fact define the term GROUNDED as 'embodied', which might be fine (compare with Harnad, 1990) if GROUNDED hadn't also come to be synonymous with MEANINGFUL. The central claim of this paper is that while embodiment may have been the origin of most semantic meaning, it is no longer the only source for accessing a great deal of it. Further, some words (including their meanings) may have evolved more or less independently of grounded experience.

Figure 1: A two-dimensional projection of a semantic space, after Lowe (1997). The target words are taken from the experiments of Moss et al. (1995). Additional information on nearness is contained in the weights between locations in the 2-D space. [Figure: word pairs such as thunder/lightning, white/black, brother/sister, king/queen, dog/cat, lettuce/cabbage and measles/mumps plotted in the 2-D space.]

2 Memetics’ role in development

We now know that humans could very well develop an interconnected web of words independently of the process of developing grounded concepts (see Figure 1). Grounding then becomes a process of associating some of these statistically acquired terms with embodiment-based concepts. Thus children can learn and even use the word JUSTICE without a referent. Gradually, as they gain experience of the complexity of conflicting social goals and notions of fairness, they develop a richer notion of both what the word means and how and when both the word and the grounded concept can be used in furthering their goals. But even before that, a relatively naive reference to the term could well accomplish something.
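A toy Python sketch of this claim appears below: word neighbourhoods built purely from sentence co-occurrence, with grounding added later as links from some words to embodied concepts. The corpus and the grounding table are invented; real statistical semantics (e.g. Lowe, 1997) uses far larger corpora and weighted vector spaces.

from collections import Counter
from itertools import combinations

corpus = [
    "the court delivered justice to the people",
    "the people demanded justice from the court",
    "the dog chased the cat across the yard",
]

# Sentence-level co-occurrence counts stand in for a statistically
# acquired semantic space.
cooc = Counter()
for sentence in corpus:
    for a, b in combinations(sorted(set(sentence.split())), 2):
        cooc[(a, b)] += 1

# 'justice' acquires neighbours (court, people) with no referent at all.
print([pair for pair in cooc if "justice" in pair])

# Grounding is a separate, later association step for *some* words only.
grounded = {"dog": "embodied_dog_concept", "cat": "embodied_cat_concept"}
print("justice" in grounded)  # False: usable in the word web, yet ungrounded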

Figure 2: In Deacon's theory, first concepts are learned (a.i), then labels for these concepts (a.ii), then a symbolic network somewhat like semantics (a.iii). I propose instead that grounded concepts and semantics are learned in parallel (b.i), then some semantic terms become understood (b.ii). [Figure: two panels, (a) Deacon's Theory and (b) Bryson's Hypothesis, each built from nodes for chair, table, run, justice and phobia.]

I want to be clear here: in my model, humans still acquire associations between these two representations, just as in the Deacon (1997) model that inspired it. What's different is the ordering. In my model, lexical semantics is learned in parallel with embodied categories and expressed behaviour. Subsequently, some words become grounded as connections are formed between the two representations (see Figure 2). Nevertheless, this model also leaves the door open to true memetics — perhaps JUSTICE is an evolved concept that has fundamental impact in our culture and institutions without anyone truly 'understanding' it in any deeply grounded way.

3 Building someone cheaply

The previous sections have talked about what composes current human intelligence. But let's change the topic now to trying to build someone capable of a decent conversation, even of coming up with the occasional good idea apparently on their own. Someone that could pass the Turing test if you chatted to them at the bus stop for 20 minutes, assuming you couldn't see what they looked like.

Figure 2(b) implies that memetics can only give us half the story, but this is wrong on two counts. First, I do not think embodiment is necessary for concept formation. We develop a concept for justice to go along with the label, and I expect this same process could go on for quite a lot of other words.

It is possible that we'd need to provide some preformed seed concepts to get the system rolling. This may be necessary for two reasons:

• Purely for bootstrapping the learning system. It's possible that all concepts formed from memetic experience are formed partially in relation to, or contrast with, established concepts, so our poor disembodied mind might need some good, rich precocial concepts to get started (see further Sloman and Chappell, 2005).

• Because our memetic culture might not carry knowledge everyone gets for free. Given that a huge amount of what it means to be human is embedded in our semantic assumptions, it is possible that the brain can fill in the gaps. Stroke and lesion patients sometimes recover from enormous functionality deficits if they still have enough of their brain intact that they can use the existing bits. If sufficiently stimulated (the main point of therapy), these surviving parts can provide enough information about what the missing information should look like that the individual may recover some lost skills. However, it is possible that some concepts are so incredibly universal to human experience that there just isn't enough information in the culture to reconstruct them.

But in general, I still think it might be easier to program some concepts (or proto-concepts) by hand than to build and maintain a robot that is sufficiently robust and long-lived, and has sufficiently rich motor and sensor capacities, that it could do a better job of learning such concepts from its embodied experience.

But the other reason Figure 2(b) does not show that memetics is only half the story is that a very important part of the story is left out. Even if we had an agent with all the knowledge of a human (or say we had a search engine with all the knowledge any human has ever put on the web), if all that agent ever does is learn, it isn't very human-like. To build someone, we need not only basic capacities for perception and action (which in the meme machine's case is just language in and out) but also motivation and action selection (Bryson, 2001). Even the cheapest human-like agent would need to have a set of prioritised goals, probably some sort of emotional / temporally dependent state to oscillate appropriately between priorities, and a set of plans (in this case, syntax and dialog patterns) to order its actions in such a way that it can achieve those goals.
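A minimal Python sketch of such an action-selection core for the disembodied meme machine follows; the goal names, priorities and the boredom-decay rule are invented illustrations of the 'emotional / temporally dependent state' mentioned above.

goals = {  # base priorities for purely social/intellectual goals
    "find_out_about_you": 0.6,
    "talk_about_me": 0.5,
    "try_new_word": 0.3,
}
boredom = {g: 0.0 for g in goals}  # temporally dependent state

def select_goal():
    # Pick the goal with the highest effective priority; boredom rises
    # on the chosen goal and decays on the others, so behaviour
    # oscillates between priorities rather than fixating on one.
    g = max(goals, key=lambda g: goals[g] - boredom[g])
    for other in boredom:
        boredom[other] = boredom[other] * 0.5 + (0.4 if other == g else 0.0)
    return g

print([select_goal() for _ in range(6)])
# e.g. ['find_out_about_you', 'talk_about_me', 'find_out_about_you', ...]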

Fortunately, nearly everyone in AI who builds agents (even roboticists) builds this part of the system in software, so again there is no driving reason to bring in embodiment. Of course, without a body these goals would have to be purely intellectual or social (find out about you, talk about me, figure out how to use new words appropriately) — many but not all human goals would be inaccessible to a disembodied meme machine.

4 Conclusion

This short paper argues that although embodiment is clearly involved in human thought and language usage, we have consequently evolved and developed a culture permeated with the knowledge we derive in our embodied existence, and as such a cheap but reasonably entertaining agent might be built with no embodiment at all. Of course AI has tried to do this for several decades, but I think the field has come at it the wrong way, focusing on logic-based reasoning too much and case- or template-based reasoning too little. Humans, however, are imitation and case-learning machines — to such an extent that some of our wisdom / common sense may well have evolved memetically rather than ever having been fully understood or reasoned about by anyone.

Acknowledgements

I'd like to thank Push Singh for getting me to think about common sense, even though we never agreed what it meant. I'd like to thank Will Lowe for teaching me about statistical semantics, even though we still don't agree what the implications are.

References

Ballard, D.H., Hayhoe, M.M., Pook, P.K. and Rao, R.P.N. Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), December 1997.

Billard, A. and Dautenhahn, K. Experiments in social robotics: grounding and use of communication in autonomous agents. Adaptive Behavior, 7(3/4), 2000.

Boroditsky, L. and Ramscar, M. The roles of body and mind in abstract thought. Psychological Science, 13(2):185-188, 2002.

Brooks, R.A. Intelligence without representation. Artificial Intelligence, 47:139-159, 1991.

Brooks, R.A. and Stein, L.A. Building brains for bodies. Autonomous Robots, 1(1):7-25, 2004.

Bryson, J.J. Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents. PhD thesis, MIT, Department of EECS, Cambridge, MA, June 2001. AI Technical Report 2001-003.

Chapman, D. Planning for conjunctive goals. Artificial Intelligence, 32:333-378, 1987.

Chernova, S. and Veloso, M. Learning and using models of kicking motions for legged robots. In Proceedings of ICRA-2004, New Orleans, May 2004.

Chrisley, R. and Ziemke, T. Embodiment. In Encyclopedia of Cognitive Science, pages 1102-1108. Macmillan, 2002.

Clark, A. Being There: Putting Brain, Body and World Together Again. MIT Press, Cambridge, MA, 1997.

Clark, A. and Chalmers, D. The extended mind. Analysis, 58(1):7-19, 1998.

Deacon, T. The Symbolic Species: The co-evolution of language and the human brain. W. W. Norton & Company, New York, 1997.

Glenberg, A.M. and Kaschak, M.P. Grounding language in action. Psychonomic Bulletin & Review, 9(3):558-565, 2002.

Harnad, S. The symbol grounding problem. Physica D, 42(1-3):335-346, 1990.

Kortenkamp, D., Bonasso, R.P. and Murphy, R., editors. Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems. MIT Press, Cambridge, MA, 1998.

Lakoff, G. and Johnson, M. Philosophy in the Flesh: The embodied mind and its challenge to Western thought. Basic Books, New York, 1999.

Lowe, W. Semantic representation and priming in a self-organizing lexicon. In Bullinaria, J.A., Glasspool, D.W. and Houghton, G., editors, Proceedings of the Fourth Neural Computation and Psychology Workshop: Connectionist Representations (NCPW4), pages 227-239, London, 1997. Springer.

Moss, H.E., Ostrin, R.K., Tyler, L.K. and Marslen-Wilson, W.D. Accessing different types of lexical semantic information: Evidence from priming. Journal of Experimental Psychology: Learning, Memory and Cognition, 21:863-883, 1995.

Paul, C. Investigation of Morphology and Control in Biped Locomotion. PhD thesis, Department of Computer Science, University of Zurich, Switzerland, 2004.

Port, R.F. and van Gelder, T., editors. Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press, Cambridge, MA, 1995.

Roy, D. and Reiter, E. Connecting language to the world. Artificial Intelligence, 167(1-2):1-12, 2005.

Roy, D.K. Learning from Sights and Sounds: A Computational Model. PhD thesis, MIT, Media Laboratory, September 1999.

Schaal, S., Ijspeert, A. and Billard, A. Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London, B, 358(1431):537-547, 2003.

Sidner, C.L., Lee, C., Kidd, C.D., Lesh, N. and Rich, C. Explorations in engagement for humans and robots. Artificial Intelligence, 166(1-2):140-164, 2005.

Sloman, A. and Chappell, J. The altricial-precocial spectrum for robots. In Kaelbling, L.P. and Saffiotti, A., editors, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1187-1194, Edinburgh, August 2005. Professional Book Center.

Steels, L. and Kaplan, F. Bootstrapping grounded word semantics. In Briscoe, T., editor, Linguistic evolution through language acquisition: formal and computational models. Cambridge University Press, 1999.

Steels, L. and Vogt, P. Grounding adaptive language games in robotic agents. In Husbands, C. and Harvey, I., editors, Proceedings of the Fourth European Conference on Artificial Life (ECAL97), London, 1997. MIT Press.


Requirements & Designs: Asking Scientific Questions About Architectures

Nick Hawes, Aaron Sloman and Jeremy Wyatt
School of Computer Science
University of Birmingham

{n.a.hawes,a.sloman,j.l.wyatt}@cs.bham.ac.uk

Abstract

This paper discusses our views on the future of the field of cognitive architectures, and how the scientific questions that define it should be addressed. We also report on a set of requirements, and a related architecture design, that we are currently investigating as part of the CoSy project.

1 What Are Architectures?

The first problem we face as researchers in the field of cognitive architectures is defining exactly what we are studying. This is important because the term "architecture" is so widely used in modern technological fields. An agent's cognitive architecture defines the information-processing components within the "mind" of the agent, and how these components are structured in relation to each other. Also, there is a close link between architectures and the mechanisms and representations used within them (where representations can be of many kinds with many functions). Langley and Laird (2002) describe a cognitive architecture as including "those aspects of a cognitive agent that are constant over time and across different application domains". We extend this to explicitly allow architectures to change over time, either by changing connection patterns, or altering the components present. Excluding such changes from the study of architectures may prevent the discussion of the development of architectures for altricial information-processing systems (Sloman and Chappell, 2005).

2 Related Work

Historically, most research into cognitive architectures has been based around specific architectures such as ACT-R, SOAR, and ICARUS (for a summary see Langley and Laird (2002)). A lot of work has been devoted to developing iterations of, and extensions to, these architectures, but very little work has been done to compare them, either to each other or to other possible design options for cognitive architectures. In other words, little work has been done on the general science of designing and building cognitive systems. Anderson and Lebiere (2003) have recently attempted to address this by comparing two different architectures for human cognition on a set of requirements.

3 Architectures & Science

To advance the science of cognitive systems we need two related things: clear, testable questions to ask, and a methodology for asking these questions. The methodology we support is one of studying the space of possible niches and designs for architectures, rather than single, isolated designs (Sloman, 1998b). Within such a framework, scientific questions can be asked about how a range of architecture designs relate to sets of requirements, and the manner in which particular designs satisfy particular niches. Without reference to the requirements they were designed to satisfy, architectures can only be evaluated in a conceptual vacuum.

The scientific questions we choose to ask about the space of possible architecture designs should ideally provide information on general capabilities of architectures given a set of requirements. This information may not be particularly useful if it is just a laundry list of instructions for developing a particular architecture for a particular application domain. It will be more useful if we can characterise the space of design options related to a set of requirements, so that future designers can be aware of how the choices they make will affect the overall behaviour of an agent. The questions asked about architectures can be motivated by many sources of information, including competing architecture designs intended for similar niches.

In order for questions about architectures, and their answers, to be interpreted in the same way by researchers across many disciplines, we need to establish a common vocabulary for the design of information-processing architectures. As a step towards this, we use the CogAff schema, depicted in Figure 1, as an incomplete first draft of an ontology for comparing architectures (Sloman, 2001). The schema is intended to support broad, two-dimensional, design- and implementation-neutral characterisations of architectural components, based on information-processing style and purpose. If an architecture is described using the schema, then it becomes easier to compare it directly to other architectures described in this way. This will allow differing architectures to be compared along similar lines, even if they initially appear to have little in common.

Figure 1: The CogAff Architecture Schema. [Figure: a grid crossing Perception, Central Processing and Action columns with three rows: Meta-management (reflective processes, newest), Deliberative reasoning ("what if" mechanisms, older) and Reactive mechanisms (oldest).]

4 A Minimal Scenario

For our current research as part of the CoSy project [1], we are working from requirements for a pre-linguistic robot that has basic manipulative abilities, and is able to explore both its world and its own functionality. At a later date we will extend this to add requirements for linguistic abilities. We are approaching the problem in this way because we believe that a foundation of action competence is necessary to provide semantics for language. These requirements come from the CoSy PlayMate scenario, in which a robot and a human interact with a tabletop of objects to perform various tasks [2].

In our initial work on this scenario we will focus on the requirements related to the architectural elements necessary to support the integration of simple manipulative abilities with a visual system that supports the recognition of basic physical affordances from 3D structure. We see this as the absolute minimum system for the start of an exploration of PlayMate-like issues in an implemented system [3].

[1] See http://www.cognitivesystems.org for more information.
[2] More information about the PlayMate is available at http://www.cs.bham.ac.uk/research/projects/cosy/pm.html.

Our requirements analysis has led to the design of a prototype architecture which we believe will satisfy the niche they specify. Space restrictions do not permit a full description of the architecture, but in brief the architecture features multiple concurrently active components, including: a motive generator; information stores for currently active motives, general concepts, and instances of the general concepts; a general-purpose deliberative system; a fast global alarm system; a plan execution system; management and meta-management components; a spreading activation substrate; and closely coupled vision and manipulation sub-architectures.

The high-level design for this architecture is presented in Figure 2, and is in part inspired by our previous work on information-processing architectures (e.g. Beaudoin, 1994; Sloman, 1998a; Hawes, 2004). Although this design clearly separates functionality into components, these components will be tightly integrated at various levels of abstraction. For example, to enable visual servoing for manipulation (e.g. Kragic and Christensen, 2003), visual and proprioceptive perception of the movement of the robot's arm in space must be closely coupled with the instructions sent to the arm's movement controller.

The information-processing behaviour of the architecture is driven by motives, which are generated in response to environmental or informational events. We will allow humans to generate environmental events using a pointing device. The agent will interpret the gestures made with this device as direct indications of desired future states, rather than intentional acts (thus temporarily side-stepping some of the problems of situated human-robot interaction). Generated motives will be added to a collection of current motives, and further reasoning may be necessary if conflicts occur between motives. The deliberative system will produce action plans from motives, and these plans will be turned into arm commands by the plan execution system. This process will be observed at a high level by a meta-management system, and at a low level by an alarm system. The meta-management system may reconfigure the agent's processing strategies if the situation requires it (e.g. by altering the priorities associated with motives). The global alarm system will provide fast changes in behaviour to handle sudden, or particularly critical, situations.
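The cycle just described can be summarised in the following highly simplified Python sketch; every function body is a placeholder stub standing in for a CoSy component, not an implementation of one.

def motive_generator(event):
    # Gestures from the pointing device are read as direct indications
    # of desired future states, not as intentional acts.
    return {"desired_state": event} if event else None

def deliberative_system(motives):
    return [("achieve", m["desired_state"]) for m in motives]  # toy 'plan'

def plan_execution(plan):
    return ["arm_command:%s:%s" % step for step in plan]

def meta_management(plan, commands):
    pass  # may reconfigure processing strategies, e.g. re-prioritise motives

def run_cycle(event, current_motives):
    motive = motive_generator(event)    # respond to environmental events
    if motive:
        current_motives.append(motive)  # conflict resolution omitted here
    plan = deliberative_system(current_motives)
    commands = plan_execution(plan)
    meta_management(plan, commands)     # high-level observation
    return commands

print(run_cycle("cup_grasped", []))  # ['arm_command:achieve:cup_grasped']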

[3] Our work on requirements from the PlayMate scenario is presented roughly at http://snipurl.com/cosy playmate.

Figure 2: The Proposed Architecture. [Figure: Perception, Central Processing and Action columns crossed with Reactive, Deliberative and Meta-Management layers. Components include a motive system (motive generator, current motivations), a memory system (instance, general and working memory), a spreading activation substrate, a general-purpose deliberative system, a global alarm system, a plan execution system, a meta-management system, an arm control system, and visual and proprioceptive & haptic perception.]

Although a spreading activation substrate is featured in the design, we are not currently committed to its inclusion in the final system. Instead we are interested in the kinds of behaviour that such a design choice will facilitate. Information across the architecture may need to be connected to related information, and such an approach may allow the agent to exploit the structure of such connections by spreading activation, which may be based on co-occurrence, recency or saliency. We are also interested in investigating how to combine distributed approaches to representing and processing information with more localised approaches, and what design options this provides. Such a combination of processing approaches can be seen in the work on the MicroPsi agent architecture (Bach, 2005).
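As an illustration of the kind of behaviour such a substrate facilitates, the short Python sketch below spreads activation outward over weighted links between information items. The graph, weights and decay constant are invented, and the MicroPsi mechanisms (Bach, 2005) are considerably richer.

graph = {  # edge weights might come from co-occurrence, recency or saliency
    "cup": {"grasp": 0.8, "table": 0.5},
    "grasp": {"arm": 0.9},
    "table": {"surface": 0.4},
    "arm": {},
    "surface": {},
}

def spread(source, steps=2, decay=0.5):
    # Propagate activation from a source item, attenuating by edge
    # weight and a per-step decay factor.
    activation = {source: 1.0}
    frontier = {source}
    for _ in range(steps):
        nxt = set()
        for node in frontier:
            for neighbour, weight in graph[node].items():
                a = activation[node] * weight * decay
                if a > activation.get(neighbour, 0.0):
                    activation[neighbour] = a
                    nxt.add(neighbour)
        frontier = nxt
    return activation

print(spread("cup"))
# approx. {'cup': 1.0, 'grasp': 0.4, 'table': 0.25, 'arm': 0.18, 'surface': 0.05}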

5 Architecture Evaluation

The question of whether, and why, our proposed architecture is appropriate for an agent with PlayMate-like requirements is quite hard to formulate in a way that is directly answerable. Instead, we must use our requirements analysis, and our previous experiences of designing architectures for such agents, to derive precise questions and suggest testable hypotheses from this. The following paragraphs present specific questions we could ask about the architecture, and many other architectures.

How can information exchange between architectural components be controlled, and what trade-offs are apparent? For example, should information from visual perception be pushed into a central repository, or should task-appropriate information be pulled from vision when necessary, or should this vary depending on the system's information state, goals, the performance characteristics of subsystems, etc.?

What are the relative merits of symbolic and sub-symbolic (e.g. spreading activation) processing methods when applied to collating information across the entire architecture? The proposed spreading activation substrate could interact with various processes and ontologies, and record how information is manipulated. Alternatively, this could be implemented as a central process that must be notified by other processes when certain operations occur. These different approaches could be compared on their proficiency at managing large volumes of multi-modal information, their ability to identify changes of context across the architecture, the difficulty of integrating them with other processes, or the ease with which they facilitate other operations (such as attentional control).

To what degree should the architecture encapsulate modality-specific and process-specific information within the components that are directly concerned with it? Cross-modal application of early processing results can increase accuracy and efficiency in some processes (cf. Roy and Mukherjee, 2005). In other cases information may be irrelevant, and attempts to apply it across modalities may have the opposite effect whilst increasing the computational load on an architecture. We could explore this notion more generally by asking what types of information should, and should not, be made available by architectural components whilst they are processing it, and what use other architectural components could make of such information.

Given the types of information the architecture will be processing, what are the advantages and disadvantages of having a single central representation into which all information is translated? How do these advantages and disadvantages change when additional processes are added into the architecture?

What role does a global alarm mechanism have in PlayMate-like domains, how much information should it have access to, and how much control should it have? For example, an alarm mechanism may have access to all the information in the architecture and risk being swamped by data, or it may have access to limited information streams and risk being irrelevant in many situations.

Does the architecture need some global method for producing serial behaviour from its many concurrently active components, or will such behaviour just emerge from appropriate inter-component interactions? Approaches to component control include a single central component activating other components, a control cycle in which activity is passed between a small number of components, and other variations on this. Are there particular behaviours that are not achievable by an agent with this kind of control, and only achievable by an agent with decentralised control, or vice versa? If such trade-offs exist, how are they relevant to PlayMate-like scenarios?

Given the range of possible goals that will need to be present in the whole system, how should these goals be distributed across its architecture, and how does this distribution affect the range of behaviours that the system can display?

Obviously there are many other questions we could ask about the architecture, such as whether it will facilitate the implementation of mechanisms for acquiring and using orthogonal recombinable competences.4 The process of designing and implementing architectures to meet a set of requirements involves the regular re-evaluation of the requirements in light of new developments. Inevitably, this means that other questions will be considered, and the above ones reconsidered, as the research progresses.

Acknowledgements

The research reported on in this paper was supported by the EU FP6 IST Cognitive Systems Integrated Project Cognitive Systems for Cognitive Assistants "CoSy", FP6-004250-IP.

4 http://www.cs.bham.ac.uk/research/projects/cosy/papers/#dp0601

References

John R. Anderson and Christian Lebiere. The Newell test for a theory of cognition. Behavioral and Brain Sciences, 26:587–637, 2003.

Joscha Bach. Representations for a complex world: Combining distributed and localist representations for learning and planning. In Stefan Wermter, Günther Palm, and Mark Elshaw, editors, Biomimetic Neural Learning for Intelligent Robots, volume 3575 of Lecture Notes in Computer Science, pages 265–280. Springer, 2005.

Luc P. Beaudoin. Goal Processing in Autonomous Agents. PhD thesis, School of Computer Science, The University of Birmingham, 1994.

Nick Hawes. Anytime Deliberation for Computer Game Agents. PhD thesis, School of Computer Science, University of Birmingham, 2004.

Danica Kragic and Henrik I. Christensen. A framework for visual servoing. In Markus Vincze and James L. Crowley, editors, ICVS-03, volume 2626 of LNCS. Springer Verlag, March 2003.

Pat Langley and John E. Laird. Cognitive architectures: Research issues and challenges. Technical report, Institute for the Study of Learning and Expertise, Palo Alto, CA, 2002.

Deb Roy and Niloy Mukherjee. Towards situated speech understanding: Visual context priming of language models. Computer Speech and Language, 19(2):227–248, 2005.

Aaron Sloman. Varieties of affect and the CogAff architecture schema. In Proceedings of the AISB'01 Symposium on Emotion, Cognition and Affective Computing, pages 1–10, 2001.

Aaron Sloman. Damasio, Descartes, alarms and meta-management. In Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, pages 2652–2657, 1998a.

Aaron Sloman. The "semantics" of evolution: Trajectories and trade-offs in design space and niche space. In Helder Coelho, editor, Progress in Artificial Intelligence, 6th Iberoamerican Conference on AI (IBERAMIA), pages 27–38. Springer, Lecture Notes in Artificial Intelligence, Lisbon, October 1998b.

Aaron Sloman and Jackie Chappell. The altricial-precocial spectrum for robots. In Proceedings IJCAI'05, pages 1187–1194, Edinburgh, 2005.


Integration and Decomposition in Cognitive Architecture

John Knapman*

*School of Computer Science, The University of Birmingham, [email protected]

Abstract

Given the limitations of human researchers' minds, it is necessary to decompose systems and then address the problem of how to integrate at some level of abstraction. Connectionism and numerical methods need to be combined with symbolic processing, with the emphasis on scaling to large numbers of competencies and knowledge sources and to large state spaces. A proposal is briefly outlined that uses overlapping oscillations in a 3-D grid to address disparate problems. Two selected problems are the use of analogy in commercial software evolution and the analysis of medical images.

1 Introduction

Debates about cognitive architecture often deal with choices between rival techniques and modalities. By contrast, Minsky, Singh and Sloman (2004) emphasise the importance of integrating components of differing kinds.

A standard method in designing complex systems is to decompose into components and then connect them, simply because it is impossible for individual team members to hold all the detail in their heads. Because of such limitations, there are those who think that it is not possible for human beings to design systems that exhibit general intelligence comparable to their own (Rees, 2003). Even after fifty years of research, such counsels of despair are premature until we have explored more of the possibilities. The prizes are great, in part for a better understanding of ourselves and the enlightenment it will bring in the tradition of the dazzling human progress since the Renaissance in many fields, including medicine and technology. There are also strong commercial benefits in being able to build smarter systems that can undertake dangerous or unpopular work.

To make substantial progress, there are lessons to learn from work on the architecture of computer systems generally (Bass, Clements and Kazman, 2003), which informs us that a key responsibility of the architect is to define how the components fit together and interface with each other. This assumes that there is a degree of uniformity in the style of the components, so that their interfaces can be defined in a form that is commonly understood by the people working on them. Interfaces are then specified in terms, firstly, of timing and flow of control (serial or parallel, hierarchical or autonomous, event-driven or scheduled, for example), and secondly, of the format of data flowing between components.

A less well documented but nevertheless well known experience is that a small team of dedicated experts can achieve wonders compared with large teams of people of mixed ability. A small team of four highly experienced people can often produce efficient, reliable and timely systems that solve problems most effectively. By contrast, many research (and development) teams consist of one experienced person, who is very busy, and several bright but inexpert assistants. To get an expert team together to work full time and hands on for a period of years is expensive and disrupts other activities, but the results can be very exciting.

Even with such a promising kind of team organisation, interfaces have to be defined. In cognitive architecture, components will be of differing kinds. Connectionist methods that emphasise learning and convergence must somehow be combined effectively with symbolic processing. Sun (2002) shows how symbolic rules may be inferred from state transitions in a connectionist network, but that is only one of several ways in which interactions could take place. There are other promising numerical methods, such as KDDA, which has been applied successfully to face recognition (Wu, Kittler, Yang, Messer and Wang, 2004).

The strong emphasis on learning and learnability in connectionist methods is carried over to symbolic rules in Sun's method, but there are other benefits from connectionism, such as approximate matching and graceful degradation, that need to be exploited in a combined system. Benefits like these accrue not just from connectionism but from other soft computing paradigms (see Barnden (1994) for a good discussion in the context of the study of analogy).

One method for linking components that use quite different representations from differing standpoints is to use numerical factors, whether these are interpreted as probabilities, fuzzy or other uncertainty values, ad hoc weights, or arrays of affective measures (e.g., motivation (Coddington and Luck, 2003), duty, elegance). Such techniques can also help to reduce search spaces and large state spaces, although they may sometimes smack of heuristics of the kind favoured by researchers in the 1970s.
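As a concrete illustration of this linking style, the sketch below shows how hypotheses produced by components with quite different internal representations might be ranked once their outputs are reduced to numerical factors. The component outputs, weights and combination rule are invented for the example; nothing here is prescribed by the proposal itself.

    # Illustrative sketch: linking heterogeneous components via numerical
    # factors, all treated uniformly as numbers in [0, 1].

    def combine(factors, weights):
        """Weighted combination of per-component factors for one candidate."""
        return sum(w * f for f, w in zip(factors, weights)) / sum(weights)

    # Factors for the hand-written letter example: a classifier probability,
    # a fuzzy membership value, and an ad hoc affective weight (salience).
    candidates = {
        "k": combine([0.70, 0.85, 0.50], weights=[2.0, 1.0, 0.5]),
        "h": combine([0.20, 0.40, 0.50], weights=[2.0, 1.0, 0.5]),
    }
    best = max(candidates, key=candidates.get)
    print(best, round(candidates[best], 2))  # 'k' wins; weak candidates could
                                             # be pruned to shrink the search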

2 Abstraction

Many different kinds of abstraction have been identified. In the field of computer system design, there are methods with well-defined levels of abstraction, the most concrete being a set of programs that implements a design. In the B and Z methods (Bert, Bowen, King and Waldén, 2003) there are formal proof procedures to show that a more concrete specification is a refinement of one that is more abstract.

Less formal methods, such as the Unified Software Development Process (Jacobson, Booch and Rumbaugh, 1999), based on the Unified Modeling Language (UML), still have the idea that a high-level design (as an object diagram) can be refined progressively until implementation. Beyond the writing of the programs, a more complete analysis sees the programs and their specifications as abstractions of their actual performance, as recorded in traces while they execute. The diagrams are found to be better for communicating between designers and developers than either formal-language statements or natural-language descriptions.

Even though there is a clear definition of abstraction in these examples, users often feel intuitively that the diagrams are less abstract than the text. Such an intuition throws up the difficulty of defining abstraction in a general way. It seems to depend on fitness for purpose as well as brevity and omission of detail.

In the case of a story told in a natural language, a synopsis is more abstract in the latter sense. In the case of scientific papers, the "abstract" is intended to help readers decide whether to read the main paper. The "management summary" of a business report allows busy senior people to understand enough to be able to trust and defend their subordinates' recommendations.

A reference to "the report", "the story" or "the paper" is clearly more abstract than having to repeat the content. Generally, referring to something verbally or symbolically is brief, enables it to be dealt with in its absence, and permits economy of thought, i.e., it leaves mental room, as it were, for other concepts to be introduced and related to it.

Uncertainty representations provide another form of abstraction, and one that is particularly easy to formalise (Baldwin, Martin and Pilsworth, 1995). A fuzzy value can stand for "the hand-written letter that is probably a k", or "the car that looked like a Jaguar". Such representations are the clearest candidate for a form of abstraction that can be used to enable disparate sources of knowledge to be combined effectively and rapidly. For example, a moving picture, some sounds, previous experience of steam trains and the expectation that the Flying Scotsman will pass through the station at 11:00 a.m. combine so as to interpret the distant approach of the train while ignoring most other details.

3 Large State Spaces

Problems often entail large numbers of possibilities. Although both abstractions and numerical methods can help to reduce the possibilities, there are frequently cases where many possible states must be carried forward before higher abstractions can be used to eliminate some. This explosion is sometimes called the "AI-complete" problem, and there have been hopes that quantum information processing could address it. However, the breakthrough has not so far come. Apart from special cases where the data has cyclic properties (as in modular arithmetic for code breaking), the main benefit is that large state spaces (having N states) can be searched in time proportional to √N instead of N/2. This can be worthwhile in some cases (e.g., reducing 500 billion steps to one million), but must await the availability of suitable hardware. The programming skills required are rather daunting.

Some people have suggested that the unstable periodic orbits (UPOs) of chaotic oscillators (Crook and Scheper, 2001) can represent potentially infinitely many things. It has been observed that the signals in a brain appear to be either random or chaotic before settling rapidly to a coherent state (Tsui and Jones, 1999). In theory, a random signal contains all possible frequencies, but a practical random signal is limited to the bandwidth of the channel and takes a long time to distinguish from a sum of many overlapping signals of different frequencies, an idea taken up in the Proposal section below. A chaotic signal is somewhere in between, and is also indistinguishable in practice (Gammaitoni, Hänggi, Jung and Marchesoni, 1998) from the other two.


4 Requirements

A test bed for the exploration of ideas is needed that supports the following requirements:

1. Combining modalities, especially connectionist, symbolic and affective

2. Combining competencies, including (but not limited to) analogy and structure matching, vision and formal language interpretation

3. Abstraction, with emphasis on combining knowledge sources

4. Scaling to large state spaces, particularly exploring efficient forms of parallelism

5. Scaling to large numbers of knowledge sources

5 Proposal

To explore these issues and requirements, one approach is to design and simulate a programmable signal-processing network capable of both symbolic and connectionist processing, with commitment to as few preconceptions as possible.

A flexible structure is proposed that will allow for the interplay of several loose decompositions. We will allow an element to belong to several groupings, which can be nested. It must identify with which grouping any particular communication is associated. With such a protocol, elements are not confined to a hierarchical organisation, but hierarchies are still possible, and communication channels are more manageable than in a complete free-for-all.


Figure 1: Illustration of flexible structuring: elements E1–E10 belong to groupings G1–G6, nested within higher groupings H1–H3 and a top-level grouping T; containment is shown by nesting or with arrows.

A message between two elements has to conform to the representational format of their common grouping at some level of nesting. Then, if the prime method of communication is broadcasting, for example, broadcasting (both sending and receiving) would only take place within a grouping and would follow the representational convention for that grouping. In the illustration of Figure 1, E1 and E2 can communicate by the conventions of their containing groupings G1, H1 and T. E1 can communicate with E3, E4, E5 and E6 by the conventions of H1 and T, because these four are in G3, which is in H1. E1 can only communicate with E10 by the conventions of T, or by some form of relay through intermediate subordinate groupings.

Within such a general framework, it is proposed to exploit the parallelism inherent in modulated overlapping signals of many frequencies embedded in a 3-D grid. One such model is described by Coward (2004) as an attempt to emulate aspects of the architecture of the brain. Uncertainty models, including Bayesian nets (Pearl, 1988) and fuzzy logic, are to be accommodated. It must be suitable for both learning and programmed behaviour. Programming provides the flexibility to explore challenging applications, and it allows certain abilities to be built in. For other capabilities, there should always be at least the possibility that the symbols and mappings defined could be acquired through experience or by an indirect process such as analogy, deduction or abstraction.

There must be support for widely differing data types, particularly for image processing and formal language processing. The particular problems under consideration are in two domains:

1. The application of analogical reasoning to commercial software evolution

2. Analysis of medical images

It must be reasonably clear how the framework may be extended to other modalities, e.g., movement control for vehicles or robots.

A particularly elegant form of programming is functional programming, where every program is a transformation from the input parameters to an output value. A function is defined in mathematics as a set of ordered pairs of input and output. Most programmers think of procedures as behaving algorithmically, but the alternative definition sees a function as a transformation from input to output via memory lookup. The effect of a procedure giving multiple results can be achieved by multiple functions that take the same parameters. Functions of several variables can be decomposed into (single-valued) functions of pairs of variables.

This view of a function is particularly convenient for the parallel processing of overlapping signals of many frequencies. Such signals can encode ranges or uncertainty in data but can also represent patterns of input, such as image intensities or characters in text.

Frequency can be used to encode data values, with amplitude representing strength. A transformation converts from an input frequency to an output frequency and may adjust the weight. Thus it may transform one pattern to another or may perform simultaneous logical operations on parallel data. The transformation of patterns using weight adjustment can be equivalent to that performed in connectionist networks, with a natural mechanism for incorporating learning. Logical operations may not need to perform weight adjustment, but conjunction and disjunction between two sets of signals are needed. Disjunction can be achieved by summing. Conjunction requires frequency matching.

Some programming models require a global state, for example the query and subgoals in PROLOG, whereas object-oriented (OO) programming localises the state information in separate objects.

A 3-D grid can contain many channels, and the signals can persist for some time, somewhat in the manner of objects in OO programming, though with different dynamics. Together they may encode a very large state space, even though there is a limit to the number of signals that can be carried on one channel because of the constraints of bandwidth. They are well suited to representing data structures such as parse trees or image region classifications.

The interactions are different from those in quantum computing; there, each state is completely integrated but is processed in isolation from all others. In the 3-D grid, a state is distributed, but information from many states can be mixed.

6 Conclusion

After half a century's work, there remain many ideas that can be explored, particularly in the arena of integration. It would be most exciting to see what a dedicated team of four or five experts able to work full time for a period of years could achieve. However, the challenge remains of discovering a small enough set of representations that are general enough for interfacing between the kinds and styles of components identified but are nevertheless succinct enough to be computationally tractable.

Acknowledgements

I'm grateful for valuable discussions with Aaron Sloman.

References

J.F. Baldwin, T.P. Martin and B.W. Pilsworth. Fril – Fuzzy and Evidential Reasoning in Artificial Intelligence, Research Studies Press, Taunton, UK and Wiley, New York, 1995, p. 54

John Barnden. On Using Analogy to Reconcile Connections and Symbols, in Levine, D.S. and Aparicio, M. (eds.), Neural Networks for Knowledge Representation and Inference, Hillsdale, New Jersey: Erlbaum, 27–64, 1994

Didier Bert, Jonathan Bowen, Steve King and Marina Waldén (eds.) ZB 2003: Formal Specification and Development in Z and B: Third International Conference of B and Z Users, Turku, Finland, June 4–6, LNCS 2651, Springer, 2003

Len Bass, Paul Clements and Rick Kazman. Software Architecture in Practice (2nd edition), Addison-Wesley, 2003

Alexandra Coddington and Michael Luck. Towards Motivation-based Plan Evaluation, in Proceedings of the Sixteenth International FLAIRS Conference (FLAIRS '03), Russell, I. and Haller, S. (eds.), AAAI Press, 298–302, 2003

L. Andrew Coward. The Recommendation Architecture Model for Human Cognition. Proceedings of the Conference on Brain Inspired Cognitive Systems, University of Stirling, Scotland, 2004

Nigel Crook and Tjeerd olde Scheper. A Novel Chaotic Neural Network Architecture, ESANN'2001 Proceedings – European Symposium on Artificial Neural Networks, Bruges, Belgium, 25–27 April 2001, D-Facto, ISBN 2-930307-01-3, pp. 295–300, 2001

Luca Gammaitoni, Peter Hänggi, Peter Jung and Fabio Marchesoni. Stochastic resonance, Reviews of Modern Physics, 70(1), 270–4, 1998

Ivar Jacobson, Grady Booch and James Rumbaugh. The Unified Software Development Process, Addison Wesley Longman, 1999

Marvin Minsky, Push Singh and Aaron Sloman. The St. Thomas Common Sense Symposium: Designing Architectures for Human-Level Intelligence, AI Magazine, 25(2), 113–124, 2004

Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, California, 1988

Martin Rees. Our Final Century, William Heinemann, London, p. 134, 2003

Ron Sun. Duality of the Mind: A Bottom-Up Approach Toward Cognition, Mahwah, New Jersey: Erlbaum, 2002

Alban Pui Man Tsui and Antonia Jones. Periodic response to external stimulation of a chaotic neural network with delayed feedback, International Journal of Bifurcation and Chaos, 9(4), 713–722, 1999

Wu, X.J., Kittler, J., Yang, J.Y., Messer, K. and Wang, S.T. A New Kernel Discriminant Analysis (KDDA) Algorithm for Face Recognition, Proceedings of the British Machine Vision Conference 2004, Kingston, UK, 517–526


On the structure of the mind

Maria Petrou and Roberta Piroddi
Imperial College London

Department of Electrical and Electronic Engineering, Exhibition Road, London SW7 2AZ
{maria.petrou,r.piroddi}@imperial.ac.uk

Abstract

The focus of any attempt to create an artificial brain and mind should reside in the dynamic model of the network of information. The study of biological networks has progressed enormously in recent years. It is an intriguing possibility that the architecture of representation and exchange of information at high level closely resembles that of neurons. Taking this hypothesis into account, we designed an experiment concerning the way ideas are organised according to human perception. The experiment is divided into two parts: a visual task and a verbal task. A network of ideas was constructed using the results of the experiment. Statistical analysis showed that the verbally invoked network has the same topological structure as the visually invoked one, but the two networks are distinct.

1 Introduction

The key to reproducing intelligence and consciousness lies in the understanding and modelling of the organisation of functional areas, the privileged pathways of communication and exchange of information, and their more likely evolution and growth, as in Aleksander (2005). The heart of a biomimetic robot is the pump that sustains the blood-flow of information. The information comes from a number of functional parts which do not necessarily need to be bioinspired: it is possible to produce depth maps without knowing how the visual system produces them. What is important to know is how the depth map is used for navigation and in how many other different tasks it is used. The organisation and evolution of the information, as well as the fusion of pieces of information of varied degrees of uncertainty, are characteristics of a living being and this is what needs to be emulated directly from a biological system.

The Project of the Human Genome Consortium (2003) culminated with the completion of the full human genome sequence in April 2003. Now the research moves forward, after having named and counted genes. Understanding the organisation and interaction of genes is the new frontier of biotechnology. This endeavour is helped by the development by Barabasi (2002) of a new science of networks, which sheds light on the complexity of organisation of living and artificial mechanisms alike.

The focus of this century will be the brain. Many researchers have already moved to the race for mapping the human brain neurons and understanding their functionalities, for example in Koslow and Hyman (2000). However, neurons are just one element. What about those special products of the human brain, the ideas? Is it possible to gain insight into the world of ideas? Is it possible to count them? Are ideas organised in a recognisable way?

In the article by Macer (2002) the Behaviourome project was first proposed. In analogy with the genome project aimed at mapping the human genome, here the aim was to count ideas and to find out whether the number of ideas is finite, uncountable or infinite. One of the proposed means of obtaining the final goal was to provide a mental mapping of ideas and their interrelationships.

The organisation of information may be modelled in mathematical terms as a dynamic network. We are intrigued by the possibility that from the lowest possible level of information sources, namely the neurons, to the highest possible level of information products, the ideas, the same structural network may underlie the architectural building of their organisation. Networks have been used for a long time to represent complex systems. They have been popularised recently by Barabasi (2002), who studied self-organising networks. Self-organising networks may be random or scale-free. There is evidence that the organising architecture of a scale-free network underpins the structure and evolution of biological systems (like cells), social systems (like one's circle of friends) and artificial networks (like the Internet).

We designed an experiment in order to study how higher level information invoked by different stimuli, namely visual and verbal, is organised in the mind. The importance of such a speculative attempt lies in the fact that if a symmetry between lower and higher levels is found, then the same mathematical model may be used to bridge the gap between top-down and bottom-up approaches to create artificial brains and minds.

2 Ideas and networks

In order to proceed in our experiment we need to focus on two elements: ideas and networks. Ideas (and their relationships) are the subject of this research. We need to define them and to highlight some elements of the disciplines that study their relationships.

According to Macer (2002), ideas are mental conceptualisations of things, including physical objects, actions or sensory experiences, that may or may not be linguistically expressible.

Self-organising networks, occurring in the natural world and as a development of human activities, may be modelled as being random or scale-free. Both scale-free networks and random networks are small-world networks, which means that, although the networks may contain billions of nodes, it takes only a few intermediate nodes to move from one node to any other. Let us indicate by k the number of incoming or outgoing links from any node in the network. The difference between a random and a scale-free network is quantified by the probability density function P(k) of a node having k incoming or outgoing links. In the case of a random network, P(k) has a Poisson distribution, P1(k) ∼ exp(−k). In the case of a scale-free network, the probability may be modelled by a power function, P2(k) ∼ k^(−c), where c is a positive constant.
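To illustrate how the two models could be told apart empirically: on log-log axes a power law P2(k) is a straight line, while the exponential form P1(k) falls away much faster in the tail. The sketch below fits a simple least-squares slope to an observed degree distribution; the synthetic degree data and the use of this slope as a crude indicator are assumptions of the example, not the authors' analysis method.

    # Illustrative sketch: crude test of scale-free (power law) behaviour.
    import math
    from collections import Counter

    def degree_distribution(degrees):
        counts = Counter(degrees)
        n = len(degrees)
        return {k: c / n for k, c in sorted(counts.items())}

    def loglog_slope(pk):
        """Least-squares slope of log P(k) vs log k; near -c for a power law."""
        pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0]
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        num = sum((x - mx) * (y - my) for x, y in pts)
        den = sum((x - mx) ** 2 for x, _ in pts)
        return num / den

    # Invented degrees for a small network with a heavy tail (a few hubs).
    degrees = [1] * 60 + [2] * 20 + [3] * 10 + [6] * 6 + [15] * 4
    print(loglog_slope(degree_distribution(degrees)))  # markedly negative,
                                                       # consistent with k^(-c)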

3 Experimental methodology

We often see something which triggers some thought, which triggers another thought and so on. We colloquially refer to this as a flow of thoughts. We may try to stimulate such a trail of thoughts by showing an image or mentioning a word to a person. Each person will generate their own path of linked thoughts. The collection of ideas that have been thought may be visualised as a network of ideas and analysed as such. So the questions we would like to answer are:

1. How many ideas on average exist between any two randomly chosen ideas?

2. Does this number depend on whether the stimuli for these ideas are presented to a person in a visual or in a linguistic way?

3. If a network of ideas can be generated, what is its topology?

There are serious difficulties in the design of an experiment to answer the above questions. The most important difficulties are:

1. How does one define an idea?

2. How does one deal with the enormous number of different ideas?

3. How can one expose a subject to a collection of ideas?

4. How does one count the intermediate ideas between any two ideas?

The brief answers to the above questions are as follows.

1. Since we wish to show the ideas in the form of an image in the visual experiment, we need to restrict ourselves to an idea being an object, e.g. umbrella, flower, car, sun, etc.

2. It is obvious that we need to restrict ourselves to a finite number of ideas. We decided to use 100 nouns. There were various reasons for that, mainly practical, related to the way the stimuli had to be presented to the subjects. The nouns had to be chosen in an objective way, so as not to reflect any prejudices or associations of the investigators. We chose them to be words uniformly spread in the pages of a dictionary.

3. We showed to each subject the full collection of ideas as a matrix of 10 × 10 images arranged in an A0 size poster. Each noun was represented by a clip-art from the collection of Microsoft Office. This ensured that effort had gone into the design of the images so that they were most expressive for the particular object they were supposed to depict. Before use, each image was converted to greyscale and modified so that the average grey value of all images was the same. This ensured that no image would be picked out because of its excessive brightness or darkness.

4. A picture from the collection was picked at random and then the subject was asked to pick the next most similar one, and then the next most similar one and so on. Every time an object was picked after another one, a link was recorded between these two objects.


We conducted the experiments on a sample of 20 subjects. The sequence of actions that constituted the visual experiment is listed below.

• Allow the subject at the beginning to see the whole table of objects for several minutes, to familiarise themselves with what is included in the restricted world.

• Pick one of the objects at random and highlight it by putting a frame around it. Ask the subject which of the remaining objects is most similar to this. Once the subject has made their choice, cover the highlighted object, highlight the object they had picked and ask them to pick the next most relevant object. Stop when half of the objects are still visible. This ensures that the subject still has plenty of choice when they make their choice of the last object. The subject may be allowed to say that an object has no relation to any other object in the table. Then the particular experiment may stop. Another experiment may follow, starting with a different initial object.

• The position of the objects should not be changed during the experiment, as we wish the person to know what is included in the restricted world in which they have to make their choice.

The verbal experiment was aimed at identifying how many intermediate ideas exist between any two ideas presented in a verbal form. This experiment took place several weeks after the visual experiment. The same subjects took part in both experiments. Each subject was presented with a table containing 100 words corresponding to the 100 pictures presented in the visual experiment. The experiment followed these steps:

• Allow the subject at the beginning to see the whole table of words for several minutes, to familiarise themselves with what is included in the restricted world.

• Pick one of the words at random and highlight it by putting a frame around it. Ask the subject which of the remaining words is most similar to this. Once the subject has made their choice, cover the highlighted word and ask them to pick the next most relevant word. Stop when half of the words are still available. The subject may be allowed to say that a word has no relation to any other word in the table. Then the particular experiment may stop. Another experiment may follow, starting with a different initial word.

• The position of the words should not be changed during the experiment, as we wish the person to know what is included in the restricted world in which they have to make their choice.

4 Experimental results

The first point to investigate is whether there are ideas that are selected more frequently than others, i.e. whether there are ideas that are more popular or spring to mind more often, or whether all ideas are generally selected with the same frequency.

ID   Idea               Frequency   # Connections
42   graduate           16          15
29   doctor             12          11
99   wedding            14          11
2    airport            12          9
33   electricity-mast   14          9
96   tree               14          9
90   teacher            16          8

Table 1: Ideas that manifest a hub behaviour in the visual network.

From sociological studies reported by Gladwell (2000), we know that such nodes act as hubs and play an important role in a network. We found that only 7% of the ideas are both very frequent and very well connected. These 7% of ideas are connected to 60% of the remaining ideas for the visual experiment, presented in Table 1, and to 50% of the remaining ideas for the verbal experiment, presented in Table 2. These numbers show evidence of a small-world behaviour for the networks of ideas we have built.

ID   Idea       Frequency   # Connections
7    bicycle    8           10
22   circus     7           12
48   house      10          12
69   rain       7           11
71   road       7           9
83   shopping   7           9
99   wedding    8           11

Table 2: Ideas that manifest a hub behaviour in the verbal network.

The number of intermediate ideas between any two given ones may be investigated using two measures. The first measure is the average path, defined as the average length of the path between any two nodes, if such a path exists. To calculate it, one has first to calculate the average distance between any two given nodes and then average all these distances over the whole network. We denote this measure by a. The second measure is called the mean path. First, one has to find the shortest path between any two given nodes, if there is any. Then the shortest paths are averaged over the entire network. We denote this measure by m.

The second research question was to discover whether the number of intermediate ideas between any two given ones depends on the way the stimuli are presented to the subject. In Table 3 we list the path length characteristics for the two experiments. Taking into consideration the standard deviations of these measures, one notices that the path lengths are statistically the same, irrespective of the method used for hint giving.

Experiment          a      σa    m      σm
Visual experiment   13.3   1.9   11.5   3.5
Verbal experiment   13.9   1.7   12.6   4.2

Table 3: Summary of path length measures.

However, the ideas which act as hubs in the two cases are different, as one may see by comparing Tables 1 and 2. This strongly indicates that the ideas are organised in two different networks, one verbal and one visual, which, however, exhibit similar topologies.

To further examine this topology we use the degree density P(k), as this is a measure that characterises a network independently of the number of its nodes. We test here the following null hypotheses:

• H1: The data have been drawn from a population with Poisson distribution P1(k).

• H2: The data have been drawn from a population with power law distribution P2(k).

Our analysis showed that we may reject the H1 null hypothesis at the 95% confidence level, while hypothesis H2 is compatible with our data.

5 Conclusions

The results show:

• a small-world behaviour of the networks obtained both by visual and verbal cues, indicated by the low mean path and low clustering coefficient values;

• a correspondence between the visual and verbal networks in the value of the mean path and the value of the possible number of hubs;

• a correspondence in the topology of the networks, the two networks being statistically equivalent in topology;

• a difference between the visual and verbal networks indicated by the concepts that acted as hubs in the two networks;

• evidence that the networks are organised as scale-free ones.

If the mind organises itself in the form of networks, it appears that these networks are strongly influenced by the hardware on which the mind resides, i.e. the brain; otherwise we should not have a different network for the flow of ideas when visual stimuli are used from that created when verbal stimuli are used. It is possible, therefore, to hypothesise the existence of a hierarchy of organisation of information which flows from the lowest level of electro-chemical information of the neurons to the highest conceptual abstractions of ideas. The hierarchy is supported by the same underlying network framework, which is one of scale-free topology. The network represents the hardware of the organisation, dictating which dynamical modification is plausible in the evolution of the information. What we should look for, in order to create a link between top-down and bottom-up approaches, is a mapping between neural structures and their higher-level correlates.

References

I. Aleksander, The World in my Mind, My Mind in the World: Key Mechanisms of Consciousness in People, Animals and Machines, Academic Press, 2005.

L. Barabasi, Linked: The New Science of Networks, Perseus Publishing, 2002.

M. Gladwell, The Tipping Point: How Little Things Can Make a Big Difference, Little, Brown and Company, 2000.

International Human Genome Sequencing Consortium, "Initial sequencing and analysis of the human genome," Nature, vol. 409, no. 6822, pp. 860–922, 2003.

S. Koslow and S. Hyman, "Human brain project: A program for the new millennium," Einstein Quarterly Journal of Biology and Medicine, vol. 17, pp. 7–15, 2000.

D.R.J. Macer, "The next challenge is to map the human mind," Nature, vol. 420, p. 121, 2002.


Social Learning and the Brain

Mark A. Wood*

*Artificial models of natural Intelligence (AmonI)
University of Bath, Department of Computer Science

Bath, BA2 7AY, UK
[email protected]

Abstract

Social learning is an important source of human knowledge, and the degree to which we do it sets us apart from other animals. In this short paper, I examine the role of social learning as part of a complete agent, and identify what makes it possible and what additional functionality is needed. I do this with reference to COIL, a working model of imitation learning.

1 Building a Brain

The problem of building a brain is one facing me at this very juncture in my research. I need a brain capable of controlling indefinitely a complete agent functioning in the virtual world of Unreal Tournament (UT) (Digital Extremes, 1999). As a game domain, clearly UT is not an exact replica of the real world, and much is simplified or omitted altogether. However, it does provide an opportunity to study a very broad range of human behavioural problems at a tractable level of complexity, as opposed to other more realistic platforms which allow only the study of narrow classes of problems.

My research thus far has chiefly been in the area of social learning (particularly imitation), as I believe this is key to survival in a world where there are unfortunate consequences if things are not learned quickly enough. We humans also dedicate a vast amount of brain space to learning, and social learning in particular, compared to other species. In the following section, I will explain what I think the role of social learning is and why it is important. I will then briefly overview COIL, a model extending CELL (Roy and Pentland, 2002) from language learning to social learning in general. I describe both what COIL requires to function and how it would be extended and complemented to form a complete brain system. I conclude with a discussion.

2 The Role of Social Learning

Human infants seem to be innately programmed to imitate from birth (Meltzoff and Moore, 1983). Many animals, particularly the great apes, benefit from similar kinds of social learning mechanisms (Byrne and Russon, 1998), but none to the extent that we do. The speed and accuracy with which we can assimilate goal-directed (i.e. task-related) behaviour from others is unique. Of course, communicating via language and the ability to reproduce actions at fine temporal granularity are among the human-specific skills which facilitate this learning. Taking these things into consideration, it would be wise to consider including social learning capabilities in any system designed to function as a complete brain.

Furthermore, autonomous agents need skills: whether 'basic', low-level skills such as coordinating motor control, or 'complex', high-level skills such as navigation. To acquire task-related skills at any level, I believe there are at least four types of things which need to be learned (Bryson and Wood, 2004, see also Figure 1):

1. perceptual classes: What contexts are relevant to selecting appropriate actions.

2. salient actions: What sort of actions are likely to solve a problem.

3. perception/action pairings: Which actions are appropriate in which salient contexts.

4. ordering of pairings: It is possible that more than one salient perceptual class is present at the same time. In this case, an agent needs to know which one is most important to attend to in order to select the next appropriate action.
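These four learned structures can be pictured with a small sketch. The perceptual classes, actions and priorities below are invented, and the selection rule follows the simplifying assumption of Figure 1 that each perceptual class carries a single action:

    # Illustrative sketch: perception/action pairings with a priority ordering
    # over perceptual classes, resolving cases where several are present.

    def perceives_enemy(percepts):
        return "enemy" in percepts

    def perceives_health(percepts):
        return "health-pack" in percepts

    # (perceptual class, action), from highest to lowest priority; the
    # ordering itself is the fourth thing to be learned.
    pairings = [
        (perceives_enemy,  "flee"),
        (perceives_health, "pick-up-health"),
    ]

    def select_action(percepts):
        for perceptual_class, action in pairings:
            if perceptual_class(percepts):
                return action
        return "explore"  # default when no salient class is present

    print(select_action({"health-pack"}))           # pick-up-health
    print(select_action({"enemy", "health-pack"}))  # flee: higher priority wins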

Some of these may be innate, but those which are not must be acquired using a combination of individual and social learning.


Figure 1: Task learning requires learning four types of things: relevant categories of actions, relevant categories of perceptual contexts, associations between these, and a prioritized ordering of the pairings. Assuming there is no more than one action per perceptual class, ordering the perceptual classes is sufficient to order the pairs.

For example, assume we have an agent which can issue motor commands, but does not initially know the results these commands will have on its effectors. Using visual and proprioceptive sensors (say) to measure these effects, and trial-and-error (individual) learning, a mapping between commands issued and effects produced can be created. This example is deliberately analogous to human infant 'body babbling' (Meltzoff and Moore, 1997). However, assuming a reasonable number of 'primitive' actions can be learned this way, the set of skills that can be built from these blocks is exponentially larger (and so on as more skills are acquired). To attempt to learn all skills through trial-and-error, then, would be to search randomly through these huge, unconstrained skill spaces, which is very inefficient.
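This individual-learning step might be sketched as follows; the toy effector model and the dictionary used for the learned command-effect mapping are assumptions of the example:

    # Illustrative sketch of 'body babbling': issue random motor commands,
    # observe their sensed effects, and record the command -> effect mapping.
    import random

    COMMANDS = ["motor-left", "motor-right", "motor-both"]

    def sense_effect(command):
        """Stand-in for visual/proprioceptive sensing of a command's effect."""
        return {"motor-left": "turn-right",
                "motor-right": "turn-left",
                "motor-both": "move-forward"}[command]

    def babble(trials=20):
        mapping = {}
        for _ in range(trials):
            cmd = random.choice(COMMANDS)     # trial and (noise-free) error:
            mapping[cmd] = sense_effect(cmd)  # record what each command did
        return mapping

    primitives = babble()
    print(primitives)  # learned primitives from which larger skills are built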

Social learning can take many forms depending upon the nature of the agents in question: written or verbal instruction, explicit demonstration, implicit imitation, etc. An agent which is part of a society which facilitates such learning can take advantage of the knowledge acquired by previous generations. To do this an agent must be able to relate what it perceives to the actions it can execute; it must solve a correspondence problem between the instruction or demonstration and its own embodiment (Nehaniv and Dautenhahn, 2002). For a learning agent in a society of conspecifics, this mapping is simple (although not trivial to learn), but for, say, a robot living among humans, solving this problem amounts to yet another skill that needs to be mastered. Socially-acquired skill-related knowledge can be used to significantly reduce the skill search space, allowing individual learning to merely 'fine-tune' new skills, taking into account individual variability within a society. The other alternative is that the 'instructions' acquired are coarse-grained enough to perfectly match existing segments of behaviour in the learner's repertoire.

3 Necessary Components

To better understand the components required for social learning in general, it makes sense to examine the information requirements of a model which is capable of such learning. The Cross-Channel Observation and Imitation Learning (COIL) model of Wood and Bryson (2005) is suitable. This system is designed to observe via virtual sensors a conspecific agent executing a task, then in real-time output a self-executable representation of the behaviour needed to complete that task. It achieves this by matching the observed actions of this task expert with its observed perceptions of the environment. I now briefly explain the model and identify in general terms what is needed at each stage of processing (see also Figure 2).

Figure 2: An overview of COIL: raw sensor data pass through Feature Extraction into action (A) and perception (P) channels; Event Segmentation yields A-events and P-events; Co-occurrence Filtering pairs these into AP-events; Recurrence Filtering produces M-E Candidates; and Mutual Information Filtering selects M-E Items.

Feature Extraction: The inputs to this stage are raw sensory data. Depending upon the task, some of these data are discarded and others are recoded or categorised. Remaining data are diverted into different channels ready for further processing, some specialising in action recognition and others in environmental parsing (perception). This stage therefore suggests a need for selective attention, compressed representations and modularity of processing.

Event Segmentation: Following Feature Extraction, the channels containing the data are segmented into action and perception events depending upon the channel type. Events define high-level coarse-grained actions and perceptual classes, and are further divided into lower-level fine-grained subevents. This segmentation requires various triggers which are innate in the case of COIL, but could theoretically be learned.

Co-occurrence Filtering: Action and perception events which overlap in time are paired together and stored in a buffer (called Short Term Memory or STM). This requires temporal reasoning and memory.

Recurrence Filtering: Co-occurring action and perception subevents which are repeated within the brief temporal window of STM are tagged. A chunk called a Motivation-Expectation or M-E Candidate, which represents the set of tagged pairs, is created and placed in Mid-Term Memory (MTM). Here we additionally use statistical reasoning and abstract judgments of similarity.

Mutual Information Filtering: For each M-E Candidate, the maximum mutual information between its component action and perception subevents is calculated. Those which exceed some threshold are stored as M-E Items in Long Term Memory (LTM). COIL currently uses fixed thresholds, but again they could be acquired through experience. The LTM is the output of the system.
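The final stage can be illustrated concretely. In the sketch below the events are reduced to paired action/perception labels and the 0.1-bit threshold is invented for the example; neither is COIL's actual representation or parameter:

    # Illustrative sketch of mutual information filtering: score how strongly
    # an action label predicts its co-occurring perception label, and keep
    # the candidates whose score exceeds a threshold.
    import math
    from collections import Counter

    def mutual_information(pairs):
        """I(A;P) in bits over observed (action, perception) label pairs."""
        n = len(pairs)
        pa = Counter(a for a, _ in pairs)
        pp = Counter(p for _, p in pairs)
        pap = Counter(pairs)
        return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pp[p] / n)))
                   for (a, p), c in pap.items())

    THRESHOLD = 0.1  # bits; invented for illustration

    candidates = {
        "approach-when-seen": [("approach", "ball")] * 8
                              + [("retreat", "wall")] * 8,
        "noise": [("approach", "ball"), ("approach", "wall"),
                  ("retreat", "ball"), ("retreat", "wall")] * 4,
    }
    ltm = [name for name, obs in candidates.items()
           if mutual_information(obs) > THRESHOLD]
    print(ltm)  # only the informative candidate becomes an M-E Item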

The innate skills necessary for the social learning identified above can be provided by the hardware (memory, clock, etc.) and software (statistical algorithms, similarity metrics, etc.) of the agent.

4 Scaffolding COIL

I have looked at the basic components COIL needs inorder to function as a social learning system. How-ever, the extent of COIL’s role within a complete

agent, and the extra pieces which need to be added,remain in question.

There are a number of problems in assuming that a single monolithic COIL system can alone act as the 'brain' of our agent. Firstly, the algorithm only learns; it has no capacity for making decisions or acting based upon what it has learned. Our most recent work demonstrates the addition of an extra module for exactly those purposes (Wood and Bryson, 2005). Secondly, a flat COIL system expected to carry out the high-level task of life would have to monitor every action and perception channel that could possibly be useful in achieving this task, or any of its subtasks. Even with the innate attentional capabilities of COIL's Feature Extraction stage, the algorithm's complexity is still exponential in the number of channels. Therefore, COIL seems more suited to learning local specialised tasks where the number of channels which need to be monitored can be reasonably constrained.

Let us assume instead that we have a number of COIL systems, each observing a localised task and its associated action/perception channels. We would need a method for discerning which of the following four scenarios is occurring:

1. A known task is being observed.

2. A known task is present1.

3. An unknown task is being observed.

4. No known tasks are present or being observed.

It may be that scenarios 1 and 2 occur concurrently, in which case a decision would need to be made whether to observe and learn or join in with the execution of the task. Also, the presence of a task does not necessarily imply that the task should be executed. In scenario 4, social learning is impossible, and the product of previous social learning episodes is not applicable. This is where an individual learning module would come into play (see also Section 5).

A complete system that is capable of this arbitration is as yet unrealised. It may be possible to create a 'master' version of COIL which has high-level perception channels monitoring those environmental states which differentiate between local tasks, and action channels which cause lower-level task-specific COILs to be activated. On the other hand, a totally different system designed specifically to coordinate COILs (for social learning), RL (for independent learning) and acting may be more appropriate.

1 A task is present if it is available in the environment for execution by the observer.


5 Discussion

In this final section, I highlight a number of research problems, some of which I will be investigating in relation to the COIL project, but all of which I believe will need to be studied before a complete working brain becomes a possibility. The balance between what is innate and what is learned, for both biological and theoretical robotic examples, has been discussed by Sloman and Chappell (2005). They claim that using a hybrid of the two may prove to be better than using either in isolation. We can further subdivide that which is learned into that which is learned socially, and that which is learned independently. Similarly, the best technique is probably to combine the two, and it is a thorough study of the balance and application of both that may result in significant progress toward constructing a complete agent. This task has at least the following component questions:

• What structures must be present to make independent learning possible? Presumably many of these will be innate, but how can social learning improve these structures/primitives and/or the efficiency of their usage (i.e. learning how to learn better from another)?

• What primitives are needed to make social learning a possibility? Must they be acquired through trial-and-error learning, or can they be innate? If acquired, what is the cost of such acquisition? How do they differ from those required for individual learning?

• How does the embodiment of a given agent affect the structures/primitives best suited for both individual and social learning (both what is learned and the way it is learned)? How does this compare/interact with the effect the required tasks have on these primitives?

• How can individual and social learning be combined at both the practical task level and the abstract memory level? Do different combination strategies result in different levels of efficiency and/or goal accomplishment? Is this 'meta-skill' of hybrid learning itself innate, or somehow learned?

• What is the optimal trade-off between individual and social learning for a given task? How does this change with increasing task complexity? How is this affected by the nature of the task relative to, say, the embodiment of the executing agent?

• How can knowledge be consolidated to improve learning (of both kinds) next time? How are conflicts between what is learned socially and what is learned independently resolved? How easily applicable are social skills and their associated knowledge to individual learning situations, and vice versa?

I hope that these proposed research areas, and this paper as a whole, will in some way stimulate others into thinking about social learning in the context of a complete agent system.

References

Joanna J. Bryson and Mark A. Wood. Learning discretely: Behaviour and organisation in social learning. In Proceedings of the Third International Symposium on Imitation in Animals and Artifacts, pages 30–37, 2004.

Richard W. Byrne and Anne E. Russon. Learning by imitation: a hierarchical approach. Behavioral and Brain Sciences, 16(3), 1998.

Digital Extremes. Unreal Tournament, 1999. Epic Games, Inc.

Andrew N. Meltzoff and M. Keith Moore. Newborn infants imitate adult facial gestures. Child Development, 54:702–709, 1983.

Andrew N. Meltzoff and M. Keith Moore. Explaining facial imitation: A theoretical model. Early Development and Parenting, 6:179–192, 1997.

Chrystopher L. Nehaniv and Kerstin Dautenhahn. The correspondence problem. In Kerstin Dautenhahn and Chrystopher L. Nehaniv, editors, Imitation in Animals and Artifacts, Complex Adaptive Systems, chapter 2, pages 41–61. The MIT Press, 2002.

Deb K. Roy and Alex P. Pentland. Learning words from sights and sounds: a computational model. Cognitive Science, 26:113–146, 2002.

Aaron Sloman and Jackie Chappell. The altricial-precocial spectrum for robots. In Proceedings IJCAI'05, pages 1187–1192, Edinburgh, 2005.

Mark A. Wood and Joanna J. Bryson. Skill acquisition through program-level imitation in a real-time domain. Technical Report CSBU-2005-16, University of Bath Department of Computer Science, 2005. Submitted to IEEE Systems, Man and Cybernetics – Part B.