The St. Thomas Common Sense Symposium: Designing Architectures for Human-Level Intelligence

Marvin Minsky, Push Singh, and Aaron Sloman

Copyright © 2004, American Association for Artificial Intelligence. All rights reserved.

To build a machine that has "common sense" was once a principal goal in the field of artificial intelligence. But most researchers in recent years have retreated from that ambitious aim. Instead, each developed some special technique that could deal with some class of problem well, but does poorly at almost everything else. We are convinced, however, that no one such method will ever turn out to be "best," and that instead, the powerful AI systems of the future will use a diverse array of resources that, together, will deal with a great range of problems. To build a machine that's resourceful enough to have humanlike common sense, we must develop ways to combine the advantages of multiple methods to represent knowledge, multiple ways to make inferences, and multiple ways to learn. We held a two-day symposium in St. Thomas, U.S. Virgin Islands, to discuss such a project—to develop new architectural schemes that can bridge between different strategies and representations. This article reports on the events and ideas developed at this meeting and subsequent thoughts by the authors on how to make progress.

The Need for Synthesis in Modern AI

To build a machine that has "common sense" was once a principal goal in the field of artificial intelligence. But most researchers in recent years have retreated from that ambitious aim. Instead, each developed some special technique that could deal with some class of problem well, but does poorly at almost everything else. An outsider might regard our field as a chaotic array of attempts to exploit the advantages of (for example) neural networks, formal logic, genetic programming, or statistical inference—with the proponents of each method maintaining that their chosen technique will someday replace most of the other competitors.

We do not mean to dismiss any particular technique. However, we are convinced that no one such method will ever turn out to be "best," and that instead, the powerful AI systems of the future will use a diverse array of resources that, together, will deal with a great range of problems. In other words, we should not seek a single "unified theory!" To build a machine that is resourceful enough to have humanlike common sense, we must develop ways to combine the advantages of multiple methods to represent knowledge, multiple ways to make inferences, and multiple ways to learn.

We held a two-day symposium in St. Thomas, U.S. Virgin Islands, to discuss such a project—to develop new architectural schemes that can bridge between different strategies and representations. This article reports on the events and ideas developed at this meeting and subsequent thoughts by the authors on how to make progress (see note 1).


Organizing the Diversity of AI Methods

Marvin Minsky kicked off the meeting by discussing how we might begin to organize the many techniques that have been developed in AI so far. While AI researchers have invented many representations, methods, and architectures for solving many types of problems, they still have little understanding of the strengths and weaknesses of each of these techniques. We need a theory that helps to map the types of problems we face onto the types of solutions that are available to us. When should one use a neural network? When should one use statistical learning? When should one use logical theorem proving?

To help answer these kinds of questions, Minsky suggested that we could organize different AI methods into a "causal diversity matrix" (figure 1). Here, each problem-solving method, such as analogical reasoning, logical theorem proving, and statistical inference, is assessed in terms of its competence at dealing with problem domains with different causal structures.

Statistical inference is often useful for situations that are affected by many different matched causal components, but where each contributes only slightly to the final phenomenon. A good example of such a problem-type is visual texture classification, such as determining whether a region in an image is a patch of skin or a fragment of a cloud. This can be done by summing the contributions of many small pieces of evidence, such as the individual pixels of the texture. No one pixel is terribly important, but en masse they determine the classification. Formal logic, on the other hand, works well on problems where there are relatively few causal components, but which are arranged in intricate structures sensitive to the slightest disturbance or inconsistency. An example of such a problem-type is verifying the correctness of a computer program, whose behavior can be changed completely by modifying a single bit of its code. Case-based and analogical reasoning lie between these extremes, matched to problems where there are a moderate number of causal components, each with a modest amount of influence. Many common sense domains, such as human social reasoning, may fall into this category. Such problems may involve knowledge too difficult to formalize as a small set of logical axioms, or too difficult to acquire enough data about to train an adequate statistical model.

Figure 1. The Causal Diversity Matrix. The matrix plots problem types along two axes, the number of causes involved (few to many) and the scale of each cause's effect (small to large), and places each common AI technique in the region it handles best: traditional computing in the "easy" corner (few causes, small effects); symbolic logical reasoning (classical AI) where few causes have large effects; linear, statistical, connectionist, neural-network, and fuzzy-logic methods where many causes each have small effects; ordinary qualitative reasoning, case-based or analogy-based reasoning, and Society-of-Mind approaches in between; and an "intractable" corner (many causes, large effects) marked "find a better representation!" Note that this is just a first draft of one idea about describing what each AI technique can do; each reader may have a different view about what their own favorite method can do, and whether these axes make sense for them. (Diagram adapted from Minsky's Future of AI Technology; see Minsky 1992.)

It is true that many of these techniques have worked well outside of the regimes suggested by this causal diversity matrix. For example, statistical methods have found application in realms where previously rule-based methods were the norm, such as in the syntactic parsing of natural language text. However, we need a richer heuristic theory of when to apply different AI techniques, and this causal diversity matrix could be an initial step toward that. We need to further develop and extend such theories to include the entire range of AI methods that have been developed, so that we can more systematically exploit the advantages of particular techniques.
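To make this concrete, here is a minimal sketch (in Python) of how such a heuristic lookup might be made operational. It is only an illustration of the idea discussed above: the feature names, thresholds, and method labels are our own assumptions, not something specified by the matrix itself.

# Illustrative sketch only: a crude "causal diversity" lookup that buckets two
# coarse problem features and maps them to candidate reasoning methods.
# All names and thresholds here are hypothetical.

def bucket(value, low, high):
    """Bucket a numeric feature into 'low', 'mid', or 'high'."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "mid"

# Keys: (how many causal components, how strongly each one matters).
CANDIDATES = {
    ("low", "low"):   ["traditional computing"],                     # the "easy" corner
    ("high", "low"):  ["statistical inference", "neural networks"],  # texture-like problems
    ("low", "high"):  ["formal logic", "theorem proving"],           # verification-like problems
    ("high", "high"): ["find a better representation"],              # the "intractable" corner
}

def candidate_methods(num_causes, effect_per_cause):
    """Suggest methods for a problem with the given causal structure.

    effect_per_cause is a rough 0-to-1 estimate of how much any single
    causal component can change the outcome.
    """
    key = (bucket(num_causes, 10, 1000), bucket(effect_per_cause, 0.1, 0.5))
    # The middle band is where case-based and analogical reasoning were
    # suggested to fit best (for example, social common sense).
    return CANDIDATES.get(key, ["case-based reasoning", "analogical reasoning"])

print(candidate_methods(num_causes=50000, effect_per_cause=0.001))
# ['statistical inference', 'neural networks']

A richer heuristic theory would of course use far more informative features than these two numbers; the point of the sketch is only the shape of the mapping from a problem description to candidate methods.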

How could such a "meta-theory of AI techniques" be used by an AI architecture? Before we turned to this question, we discussed a concrete problem domain in which we could think more clearly about the goal of building a machine with common sense.

Returning to the Blocks World

Later that first morning, Push Singh presented a possible target domain for a commonsense architecture project. Consider the situation of two children playing together with blocks (figure 2).

Even in this simple situation, the children may have concerns that span many "mental realms":

Physical: What if I pulled out that bottom block?
Bodily: Can I reach that green block from here?
Social: Should I help him with his tower or knock it down?
Psychological: I forgot where I left the blue block.
Visual: Is the blue block hidden behind that stack?
Spatial: Can I arrange those blocks into the shape of a table?
Tactile: What would it feel like to grab five blocks at once?
Self-Reflective: I'm getting bored with this—what else is there to do?

Singh argued that no present-day AI system demonstrates such a broad range of commonsense skills. Any architecture we design should aim to achieve some competence within each of these and other important mental realms. He proposed that to do this we work within the simplest possible domain requiring reasoning in each of these realms. He suggested that we develop our architectures within a physically realistic model world resembling the classic Blocks World, but one populated by several simulated beings, thus emphasizing social problems in addition to physical ones. These beings would manipulate simple objects like blocks, balls, and cylinders, and would participate in the kinds of scenarios depicted in figure 3, which include jointly building structures of various kinds, competing to solve puzzles, teaching each other skills through examples and through conversation, and verbally reflecting on their own successes and failures.

The apparent simplicity of this world is deceptive, for many of the kinds of problems that show up in this world have not yet been tackled in AI, because they require combining elements of the following:

Spatial reasoning about the spatial arrangements of objects in one's environment and how the parts of objects are oriented and situated in relation to one another. (Which of those blocks is closest to me?)

Physical reasoning about the dynamic behavior of physical objects with masses and colliding/supporting surfaces. (What would happen if I removed that middle block from the tower?)

Bodily reasoning about the capabilities of one's physical body. (Can I reach that block without having to get up?)

Visual reasoning about the world that underlies what can be seen. (Is that a cylinder-shaped block or part of a person's leg?)

Psychological reasoning about the goals and beliefs of oneself and of others. (What is the other person trying to do?)

Social reasoning about the relationships, shared goals, and histories that exist between people. (How can I accomplish my goal without the other person interfering?)

Reflective reasoning about one's own recent deliberations. (What was I trying to do a moment ago?)

Conversational reasoning about how to express one's ideas to others. (How can I explain my problem to the other person?)

Educational reasoning about how best to learn about some subject, or to teach it to someone else. (How can I generalize useful rules about the world from experiences?)
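As a purely illustrative aside (ours, not the article's), even just writing this realm taxonomy down as a small vocabulary, and tagging each problem or question in the model world with the realms it draws on, would give an architecture something to route on when it later chooses how to reason. A minimal sketch:

# Illustrative sketch: the realm taxonomy as a small vocabulary, with one of the
# children's concerns tagged by the realms it involves. The idea of tagging
# problems this way is our assumption about how such a taxonomy might be used.
from enum import Enum, auto

class Realm(Enum):
    SPATIAL = auto()
    PHYSICAL = auto()
    BODILY = auto()
    VISUAL = auto()
    PSYCHOLOGICAL = auto()
    SOCIAL = auto()
    REFLECTIVE = auto()
    CONVERSATIONAL = auto()
    EDUCATIONAL = auto()

question = "Can I reach that block without having to get up?"
realms_involved = {Realm.BODILY, Realm.SPATIAL}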

Many of the meeting participants were enthusiastic about this proposal and agreed that there would be challenging visual, spatial, and robotics problems within this domain. Ken Forbus pointed out that the video game communities would soon produce programmable virtual worlds that would easily meet our needs. Several participants mentioned the success of the RoboCup competitions (Kitano et al. 1997), but some concluded that the RoboCup domain, while appropriate for those interested in the problem of coordinating multiagent teams in a competitive scenario, was very different in character from the situation of two or three people more slowly working together on a physical task, communicating in natural language, and in general operating on a more thoughtful and reflective level.


Figure 2. A Pair of Busy Youths.


Establishing a Collection of Graded Miniscenarios

How would we guide such a project and measure its progress over time? Some participants suggested trying to emulate the abilities of human children at various ages. However, others argued that while this should inspire us, we should not use it as a plan for the project, because we don't really yet know enough about the details of early human mental development.

Aaron Sloman argued that it might be better to try to model the mind of a four- or five-year-old human child because that might lead more directly toward more substantial adult abilities. After the meeting, Sloman developed the notion of a "commonsense miniscenario," a concrete description, in the form of a simple storyboard, of a particular skill that a commonsense architecture should be able to demonstrate. Each miniscenario has several features: (1) it describes some forms of competence, which are robust insofar as they can cope with wide ranges of variation in the conditions; and (2) each comes with some meta-competence for thinking and speaking about what was done. For example, competence can have a number of different facets, including describing the process; explaining why something was done, or why something else would not have worked; being able to answer hypothetical questions about what would happen otherwise; and being able to improve performance in such ways as improving fluency, removing bugs in strategies, and expanding the variety of contexts. The system should also be able to further justify these kinds of remarks.

Sloman proposed this example of a sequence of increasingly sophisticated such miniscenarios in the proposed multi-robot problem domain:

1. Person wants to get box from high shelf. Ladder is in place. Person climbs ladder, picks up box, and climbs down.

2. As for 1, except that the person climbs ladder, finds he can't reach the box because it's too far to one side, so he climbs down, moves the ladder sideways, then as 1.

3. As for 1, except that the ladder is lying on the floor at the far end of the room. He drags it across the room, lifts it against the wall, then as 1.

4. As for 1, except that if asked while climbing the ladder why he is climbing it, the person answers something like "To get the box." It should understand why "To get to the top of the ladder" or "To increase my height above the floor" would be inappropriate, albeit correct.

5. As for 2 and 3, except that when asked, "Why are you moving the ladder?" the person gives a sensible reply. This can depend in complex ways on the previous contexts, as when there is already a ladder closer to the box, but which looks unsafe or has just been painted. If asked, "Would it be safe to climb if the foot of the ladder is right up against the wall?" the person can reply with an answer that shows an understanding of the physics and geometry of the situation.

6. The ladder is not long enough to reach the shelf if put against the wall at a safe angle for climbing. Another person suggests moving the bottom closer to the wall, and offers to hold the bottom of the ladder to make it safe. If asked why holding it will make it safe, he gives a sensible answer about preventing rotation of the ladder.

7. There is no ladder, but there are wooden rungs, and rails with holes from which a ladder can be constructed. The person makes a ladder and then acts as in previous scenarios. (This needs further unpacking, e.g., regarding sensible sequences of actions, things that can go wrong during the construction, and how to recover from them, etc.)

8. As for 7, but the rungs fit only loosely into the holes in the rails. Person assembles the ladder but refuses to climb up it, and if asked why, can explain why it is unsafe.

9. Person watching another who is about to climb up the ladder with loose rungs should be able to explain that a calamity could result, that the other might be hurt, and that people don't like being hurt.

Such a system should be made to face a substantial library of such graded sequences of miniscenarios that require it to learn new skills, to improve its abilities to reflect on them, and (with practice) to become much more fluent and quick at achieving these tasks. These orderings should be based on such factors as the required complexity of the objects, processes, and knowledge involved, the linguistic competence required, and the understanding of how others think and feel. That library could include all sorts of things children learn to do in such various contexts as dressing and undressing dolls, coloring in a picture book, taking a bath (or washing a dog), making toys out of Meccano and other construction kits, eating a meal, feeding a baby, cleaning up a mess made by spilling some powder or liquid, reading a story and answering questions about it, making up stories, discussing the behavior of a naughty person, and learning to think and talk about the past, the future, and distant places.
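One way such a library might be organized (this is our sketch, not something the meeting specified) is as a set of records pairing each scenario's storyboard with the competences it exercises and the reflective questions the system should be able to answer afterwards. The field names below are assumptions.

# Hypothetical sketch of a graded miniscenario record; field names and the
# grading scheme are our assumptions, not the meeting's.
from dataclasses import dataclass, field

@dataclass
class Miniscenario:
    level: int                    # position in the graded sequence
    storyboard: str               # plain-language description of the task
    competences: list             # what the system must be able to do
    meta_questions: list          # reflective questions it should answer afterwards
    builds_on: list = field(default_factory=list)  # earlier levels it extends

LADDER_1 = Miniscenario(
    level=1,
    storyboard="Box is on a high shelf; ladder is in place. "
               "Climb the ladder, pick up the box, climb down.",
    competences=["spatial planning", "bodily control"],
    meta_questions=["Why did you climb the ladder?"],
)

LADDER_4 = Miniscenario(
    level=4,
    storyboard="As level 1, but the climber must answer questions about the "
               "purpose of the climb.",
    competences=["spatial planning", "bodily control",
                 "explaining one's own goals"],
    meta_questions=["Why is 'to get the box' a better answer than "
                    "'to increase my height above the floor'?"],
    builds_on=[1],
)

LIBRARY = sorted([LADDER_4, LADDER_1], key=lambda s: s.level)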


Still, the participants had a heated debate about the adequacy of the proposed problem domain. The most common criticism was that this world does not contain enough of a variety of objects or richness of behavior. Doug Lenat suggested a solution to this, which was to embed the people not within a Blocks World, but instead somewhere like a typical house or office, as in the popular computer game The Sims. Doug Riecken argued that we could develop enough of the architecture within the more limited virtual world, and later add extensions to deal with a wider range of objects and phenomena.

A different response to this criticism was that in order to focus on architectural issues, it would help to simplify the problem domain, so that we could focus less on acquiring a large mass of world knowledge, and more on developing better ways for systems to use the knowledge they have. However, other participants argued that restricting the world would not entirely bypass the need for large databases of commonsense knowledge, for even this simple world would likely require hundreds of thousands or even millions of elementary pieces of commonsense knowledge about space, time, physics, bodies, social interactions, object appearances, and so forth.

Other participants disagreed with the virtual world domain. They felt that we should instead take the more practical approach of developing the architecture by starting with a useful application like a search engine or conversational agent, and extending its common sense abilities over time. But Ben Kuipers worried that choosing too specific an application would lead to what happened to most previous projects—someone discovers some set of ad hoc tricks that leads to adequate performance, without making any more general progress toward more versatile, resourceful, or "more intelligent" systems.

In the end, after long debates we achieved a substantial consensus that to solve harder problems requiring common sense, we first needed to solve the more restricted class of problems that show up in simpler domains like the proposed virtual world. Once we get the core of the architecture functioning in this rich but limited domain, we can attempt to extend it—or let it extend itself—to deal with a broader range of problems using a much broader array of commonsense knowledge.

Figure 3. Reasoning in Multiple Mental Realms to Solve a Problem in the Model World. The figure shows a short exchange between two simulated agents, one panel per step. Panel 1, "I see you're trying to build a tower": the speaker recognizes the purpose behind the actions of the other agent, which involves some spatial, physical, visual, and psychological reasoning. Panel 2, "Yes, but I can't reach that block": the speaker notices an impasse or problem that it cannot solve, which involves knowledge about space, bodies and their abilities, problem solving, and reflection. Panel 3, "I can reach it, let me get it for you": the speaker realizes that it can achieve that goal and that it does not conflict with its own goals, which involves social reasoning and cooperative problem solving.

Large-Scale Architectures for Human-Level Intelligence

In the afternoon, we discussed large-scale architectures for machines with human-level intelligence and common sense. Marvin Minsky and Aaron Sloman each presented their current architectural proposals as a starting point for the meeting participants to criticize, debug, and elaborate. These two architectures share so many features that we will refer to them together as the Minsky-Sloman model.

These architectures are distinguished by their emphasis on reflective thinking. Most cognitive models have focused only on ways to react or deliberate. However, to make machines more versatile, they will need better ways to recognize and repair the obstacles, bugs, and deficiencies that result from their own activities. In particular, whenever one strategy fails, they'll need to have a collection of ways to switch to alternative ways to think. To provide for this, Minsky's architectural design includes several reflective levels beyond the reactive and deliberative levels. Here is one view of his model for the architecture of a person's mind, as described in his book The Emotion Machine and shown in figure 4.

Figure 4. Minsky's Emotion Machine Architecture. Sensors and effectors connect the architecture to the world; its levels, from lowest to highest, are: Reactive (reflexive, scripted, and otherwise automatic, non-deliberative processes acting both on the external world and within the mind itself); Deliberative (reasons about the situations and events in the external world, e.g., prediction, explanation, planning, diagnosis, generalization); Reflective (reflects on and manages deliberative activity, including assigning credit to inference methods, selecting suitable representations, and so forth); Self-Reflective (concerned with larger-scale models of "self," including the extent and boundaries of one's physical and cognitive abilities and knowledge); Self-Conscious (concerned with the relationship between this mind and others, including self-appraisal by comparing one's activities and abilities with those of others); and Self-Ideals (concerned with assessing one's activities with respect to the "ideals" established via interactions with one's role models).

Some participants questioned the need for so many reflective layers; would not a single one be enough? Minsky responded by arguing that today, when our theories still explain too little, we should elaborate rather than simplify, and we should be building theories with more parts, not fewer. This general philosophy pervades his architectural design, with its many layers, representations, critics, reasoning methods, and other diverse types of components. Only once we have built an architecture rich enough to explain most of what people can do will it make sense to try to simplify things. But today, we are still far from an architectural design that explains even a tiny fraction of human cognition.

Aaron Sloman's Cognition and Affect project has explored a space of architectures proposed as models for human minds; a sketch of Sloman's H-CogAff model is shown in figure 5. This architecture appears to provide a framework for defining with greater precision than previously a host of mental concepts, including affective concepts such as "emotion," "attitude," "mood," "pleasure," and so on. For instance, H-CogAff allows us to define at least three distinct varieties of emotions: primary, secondary, and tertiary emotions, involving different layers of the architecture which evolved at different times—and the same architecture can also distinguish different forms of learning, perception, and control of behavior. (A different architecture might be better for exploring analogous states of insects, reptiles, or other mammals.) Human infants probably have a much-reduced version of the architecture that includes self-bootstrapping mechanisms that lead to the adult form.

Figure 5. Aaron Sloman's H-CogAff Architecture. The sketch shows reactive processes, deliberative processes (planning, deciding, "what if" reasoning), and meta-management (reflective) processes interacting with the environment through a perception hierarchy and an action hierarchy, supported by motive activation, a long-term associative memory, alarms, variable-threshold attention filters, and personae.

The central idea behind the Minsky-Sloman architectures is that the source of human resourcefulness and robustness is the diversity of our cognitive processes: we have many ways to solve every kind of problem—both in the world and in the mind—so that when we get stuck using one method of solution, we can rapidly switch to another. There is no single underlying knowledge representation scheme or inferencing mechanism.

How do such architectures support such diversity? In the case of Minsky's Emotion Machine architecture, the top level is organized as follows. When the system encounters a problem, it first uses some knowledge about "problem-types" to select some "way-to-think" that might work. Minsky describes "ways-to-think" as configurations of agents within the mind that dispose it towards using certain styles of representation, collections of commonsense knowledge, strategies for reasoning, types of goals and preferences, memories of past experiences, manners of reflection, and all the other aspects that go into a particular "cognitive style." One source of knowledge relating problem-types to ways-to-think is the causal diversity matrix discussed at the start of the meeting—for example, if the system were presented with a social problem, it might use the causal diversity matrix to select a case-based style of reasoning, and a particular database of social reasoning episodes to use with it.

However, any particular such approach is likely to fail in various ways. Then if certain "critic" agents notice specific ways in which that approach has failed, they either suggest strategies to adapt that approach, or suggest alternative ways-to-think, as suggested in figure 6. This is not done by employing any simple strategy for reflection and repair, but rather by using large arrays of higher-level knowledge about where each way-to-think has advantages and disadvantages, and how to adapt them to new contexts.

Figure 6. Switching from a Largely Reactive to a Largely Deliberative Way-to-Think. Two columns, labeled Initial Way-to-Think and New Way-to-Think, each span the levels of the architecture (reactive, deliberative, reflective, self-reflective, self-conscious, self-ideals); circles represent agents and other mental resources (fragments of knowledge, methods of reasoning, ways to learn, etc.) specific to that way-to-think, spanning the many levels of the architecture.
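A minimal sketch of this select-run-criticize cycle, under our own assumptions about the interfaces (ways-to-think modeled as procedures that either return a result or signal an impasse, and critics that map an impasse to an alternative), might look as follows. None of these names come from the Emotion Machine design itself.

# Minimal, hypothetical sketch of the cycle described above: pick a way-to-think,
# run it, and let critics propose an alternative whenever it reaches an impasse.
# The interfaces are our assumptions, not Minsky's.
from typing import Callable, Optional

class Impasse(Exception):
    """Raised by a way-to-think when it gets stuck."""

WayToThink = Callable[[dict], dict]                       # problem -> result
Critic = Callable[[dict, Impasse], Optional[WayToThink]]  # may propose an alternative

def solve(problem: dict,
          select_initial: Callable[[dict], WayToThink],
          critics: list,
          max_switches: int = 5) -> dict:
    # The initial choice could come from a causal-diversity-style lookup.
    way = select_initial(problem)
    for _ in range(max_switches):
        try:
            return way(problem)
        except Impasse as why:
            for critic in critics:
                alternative = critic(problem, why)
                if alternative is not None:
                    way = alternative   # switch to the suggested way-to-think
                    break
            else:
                raise                   # no critic knew what to do
    raise RuntimeError("gave up after too many switches")

In the actual proposal the critics would not merely pick another method off a list; they would draw on large bodies of knowledge about where each way-to-think succeeds and fails, and, as described below, several ways-to-think would be kept active in parallel rather than tried strictly one at a time.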


In Minsky's design, several ways-to-think are usually active in parallel. This enables the system to quickly and fluently switch between different ways-to-think because, instead of starting over at each transition, each newly activated way-to-think will find an already-prepared representation. The system will rarely "get stuck" because those alternative ways-to-think will be ready to take over when the present one runs into trouble, as shown in figure 7.

Figure 7. Using Several Ways-to-Think in Parallel in a Social Blocks World. The figure traces a path from "grasping a block" to "got block!" through several parallel lines of reasoning, each within its own set of mental realms: spatial and bodily reasoning (can I reach the block?), bodily and social reasoning (could the other person hand me the block?), social-competitive reasoning (the other person is taking the block for themselves!), and economic reasoning (trade with them a block I don't need). Critics anticipate problems or detect fatal impasses, and agents communicate across mental realms.

Here each way-to-think involves reasoning in a particular subset of mental realms. Impasses encountered while reasoning in one set of mental realms can be overcome within others. Further information about these architectures can be found in Singh and Minsky (2003), Sloman (2001), and McCarthy et al. (2002). Minsky's model will be described in detail in his new book The Emotion Machine (Minsky, forthcoming).

Generally, the participants were sympathetic to these proposals, and all agreed with the idea that to achieve human-level intelligence we needed to develop more effective ways to combine multiple AI techniques. Ken Forbus suggested that we needed a kind of "component marketplace," and that we should find ways to instrument these components so that the reflective layers of the architecture had useful information available to them. He contrasted the Soar project (Laird, Newell, and Rosenbloom 1987) as an effort to eliminate and unify components rather than to accumulate and diversify them, as in the Minsky-Sloman proposals.

Ashwin Ram and Larry Birnbaum both pointed out that despite the agreement over the architectural proposals, it was still not clear what the particular components of the architecture would be. They pointed out that we needed to think more about what the units of reasoning would be. In other words, we needed to come up with a good list of ways-to-think. Some examples might include the following:

Solving problems by making analogies to past experiences
Predicting what will happen next by rule-based mental simulations
Constructing new "ways to think" by building new collections of agents
Explaining unexpected events by diagnosing causal graphs
Learning from problem-solving episodes by debugging semantic networks
Inferring the state of other minds by re-using self-models
Classifying types of situations using statistical inference
Getting unstuck by reformulating the problem situation

This list could be extended to include all available AI techniques.

Educating the Architecture

On the morning of the second day of the meeting, we addressed the problem of how to supply the architecture with a broad range of commonsense knowledge, so that it would not have to "start from scratch." We all agreed that learning was of value, but we didn't all agree on where to start. Many researchers would like to start with nothing; however, Aaron Sloman pointed out that an architecture that comes with no knowledge is like a programming language that comes with no programs or libraries.

One view that was expressed was that approaches that start out with too little initial knowledge would likely not achieve enough versatility in any practical length of time. Minsky criticized the increasing popularity of the concept of a "baby machine"—learning systems designed to achieve great competence, given very little initial structure. Some of these ideas include genetic programming, robots that learn by associating sensory-motor patterns, and online chatbots that try to learn language by generalizing from thousands of conversations. Minsky's complaint was not that the concept of a baby machine is itself unsound, but rather that we don't know how to do it yet. Such approaches have all failed to make much progress because they started out with inadequate schemes for learning new things. You cannot teach algebra to a cat; among other things, human infants are already equipped with architectural features that equip them to think about the causes of their successes and failures and then to make appropriate changes. Today we do not yet have enough ideas about how to represent, organize, and use much of commonsense knowledge, let alone build a machine that could learn all of that automatically on its own. As John McCarthy noted long ago, "in order for a program to be capable of learning something, it must first be able to represent that knowledge."


There are very few general-purpose commonsense knowledge resources in the AI community. Doug Lenat gave a wonderful presentation of the Cyc system, which is presently the project furthest along at developing a useful and reusable such resource for the AI community, so that new AI programs don't have to start with almost nothing. The Cyc project (Lenat 1995) has developed a great many ways to represent commonsense knowledge, and has built a database of over a million commonsense facts and rules. However, Lenat estimated that an adult-level commonsense system might require 100 million units of commonsense knowledge, and so one of their current directions is to move to a distributed knowledge acquisition approach, where it is hoped that eventually thousands of volunteer teachers around the world will work together to teach Cyc new commonsense knowledge. Lenat spent some time describing the development of friendly interfaces to Cyc that allow nonlogicians to participate in the complicated teaching and debugging processes involved in building up the Cyc knowledge base.

Many of the participants agreed that Cyc would be useful, and some suggested we could even base our effort on top of it, but others were sharply critical. Jeffrey Siskind doubted that Cyc contained the spatial and perceptual knowledge needed to do important kinds of visual scene interpretation. Roger Schank argued that Cyc's axiomatic approach was unsuitable for making the kinds of generalizations and analogies that a more case-based and narrative-oriented approach would support. Srini Narayanan worried that the Cyc project was not adequately based on what cognitive scientists have learned about how people make commonsense inferences. Oliver Steele concluded that while we disagreed about whether Cyc was 90% of the solution or only 10%, this was really an empirical question that we would answer during the course of the project. But generally, the architectural proposal was regarded as complementary to parallel efforts to accumulate substantial commonsense knowledge bases.

Minsky predicted that if we used Cyc, we might need to augment each existing item of knowledge with additional kinds of procedural and heuristic knowledge, such as descriptions of (1) problems that this knowledge item could help solve; (2) ways of thinking that it could participate in; (3) known arguments for and against using it; and (4) ways to adapt it to new contexts.
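If one wanted to experiment with this suggestion, each knowledge item might be wrapped in a record along the following lines; the field names are ours, and nothing like this is part of Cyc itself.

# Hypothetical sketch of Minsky's suggestion that each knowledge item carry
# extra procedural and heuristic annotations. Field names are ours, not Cyc's.
from dataclasses import dataclass, field

@dataclass
class AnnotatedKnowledgeItem:
    content: str                                        # the fact or rule itself
    helps_with: list = field(default_factory=list)      # (1) problems it could help solve
    ways_to_think: list = field(default_factory=list)   # (2) ways of thinking it can join
    arguments: dict = field(default_factory=dict)       # (3) arguments for and against using it
    adaptations: list = field(default_factory=list)     # (4) ways to adapt it to new contexts

UNSUPPORTED_FALLS = AnnotatedKnowledgeItem(
    content="An unsupported block falls.",
    helps_with=["predicting tower collapse", "explaining why a stack fell"],
    ways_to_think=["rule-based mental simulation", "case-based reasoning"],
    arguments={"for": ["cheap to apply"], "against": ["ignores friction and wedging"]},
    adaptations=["treat loosely wedged blocks as partially supported"],
)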


It was stressed that knowledge about the world was not enough by itself—we also need a knowledge base about how to reason, reflect, and learn, the knowledge that the reflective layers of the architecture must possess. The problem remains that the programs we have for using knowledge are not flexible enough, and neither Cyc's "adult machine" approach of supplying a great deal of world knowledge, nor the "baby machine" approach of learning common sense from raw sensory-motor experience, will likely succeed without first developing an architecture that supports multiple ways to reason, learn, and reflect upon and improve its activities.

An Important Application

Several of the participants felt that such a project would not receive substantial support unless it proposed an application that clearly would benefit much of the world. Rather than just an improvement to something existing, it would need to be an application that could not be built without being capable of human-level commonsense reasoning.

After a good deal of argument, several participants converged upon a vision from The Diamond Age, a novel by Neal Stephenson. That novel envisioned an "intelligent book"—the Young Lady's Illustrated Primer—that, when given to a young girl, would immediately bond with her and come to understand her so well as to become a powerful personal tutor and mentor.

This suggested that we could try to build a personalized teaching machine that would adapt itself to someone's particular circumstances, difficulties, and needs. The system would carry out a conversation with you, to help you understand a problem or achieve some goal. You could discuss with it such subjects as how to choose a house or car, how to learn to play a game or get better at some subject, how to decide whether to go to the doctor, and so forth. It would help you by telling you what to read, stepping you through solutions, and teaching you about the subject in other ways it found to be effective for you. Textbooks then could be replaced by systems that know how to explain ideas to you in particular, because they would know your background, your skills, and how you best learn.

This kind of application could form the basis for a completely new way to interact with computers, one that bypasses the complexities and limitations of current operating systems. It would use common sense in many different ways: (1) it would understand human goals, so that it could avoid the silliest mistakes; (2) it would understand human reasoning, so that it could present you with the right level of detail and avoid saying things that you probably inferred; and (3) it would converse in natural language, so that you could easily talk to it about complex matters without having to learn a special language or complex interface.

To build such a kind of "helping machine," we would first need to give it knowledge about space, time, beliefs, plans, stories, mistakes, successes, relationships, and so forth, as well as good conversational skills. However, little of this could be realized by anything less than a system with common sense. To accomplish this we would need to pursue some sequence of more modest goals that would help one with simpler problem types—until the system achieved the sorts of competence that we expect from a typical human four- or five-year-old.

However, to get such a system to work, we would need to address many presently unsolved commonsense problems that show up in the model-world problem domain.

Final Consensus

The participants agreed that no single technique (such as statistics, logic, or neural networks) could cope with a sufficiently wide range of problem-types. To achieve human-level intelligence we must create an architecture that can support many different ways to represent, acquire, and apply many kinds of commonsense knowledge.

Most participants agreed that we should combine our efforts to develop a model world that supports simplified versions of everyday physical, social, and psychological problems. This simplified world would then be used to develop and debug the core components of the architecture. Later, we can expand it to solve more difficult and more practical problems.

The participants did not all agree on which particular larger-scale application would both attract sufficient support and also produce substantial progress toward making machines that use commonsense knowledge. Still, many agreed with the concept of a personalized teaching machine that would come to understand you so well that it could adapt to your particular circumstances, difficulties, and needs.

Ben Kuipers sketched the diagram shown in figure 8, which captures the general dependencies between the three points of consensus: practical applications depend on developing an architecture for commonsense thinking flexible enough to integrate a wide array of processes and representations of problems that come up in the model-world problem domain.

Figure 8. Dependencies Between Points of Consensus. The diagram links applications (a personal teaching machine, office assistants, storytelling and entertainment, intelligent search engines) to a commonsense architecture whose architectural elements include agent communication protocols, ways-to-think, reflective layers, deliberative subsystems, and reactive procedures; the architecture in turn draws on knowledge bases (Cyc, Open Mind, the web) and reasoning components (case-based reasoning, statistical inference, neural networks, formal logic), and is developed within a virtual world (physical simulations, several simulated people, graded miniscenarios, multiple realms: physical, social, psychological, etc.).

A Collaborative Project?

At the end of the meeting, we brainstormed about how we might organize a distributed, collaborative project to build an architecture based on the ideas discussed at this meeting. It is a difficult challenge, both technically and socially, to get a community of researchers to work on a common project. However, successes in the Open Source community show that such distributed projects are feasible when the components can be reasonably disassociated.

Furthermore, this kind of architecture itself should help to make it easy for members of the project to add new types of representations and processes. However, we first would have to develop a set of protocols to support the interoperation of such a diverse array of methods. Erik Mueller suggested that such an organization could be modeled after the World Wide Web Consortium (W3C), and its job would largely be to assess, standardize, and publish the protocols and underlying tools that such a distributed effort would demand.

While we did not sketch a detailed plan for how to proceed, Aaron Sloman, Erik Mueller, and Push Singh listed some technical steps that such a project would need:

First, it should not be too hard to develop a suitable virtual model world, because the present-day video game and computer graphics industry has produced most of the required components. These should already include adequate libraries for computer graphics, physics simulation, collision detection, and so forth.

Second, we need to develop and order the set of miniscenarios that we will use to organize and evaluate our progress. This would be a continuous process, as new types of problems will constantly be identified.

Third, what kinds of protocols could the agents of this cognitive system use to coordinate with each other? This would include messages for updating representations, describing goals, identifying impasses, requesting knowledge, and so forth (a sketch of what such messages might look like appears after this list). We would consider the radical proposal to use, for this, an interlingua based on a simplified form of English, rather than trying to develop some brand new ontology for expressing commonsense ideas. Of course, each individual agent could be free to use internally whatever ontology or representation scheme was most convenient and useful.

Fourth, we would need to create a comprehensive catalog of ways-to-think to incorporate into the architecture. A commonsense system should be at least capable of reasoning about prediction, explanation, generalization, exemplification, planning, diagnosis, reflection, debugging, learning, and abstracting.

Fifth, what are the kinds of self-reflections that a commonsense system should be able to make of itself, and how should these invoke and modify ways-to-think as problems are encountered?

Sixth, in any case, such a system will need a substantial, general-purpose, and reusable commonsense knowledge base about the spatial, physical, bodily, social, psychological, reflective, and other important realms, enough to deal with a broad range of problems within the model-world problem domain.

Finally, we might need to develop a new kind of "intention-based" programming language to support the construction of such an architecture.
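To make the third step a little more concrete, here is a rough sketch of what such coordination messages might look like if the simplified-English interlingua were carried in a small structured envelope. The message kinds and fields are our assumptions, not a protocol the meeting adopted.

# Hypothetical sketch of coordination messages between agents of the architecture.
# The envelope fields and message kinds are assumptions; the payload is imagined
# as the simplified-English interlingua mentioned above.
import json
from dataclasses import dataclass, asdict

MESSAGE_KINDS = {
    "update-representation",  # share a revised description of the situation
    "describe-goal",          # announce what an agent is trying to achieve
    "report-impasse",         # explain why an agent is stuck
    "request-knowledge",      # ask other agents for relevant knowledge
}

@dataclass
class AgentMessage:
    kind: str        # one of MESSAGE_KINDS
    sender: str      # which agent or way-to-think produced it
    content: str     # simplified-English payload
    realm: str = ""  # optional: which mental realm it concerns

    def to_json(self) -> str:
        assert self.kind in MESSAGE_KINDS
        return json.dumps(asdict(self))

msg = AgentMessage(kind="report-impasse",
                   sender="spatial-planner",
                   content="I cannot reach the green block from here.",
                   realm="bodily")
print(msg.to_json())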

Towards the Future

Since our meeting, similar sentiments have been expressed at DARPA, most notably in the recent "Cognitive Systems" Information Processing Technology Office (IPTO) Broad Agency Announcement (BAA) (Brachman and Lemnios 2002), which solicits proposals for building AI systems that combine many elements of knowledge, reasoning, and learning. While we are gratified that architectural approaches are becoming more popular, we would like to see more emphasis placed on architectural designs that specifically support more commonsense styles of thinking.

There was a genuine sense of excitement at this meeting. The participants felt that it was a rare opportunity to focus once more on the grand goal of building a human-level intelligence. Over the next few years, we plan to develop a concrete implementation of an architecture based on the ideas discussed at this meeting, and we invite the rest of the AI community to join us in such efforts.


Acknowledgements

We would like to thank Cecile Dejongh for taking care of the local arrangements, and extend a very special thanks to Linda Stone for making this meeting happen. This meeting was made possible by the generous support of Jeffrey Epstein.

Note

1. This meeting was held in St. Thomas, U.S. Virgin Islands, on April 14-16, 2002. The meeting included the following participants: Larry Birnbaum (Northwestern University), Ken Forbus (Northwestern University), Ben Kuipers (University of Texas at Austin), Douglas Lenat (Cycorp), Henry Lieberman (Massachusetts Institute of Technology), Henry Minsky (Laszlo Systems), Marvin Minsky (Massachusetts Institute of Technology), Erik Mueller (IBM T. J. Watson Research Center), Srini Narayanan (University of California, Berkeley), Ashwin Ram (Georgia Institute of Technology), Doug Riecken (IBM T. J. Watson Research Center), Roger Schank (Carnegie Mellon University), Mary Shepard (Cycorp), Push Singh (Massachusetts Institute of Technology), Jeffrey Mark Siskind (Purdue University), Aaron Sloman (University of Birmingham), Oliver Steele (Laszlo Systems), Linda Stone (independent consultant), Vernor Vinge (San Diego State University), and Michael Witbrock (Cycorp).

References

Brachman, Ronald, and Lemnios, Zachary. 2002. DARPA's New Cognitive Systems Vision. Computing Research News 14(5): 1, 8.

Kitano, Hiroaki; Asada, Minoru; Kuniyoshi, Yasuo; Noda, Itsuki; Osawa, Eiichi; and Matsubara, Hitoshi. 1997. RoboCup: A Challenge Problem for AI. AI Magazine 18(1): 73-85.

Laird, John; Newell, Allen; and Rosenbloom, Paul. 1987. SOAR: An Architecture for General Intelligence. Artificial Intelligence 33(1): 1-64.

Lenat, Douglas. 1995. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM 38(11): 33-38.

McCarthy, John; Minsky, Marvin; Sloman, Aaron; Gong, Leiguang; Lau, Tessa; Morgenstern, Leora; Mueller, Erik; Riecken, Doug; Singh, Moninder; and Singh, Push. 2002. An Architecture of Diversity for Commonsense Reasoning. IBM Systems Journal 41(3): 530-539.

Minsky, Marvin. Forthcoming. The Emotion Machine. New York: Pantheon. Several chapters are online at http://web.media.mit.edu/people/minsky.

Minsky, Marvin. 1992. Future of AI Technology. Toshiba Review 47(7).

Singh, Push, and Minsky, Marvin. 2003. An Architecture for Combining Ways to Think. Paper presented at the International Conference on Knowledge Intensive Multi-Agent Systems, Cambridge, Mass., September 30 - October 3.

Sloman, Aaron. 2001. Beyond Shallow Models of Emotion. Cognitive Processing 1(1): 530-539.

Marvin Minsky has made many contributions to AI, cognitive psychology, mathematics, computational linguistics, robotics, and optics. In recent years he has worked chiefly on imparting to machines the human capacity for commonsense reasoning. His conception of human intellectual structure and function is presented in The Society of Mind, which is also the title of the course he teaches at MIT. He received his B.A. and Ph.D. in mathematics at Harvard and Princeton. In 1951 he built the SNARC, the first neural network simulator. His other inventions include mechanical hands and other robotic devices, the confocal scanning microscope, the "Muse" synthesizer for musical variations (with E. Fredkin), and the first LOGO "turtle" (with S. Papert). A member of the NAS, NAE, and Argentine NAS, he has received the ACM Turing Award, the MIT Killian Award, the Japan Prize, the IJCAI Research Excellence Award, the Rank Prize and the Robert Wood Prize for Optoelectronics, and the Benjamin Franklin Medal.

Push Singh is a doctoral candidate in MIT's Department of Electrical Engineering and Computer Science. His research is focused on finding ways to give computers humanlike common sense, and he is presently collaborating with Marvin Minsky to develop an architecture for commonsense thinking that makes use of many types of mechanisms for reasoning, representation, and reflection. He started the Open Mind Common Sense project at MIT, an effort to build large-scale commonsense knowledge bases by turning to the general public, and has worked on incorporating commonsense reasoning into a variety of real-world applications. Singh received his B.S. and M.Eng. in electrical engineering and computer science from MIT.

Aaron Sloman is a professor of AI and cognitive science at the University of Birmingham, UK. He received his B.Sc. in mathematics and physics (Cape Town, 1956) and a D.Phil. in philosophy from Oxford (1962). Sloman is a Rhodes Scholar and a Fellow of AAAI, AISB, and ECCAI. He is also the author of The Computer Revolution in Philosophy (1978) and many theoretical papers on vision, diagrammatic reasoning, forms of representation, architectures, emotions, consciousness, philosophy of AI, and tools for exploring architectures. Sloman maintains the FreePoplog open source web site and is about to embark on a large EC-funded robotics project. All papers, presentations, and software are accessible from his home page: www.cs.bham.ac.uk/~axs/