
Soar One-hour Tutorial

John E. Laird

University of Michigan

March 2009

http://sitemaker.umich.edu/soar [email protected]

Supported in part by DARPA and ONR

Tutorial Outline

1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
   – Internal decision cycle
   – Interaction with external environments
   – Subgoals and meta-reasoning
   – Chunking
5. Recent extensions to Soar
   – Reinforcement Learning
   – Semantic Memory
   – Episodic Memory
   – Visual Imagery

How can we build a human-level AI?

[Figure, built up over three slides: human tasks (calculus, history, reading, Sudoku, shopping, driving, talking on a cell phone, learning) sit above the levels of the brain (brain structure, neural circuits, neurons), just as programs sit above computer architecture, logic circuits, and electrical circuits. The final build places a cognitive architecture between the tasks and the brain: symbolic long-term memories (procedural, semantic, episodic) with their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning), symbolic short-term memory, a decision procedure, appraisals, imagery, and perception and action connected to the body.]

Cognitive Architecture

• Fixed mechanisms underlying cognition
  – Memories, processing elements, control, interfaces
  – Representations of knowledge
  – Separation of fixed processes and variable knowledge
  – Complex behavior arises from composition of simple primitives
• Purpose:
  – Bring knowledge to bear to select actions to achieve goals
• Not just a framework
  – BDI, NN, logic & probability, rule-based systems
• Important constraints:
  – Continual performance
  – Real-time performance
  – Incremental, on-line learning

[Figure: Architecture, Knowledge, and Goals, interacting with the Task Environment.]

Common Structures of Many Cognitive Architectures

[Figure: short-term memory connected to procedural long-term memory (with procedure learning) and declarative long-term memory (with declarative learning), action selection, goals, and perception and action.]

Different Goals of Cognitive Architecture

• Biological plausibility: Does the architecture correspond to what we know about the brain?

• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?

• Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  – Building Human-level AI

Short History of Soar

[Timeline figure, 1980 to 2005, with Functionality and Modeling tracks. In roughly chronological order: Pre-Soar (problem spaces, production systems, heuristic search); multi-method, multi-task problem solving, subgoaling, chunking; UTC (Unified Theories of Cognition), natural language, HCI, external environments; integration, large bodies of knowledge, teamwork, real applications; virtual agents, learning from experience, observation, and instruction; new capabilities.]

Distinctive Features of Soar

• Emphasis on functionality
  – Take engineering and scaling issues seriously
  – Interfaces to real-world systems
  – Can build very large systems in Soar that exist for a long time
• Integration with perception and action
  – Mental imagery and spatial reasoning
• Integrates reaction, deliberation, and meta-reasoning
  – Dynamically switching between them
• Integrated learning
  – Chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling
  – Expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments
  – SML efficiently supports many languages and inter-process connections

System Architecture

[Figure: layered architecture, from the kernel up to the application]
• Soar Kernel: Soar 9.0 Kernel (C)
• gSKI: Higher-level Interface (C++)
• KernelSML: Encodes/Decodes function calls and responses in XML (C++)
• SML: Soar Markup Language
• ClientSML: Encodes/Decodes function calls and responses in XML (C++)
• SWIG Language Layer: Wrapper for Java/Tcl (not needed if the app is in C++)
• Application: any language
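As a rough illustration of what the KernelSML and ClientSML layers do, the sketch below encodes a function call as XML on one side and decodes it on the other, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`call`, `name`, `arg`) and the example command `run` are hypothetical; the slide does not show SML's actual message format.

```python
import xml.etree.ElementTree as ET

def encode_call(name: str, *args: str) -> str:
    """Client side (hypothetical schema): wrap a function name and its arguments in XML."""
    call = ET.Element("call", {"name": name})
    for value in args:
        ET.SubElement(call, "arg").text = value
    return ET.tostring(call, encoding="unicode")

def decode_call(message: str) -> tuple:
    """Kernel side: recover the function name and argument list from the XML message."""
    call = ET.fromstring(message)
    return call.get("name"), [arg.text or "" for arg in call.findall("arg")]

message = encode_call("run", "10")
print(message)               # <call name="run"><arg>10</arg></call>
print(decode_call(message))  # ('run', ['10'])
```

Keeping encoding and decoding symmetric is what lets the application side be written in any language that can produce the same XML.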

Soar Basics

• Operators: Deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:

1. Input from environment

2. Elaborate current situation: parallel rules

3. Propose and evaluate operators via preferences: parallel rules

4. Select operator

5. Apply operator: Modify internal data structures: parallel rules

6. Output to motor system

[Figure: an agent in a real or virtual world applies an operator and reaches a new state, then applies another.]
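The six steps can be sketched as a loop. This is a simplified illustration, not the Soar kernel: rules are modeled as hypothetical Python functions that return working-memory elements, elaboration fires them until nothing new is added (quiescence), and selection simply takes the highest-scoring proposal instead of Soar's full preference semantics. The Eaters-like cell contents and all function names are assumptions.

```python
def elaborate(wm: set, rules: list) -> None:
    """Fire all rules against working memory, repeating until nothing new is added."""
    while True:
        new = {w for rule in rules for w in rule(wm)} - wm
        if not new:  # quiescence
            return
        wm |= new

def decision_cycle(wm, rules, propose, evaluate, apply, sense, act):
    """One pass through the six steps; returns the selected operator, or None."""
    wm |= set(sense())                                          # 1. input from environment
    elaborate(wm, rules)                                        # 2. elaborate current situation
    candidates = sorted(propose(wm))                            # 3. propose operators
    if not candidates:
        return None
    selected = max(candidates, key=lambda op: evaluate(wm, op))  # 3-4. evaluate and select
    wm |= set(apply(wm, selected))                              # 5. apply operator
    act(selected)                                               # 6. output to motor system
    return selected

# Hypothetical usage: north holds bonus food, east normal food, south is a wall.
def sense():
    return {("s1", "cell-north", "bonus"), ("s1", "cell-east", "normal"), ("s1", "cell-south", "wall")}

def mark_open(wm):  # elaboration rule: a direction is open if its cell is not a wall
    return {("s1", "open", attr[len("cell-"):]) for _, attr, val in wm
            if attr.startswith("cell-") and val != "wall"}

def propose(wm):
    return {val for _, attr, val in wm if attr == "open"}

def evaluate(wm, direction):
    content = next(val for _, attr, val in wm if attr == "cell-" + direction)
    return {"bonus": 10, "normal": 5}.get(content, 0)

def apply(wm, direction):
    return {("s1", "move-direction", direction)}

wm, sent = set(), []
chosen = decision_cycle(wm, [mark_open], propose, evaluate, apply, sense, sent.append)
print(chosen, sent)  # north ['north']
```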

Basic Soar Architecture

[Figure: procedural long-term memory (production memory) with chunking, symbolic short-term memory (working memory), and the decision procedure, connected to the body through perception and action. Cycle labels: Input; Elaborate State, Propose Operators, Evaluate Operators, Elaborate Operator; Decide (Select Operator); Apply (Apply Operator); Output.]

Soar 101: Eaters

[Figure: an Eater can move North, South, or East. Each decision runs Input, Propose Operator, Select Operator, Apply Operator, Output.]

Propose operator:
If cell in direction <d> is not a wall --> propose operator move <d>

Evaluate operators (preferences):
If operator <o1> will move to a bonus food and operator <o2> will move to a normal food --> operator <o1> > <o2>
If operator <o1> will move to an empty cell --> operator <o1> <

Resulting preferences: North > East, South > East, North = South
With the empty-cell rule: North > East, South <

Apply operator:
If an operator is selected to move <d> --> create output move-direction <d>

Output: move-direction North
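A minimal sketch of how the Eaters rules and their preferences could pick a move, written in Python rather than Soar's rule language. It simplifies Soar's decision procedure, assuming only acceptable (proposal), better (>), worst (<), and indifferent (=) preferences; the cell encodings `"bonus"`, `"normal"`, `"empty"`, `"wall"` and the function names are hypothetical.

```python
import random

def propose(cells):
    """If cell in direction d is not a wall --> propose move d (acceptable preference)."""
    return {d for d, content in cells.items() if content != "wall"}

def evaluate(cells, candidates):
    """The two evaluation rules: bonus beats normal (o1 > o2); an empty cell is worst (o <)."""
    better = {(o1, o2) for o1 in candidates for o2 in candidates
              if cells[o1] == "bonus" and cells[o2] == "normal"}
    worst = {o for o in candidates if cells[o] == "empty"}
    return better, worst

def select(cells, rng=random):
    candidates = propose(cells)
    better, worst = evaluate(cells, candidates)
    # Drop every candidate that some other candidate is better than.
    remaining = {o for o in candidates if not any((p, o) in better for p in candidates)}
    # Worst candidates survive only if nothing else is left.
    remaining = (remaining - worst) or remaining
    if not remaining:
        return None  # nothing proposed; Soar would reach an impasse here
    # Treat the survivors as indifferent (=) and choose one at random.
    return rng.choice(sorted(remaining))

print(select({"north": "bonus", "south": "empty", "east": "normal", "west": "wall"}))  # north
print(select({"north": "bonus", "south": "bonus", "east": "normal", "west": "wall"}))  # north or south
```

The second call mirrors "North > East, South > East, North = South": East is filtered out and the indifferent pair is resolved randomly.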

Example Working Memory

[Figure: block A stacked on block B, which sits on a table.]

(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)

Working memory is a graph. All working memory elements must be “linked” directly or indirectly to a state.

[Figure: the same working memory drawn as a graph rooted at state S1, with ^block edges to b1 and b2, a ^table edge to t1, and b2's attributes (^color yellow, ^name B, ^size 1, ^type block, ^weight 14, ^under, ^ontop) shown as labeled edges.]
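A minimal sketch, assuming working memory is stored as a flat list of (identifier, attribute, value) triples: a breadth-first search from the state identifier finds every identifier linked directly or indirectly to the state, and any element left over would violate the linkage requirement. Soar maintains this connectivity itself; the `unlinked` helper is hypothetical.

```python
from collections import deque

# The slide's working memory as (identifier, attribute, value) triples.
WM = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", "1"), ("b1", "type", "block"), ("b1", "weight", "14"),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", "1"), ("b2", "type", "block"), ("b2", "under", "b1"), ("b2", "weight", "14"),
    ("t1", "color", "gray"), ("t1", "shape", "square"), ("t1", "type", "table"), ("t1", "under", "b2"),
]

def unlinked(wm, state="s1"):
    """Return the elements whose identifier cannot be reached from the state."""
    ids = {i for i, _, _ in wm}
    reached, frontier = {state}, deque([state])
    while frontier:
        node = frontier.popleft()
        for i, _, v in wm:
            if i == node and v in ids and v not in reached:
                reached.add(v)
                frontier.append(v)
    return [w for w in wm if w[0] not in reached]

print(unlinked(WM))                              # []
print(unlinked(WM + [("x9", "color", "red")]))   # [('x9', 'color', 'red')]
```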

Soar Processing Cycle

[Figure: the cycle Input; Elaborate State, Propose Operators, Evaluate Operators, Elaborate Operator; Decide; Apply; Output, with rules firing in the elaboration and apply phases. When the decision cannot be made, an impasse creates a subgoal that runs the same cycle.]
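The impasse-to-subgoal step can be sketched as follows. This is an illustration under simplifying assumptions: only two impasse types are modeled (no proposals, and a tie among candidates that lack evaluations), whereas Soar distinguishes further types such as operator no-change and conflict. The `evaluate(op, depth)` callback and the scores are hypothetical.

```python
def decide(candidates, values):
    """Return ("select", operator) or ("impasse", kind)."""
    if not candidates:
        return "impasse", "state no-change"
    if len(candidates) == 1:
        return "select", candidates[0]
    if any(c not in values for c in candidates):
        return "impasse", "tie"  # existing knowledge cannot distinguish the candidates
    return "select", max(candidates, key=values.__getitem__)

def solve(candidates, values, evaluate, depth=0):
    """Decide; on a tie, open a substate one level deeper whose results resolve the impasse."""
    outcome, payload = decide(candidates, values)
    if outcome == "select":
        return payload
    if payload == "tie":
        # Substate: evaluate each undecided candidate. `evaluate` may itself call
        # solve(..., depth + 1), which is how recursive impasses build a subgoal hierarchy.
        results = {c: evaluate(c, depth + 1) for c in candidates if c not in values}
        return solve(candidates, {**values, **results}, evaluate, depth)
    raise RuntimeError(f"unresolved impasse: {payload}")

scores = {"north": 10, "south": 10, "east": 5}  # hypothetical look-ahead results
print(solve(["north", "south", "east"], {}, lambda op, depth: scores[op]))  # north
```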

TankSoar

[Figure: a TankSoar map showing the red tank's shield, borders (stone), walls (trees), a health charger, a missile pack, the blue tank being hit (Ouch!), an energy charger, and the green tank's radar.]

Soar 103: Subgoals

[Figure: the top-level cycle (Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output) selects Wander; applying Wander leads to a subgoal that runs the same cycle and selects between Move and Turn.]

If enemy not sensed, then wander

[Figure, next build: the top level selects Attack, and the subgoal selects Shoot.]

If enemy is sensed, then attack

TacAir-Soar [1997]

• Controls simulated aircraft in real-time training exercises (>3000 entities)
• Flies all U.S. air missions
• Dynamically changes missions as appropriate
• Communicates and coordinates with computer and human controlled planes
• Large knowledge base (8000 rules)
• No learning

TacAir-Soar Task Decomposition

[Figure: goal hierarchy]
• Execute Mission: Fly-route, Ground Attack, Fly-Wing, Intercept
• Intercept: Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram
• Employ Weapons: Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile
• Launch Missile: Lock Radar, Lock IR, Fire-Missile, Wait-for Missile-Clear

Example proposal rules:
• If instructed to intercept an enemy, then propose intercept.
• If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons.
• If employing-weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.
• If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.

>250 goals, >600 operators, >8000 rules

Impasse/Substate Implications:

• Substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve impasse
  – Generate operator
  – Select operator (deliberate control)
  – Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
  – Operator creation, selection, application, state elaboration
• Substate is where knowledge to resolve the impasse can be found
• Hierarchy of substates/subgoals arises through recursive impasses

Tie Subgoals and Chunking

[Figure: in Eaters, North, South, and East are proposed but the evaluation knowledge does not distinguish them, so a tie impasse arises. In the subgoal, Evaluate-operator is applied to each candidate: North = 10, South = 10, East = 5. The results become preferences: North > East, South > East, North = South.]

• Chunking creates a rule that applies evaluate-operator (e.g., North = 10).
• Chunking creates rules that create preferences based on what was tested.

Chunking Analysis

• Converts deliberate reasoning/planning to reaction
• Generality of learning based on generality of reasoning
  – Leads to many different types of learning
  – If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse-driven
  – Learning arises from a lack of knowledge
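The "learns only what it tests" behavior can be illustrated, loosely, in Python. This is an analogy, not Soar's chunking algorithm (which builds a production by backtracing through the rules that fired in the substate): here a hypothetical `Tracking` dictionary records which state features the deliberate evaluation read, and only those features become the conditions of the learned rule. The evaluation function and feature names are also hypothetical.

```python
class Tracking(dict):
    """A dict that records which keys were read: a stand-in for 'what the subgoal tested'."""
    def __init__(self, *args):
        super().__init__(*args)
        self.tested = set()

    def __getitem__(self, key):
        self.tested.add(key)
        return super().__getitem__(key)

chunks = []  # learned rules: (conditions, operator, value)

def evaluate_in_subgoal(state, op):
    """Deliberate evaluation (hypothetical): score the cell the operator would move into."""
    return {"bonus": 10, "normal": 5}.get(state["cell-" + op], 0)

def evaluate(state, op):
    """Fire a matching chunk if one exists; otherwise reason in a subgoal and learn a chunk."""
    for conditions, rule_op, value in chunks:
        if rule_op == op and all(state.get(k) == v for k, v in conditions):
            return value, "chunk"  # reactive: no impasse needed
    tracked = Tracking(state)
    value = evaluate_in_subgoal(tracked, op)  # deliberate: impasse and subgoal
    conditions = frozenset((k, state[k]) for k in tracked.tested)  # only what was tested
    chunks.append((conditions, op, value))
    return value, "subgoal"

print(evaluate({"cell-north": "bonus", "cell-east": "normal", "score": 0}, "north"))  # (10, 'subgoal')
print(evaluate({"cell-north": "bonus", "cell-east": "empty", "score": 40}, "north"))  # (10, 'chunk')
```

The second call reuses the learned rule even though `cell-east` and `score` differ, because the evaluation never tested them.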

Extending Soar

• Learn from internal rewards
  – Reinforcement learning
• Learn facts
  – What you know
  – Semantic memory
• Learn events
  – What you remember
  – Episodic memory
• Basic drives and …
  – Emotions, feelings, mood
• Non-symbolic reasoning
  – Mental imagery
• Learn from regularities
  – Spatial and temporal clusters

[Figure: the Soar architecture extended with reinforcement learning, semantic memory and semantic learning, episodic memory and episodic learning, visual imagery, an appraisal detector, and clustering, alongside procedural memory with chunking, symbolic short-term memory, the decision procedure, and perception and action connected to the body.]

Theoretical Commitments

Stayed the same
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules

Changed
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
  – Symbol generation (clustering)
  – Control (numeric preferences)
  – Learning control (reinforcement learning)
  – Intrinsic reward (appraisals)
  – Aid memory retrieval (WM activation)
  – Non-symbolic reasoning (visual imagery)

Reinforcement Learning

Shelly Nason

RL in Soar

1. Encode the value function as operator evaluation rules with numeric preferences.

2. Combine all numeric preferences for an operator dynamically.

3. Adjust value of numeric preferences with experience.

[Figure: the reinforcement-learning loop connecting perception, reward, internal state, value function, update value function, action selection, and action.]

The Q-function in Soar

The value function is stored in rules that test the state and operator and create numeric preferences.

sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}

Operator Q-value = the sum of all numeric preferences.
Selection: epsilon-greedy or Boltzmann

O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05

epsilon-greedy: With probability ε the agent selects an action at random. Otherwise the agent takes the action with the highest expected value. [Balance exploration/exploitation]
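The Q-value computation and epsilon-greedy selection can be sketched in a few lines, using the numeric preferences from the O1 to O3 example. This is an illustration in Python, not Soar's implementation; the function names are hypothetical, and Boltzmann selection is not shown.

```python
import random

def q_value(numeric_prefs):
    """An operator's Q-value is the sum of the numeric preferences asserted for it."""
    return sum(numeric_prefs)

def epsilon_greedy(prefs_by_op, epsilon, rng=random):
    """With probability epsilon pick a random operator; otherwise the highest Q-value."""
    ops = sorted(prefs_by_op)
    if rng.random() < epsilon:
        return rng.choice(ops)  # explore
    return max(ops, key=lambda op: q_value(prefs_by_op[op]))  # exploit

prefs = {"O1": [0.34, 0.45, 0.02], "O2": [0.25, 0.11, 0.12], "O3": [-0.04, 0.14, -0.05]}
print({op: round(q_value(p), 2) for op, p in prefs.items()})  # {'O1': 0.81, 'O2': 0.48, 'O3': 0.05}
print(epsilon_greedy(prefs, epsilon=0.1))
```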

Updating operator values

Sarsa update:

$$Q(s,O_1) \leftarrow Q(s,O_1) + \alpha\,\big[\,r + \gamma\,Q(s',O_2) - Q(s,O_1)\,\big]$$

• R1(O1) = .20, R2(O1) = .15, R3(O1) = -.02, so Q(s,O1) = sum of numeric prefs. = .33
• r = reward = .2
• Q(s',O2) = sum of numeric prefs. of the selected operator (O2) = .11
• With α = .1 and γ = .9: .1 × [.2 + .9 × .11 - .33] = .1 × (-.031) ≈ -.003
• The update is split evenly among the three rules contributing to O1: ≈ -.001 each
• After the update: R1 ≈ .199, R2 ≈ .149, R3 ≈ -.021
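The arithmetic can be checked with a short worked example. The function and argument names are hypothetical; as on the slide, it assumes the update is divided evenly among the rules that contributed numeric preferences to O1.

```python
def sarsa_update(rule_values, reward, next_q, alpha, gamma):
    """One Sarsa update for an operator whose Q-value is split across several rules."""
    q = sum(rule_values)                           # Q(s, O1) = sum of numeric prefs
    delta = alpha * (reward + gamma * next_q - q)  # alpha times the TD error
    share = delta / len(rule_values)               # split evenly among contributing rules
    return [v + share for v in rule_values], delta

new_values, delta = sarsa_update([0.20, 0.15, -0.02], reward=0.2, next_q=0.11, alpha=0.1, gamma=0.9)
print(round(delta, 4))                    # -0.0031
print([round(v, 4) for v in new_values])  # [0.199, 0.149, -0.021]
```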

Results with Eaters

[Plot: Total Score (0 to 1200) vs. Move # (1 to 289); curves: Figure 2a rule, Random, After 5, After 10, After 15, After 20.]

RL TankSoar Agent

[Plot: Average Margin of Victory (-20 to 60) vs. Successive Games (1 to 171).]

Semantic Memory

Yongjia Wang

Memory Systems

[Figure: a taxonomy of memory. Long-term memory divides into declarative (semantic memory, episodic memory) and procedural (perceptual representation system, procedural memory); short-term memory is working memory.]

Declarative Memory Alternatives

• Working Memory
  – Keep everything in working memory
• Retrieve dynamically with rules
  – Rules provide asymmetric access
  – Data chunking to learn (complex)
• Separate Declarative Memories
  – Semantic memory (facts)
  – Episodic memory (events)

Basic Semantic Memory Functionalities

• Encoding
  – What to save?
  – When to add a new declarative chunk?
  – How to update knowledge?
• Retrieval
  – How is the cue placed and matched?
  – What are the different types of retrieval?
• Storage
  – What are the storage structures?
  – How are they maintained?

Semantic Memory Functionalities

[Figure: working memory and semantic memory, with operations labeled Save, Retrieval (cue, Feature Match, NIL results), Expand, Update with Complex Structure, AutoCommit, and Remove-No-Change.]
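A minimal sketch of Save and feature-match Retrieval, assuming semantic memory is a dictionary from identifiers to flat feature maps. The first stored chunk that contains every cue feature is returned, and `None` plays the role of NIL; how Soar's semantic memory ranks several matches is not covered by the slide, so first-match is an assumption here, as are all names.

```python
semantic_memory = {}

def save(chunk_id, features):
    """Save: copy a working-memory structure into semantic memory."""
    semantic_memory[chunk_id] = dict(features)

def retrieve(cue):
    """Feature match: return (id, features) of the first chunk containing every cue feature, else None."""
    for chunk_id, features in semantic_memory.items():
        if all(features.get(a) == v for a, v in cue.items()):
            return chunk_id, dict(features)
    return None

save("b1", {"name": "A", "color": "blue", "type": "block"})
save("t1", {"color": "gray", "type": "table"})
print(retrieve({"type": "block"}))  # ('b1', {'name': 'A', 'color': 'blue', 'type': 'block'})
print(retrieve({"color": "red"}))   # None
```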

Episodic Memory

Andrew Nuxoll


Episodic vs. Semantic Memory

• Semantic Memory
  – Knowledge of what we “know”
  – Example: what state the Grand Canyon is in
• Episodic Memory
  – History of specific events
  – Example: a family vacation to the Grand Canyon

Characteristics of Episodic Memory: Tulving

• Architectural:

– Does not compete with reasoning.

– Task independent

• Automatic: – Memories created without deliberate decision.

• Autonoetic: – Retrieved memory is distinguished from sensing.

• Autobiographical: – Episode remembered from own perspective.

• Variable Duration: – The time period spanned by a memory is not fixed.

• Temporally Indexed: – Rememberer has a sense of when the episode occurred.


Current Implementation

• Encoding initiation: when the agent takes an action.
• Encoding content: the entire working memory is stored in the episode.
• Storage (episode structure): episodes are stored in a separate memory.
• Retrieval initiation/cue: the cue is placed in an architecture-specific buffer.
• Retrieval: the closest partial match is retrieved.

[Figure: working memory, with input, output, cue, and retrieved structures, alongside long-term procedural memory (production rules) and a separate episodic memory filled by episodic learning.]
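A minimal sketch of the encoding and retrieval steps, assuming an episode is a frozen snapshot of working-memory elements and "closest partial match" means the episode sharing the most elements with the cue. Breaking ties in favor of the most recent episode is an assumption of this sketch, and the tank features are hypothetical.

```python
episodes = []  # episode i = snapshot of working memory when the agent took action i

def record(working_memory):
    """Encoding: store the entire working memory as a new episode."""
    episodes.append(frozenset(working_memory))

def retrieve(cue):
    """Return the index of the episode sharing the most elements with the cue, or None.
    Ties go to the most recent episode (an assumption of this sketch)."""
    best, best_score = None, 0
    for i, episode in enumerate(episodes):
        score = len(set(cue) & episode)
        if score > 0 and score >= best_score:
            best, best_score = i, score
    return best

record({("tank", "sees", "charger"), ("tank", "health", "low")})
record({("tank", "sees", "wall"), ("tank", "health", "low")})
record({("tank", "sees", "charger"), ("tank", "health", "high")})
print(retrieve({("tank", "sees", "charger"), ("tank", "health", "low")}))  # 0
print(retrieve({("tank", "health", "low")}))                               # 1
print(retrieve({("tank", "sees", "enemy")}))                               # None
```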

Cognitive Capability: Virtual Sensing

• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
  – Have I seen a charger from here?
  – Have I seen a place where I can see a charger?
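The recursive question ("have I seen a charger from here, or from a place I have seen from here?") can be sketched as a search over remembered observations. This is written as a breadth-first search rather than literal recursion, and it assumes each hypothetical episode records a location and the set of things visible from it, where visible things may include other locations.

```python
from collections import deque

# Hypothetical episodes: where the tank was, and what it could see from there.
episodes = [
    ("a", {"b"}),
    ("b", {"c", "wall"}),
    ("c", {"charger"}),
]

def virtual_sense(here, target, memory):
    """Return the chain of remembered places from `here` to one where `target` was seen, or None."""
    seen_from = {}
    for place, visible in memory:
        seen_from.setdefault(place, set()).update(visible)
    frontier, parent = deque([here]), {here: None}
    while frontier:
        place = frontier.popleft()
        if target in seen_from.get(place, set()):  # "Have I seen a charger from here?"
            path = []
            while place is not None:
                path.append(place)
                place = parent[place]
            return path[::-1]
        for nxt in seen_from.get(place, set()):    # "...from a place I can see from here?"
            if nxt not in parent:
                parent[nxt] = place
                frontier.append(nxt)
    return None

print(virtual_sense("a", "charger", episodes))  # ['a', 'b', 'c']
```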

Virtual Sensors Results

[Plot: Average Number of Moves (0 to 250) vs. Subsequent Searches (1 to 19); curves: Average Random, Episodic Memory.]

Cognitive Capability: Action Modeling

1. The agent attempts to choose a direction (North, South, or East), but its knowledge is insufficient: impasse.
2. Evaluate moving in each available direction.
3. Create a memory cue.
4. Episodic retrieval: retrieve the best matching memory.
5. Retrieve the next memory.
6. Use the change in score to evaluate the proposed action (e.g., Move North = 10 points).
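The six steps can be sketched as a function that scores one proposed action. It assumes each hypothetical episode stores its working-memory elements (including the action taken) and the score at that moment; the best match uses the overlap count, with ties going to the more recent episode, and the action's value is the score change to the next episode.

```python
def evaluate_action(episodes, cue, action):
    """Score a proposed action by the score change that followed the best-matching past episode."""
    cue = set(cue) | {("action", action)}          # create a memory cue for this action
    best, best_match = None, 0
    for i, ep in enumerate(episodes[:-1]):          # each candidate needs a "next" episode
        match = len(cue & ep["wm"])
        if match > 0 and match >= best_match:
            best, best_match = i, match
    if best is None:
        return None                                 # nothing relevant remembered
    return episodes[best + 1]["score"] - episodes[best]["score"]  # change in score

episodes = [
    {"wm": frozenset({("cell-north", "bonus"), ("action", "north")}), "score": 0},
    {"wm": frozenset({("cell-east", "normal"), ("action", "east")}), "score": 10},
    {"wm": frozenset({("cell-south", "empty"), ("action", "south")}), "score": 15},
    {"wm": frozenset(), "score": 15},
]
print(evaluate_action(episodes, {("cell-north", "bonus")}, "north"))  # 10
```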

Episodic Memory: Multi-Step Action Projection

[Andrew Nuxoll]

• Learn tactics from prior success and failure
  – Fight/flight
  – Back away from enemy (and fire)
  – Dodging

[Plot: Average Margin of Victory (-30 to 40) vs. Successive Games (1 to 174).]

Episodic Memory Enables Cognitive Capabilities

• Sensing
  – Detect Changes
  – Detect Repetition
  – Virtual Sensing
• Reasoning
  – Model Actions
  – Use Previous Successes/Failures
  – Model the Environment
  – Manage Long Term Goals
  – Explain Behavior
• Learning
  – Retroactive Learning
  – Allows Reanalysis Given New Knowledge
  – “Boost” other Learning Mechanisms

Mental Imagery and Spatial Reasoning

Scott Lathrop

Sam Wintermute

See AGI Talks

WHAT IS VISUAL IMAGERY?

VISUAL IMAGERY divides into two kinds:

VISUAL-SPATIAL (sentential/algebraic algorithms)
• Location, orientation
• Sentential, quantitative representations
• Linear algebra and computational geometry algorithms

VISUAL-DEPICTIVE (depictive/ordinal algorithms)
• Shape, color, topology, spatial properties
• Depictive, pixel-based representations
• Image algebra algorithms

Where can you put A next to I?


Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]

[Figure: the environment sends quantitative descriptions of environmental objects to a spatial scene; the spatial scene gives Soar qualitative descriptions of object relationships; Soar sends back qualitative descriptions of new objects in relation to existing objects. Example with objects A, I, and O, where A′ is an imagined copy of A:]

(on A I)
(imagine_left_of A I) → (intersect A′ O)
(imagine_right_of A I) → (no_intersect A′)
(move_right_of A I)
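The A/I/O example can be mimicked with axis-aligned rectangles. This is a toy stand-in for the spatial scene, not the Soar imagery system: the coordinates, box sizes, and helper names such as `imagine_left_of` are hypothetical Python counterparts of the qualitative commands above.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

def intersects(a: Box, b: Box) -> bool:
    """True when the interiors of two axis-aligned boxes overlap."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1

def imagine_left_of(a: Box, ref: Box) -> Box:
    """Imagined copy A' of `a`, placed flush against the left side of `ref`."""
    return Box(ref.x0 - (a.x1 - a.x0), ref.y0, ref.x0, ref.y0 + (a.y1 - a.y0))

def imagine_right_of(a: Box, ref: Box) -> Box:
    """Imagined copy A' of `a`, placed flush against the right side of `ref`."""
    return Box(ref.x1, ref.y0, ref.x1 + (a.x1 - a.x0), ref.y0 + (a.y1 - a.y0))

def place_next_to(a, ref, obstacles):
    """Try imagined placements in order; return the first (side, box) that hits no obstacle."""
    for side, imagine in (("left_of", imagine_left_of), ("right_of", imagine_right_of)):
        a_prime = imagine(a, ref)
        if not any(intersects(a_prime, o) for o in obstacles):
            return side, a_prime
    return None

A, I, O = Box(0, 5, 2, 7), Box(4, 0, 6, 2), Box(2, 0, 4, 2)  # O sits just left of I
print(place_next_to(A, I, [O]))  # ('right_of', Box(x0=6, y0=0, x1=8, y1=2))
```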

Upcoming Challenges

• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
  + Language, Spatial, Temporal Reasoning, …
• Scaling up to large bodies of knowledge
  – Build up from instruction, experience, exploration, …

Soar Community

• Soar Website
  – http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
  – June 22-26, 2009
• Soar-group
  – http://lists.sourceforge.net/lists/listinfo/soar-group
  – Low traffic

Thanks to

Funding Agencies:

NSF, DARPA, ONR

Ph.D. students:

Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu

Research Programmers:

Karen Coulter, Jonathan Voigt

Continued inspiration:

Allen Newell


Challenges in Cognitive Architecture Research

• Dynamic taskability
  – Pursue novel tasks
• Learning
  – Always learning, learning in unexpected and unplanned ways (wild learning)
  – Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
  – Active area but much left to do.
• Social behavior
  – Interaction with humans and other entities
• Connect to the real world
  – Cognitive robotics with long-term existence
• Applications
  – Expand domains and problems
  – Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI.
