Soar One-hour Tutorial
John E. Laird
University of Michigan
March 2009
http://sitemaker.umich.edu/soar
Supported in part by DARPA and ONR
Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
  – Internal decision cycle
  – Interaction with external environments
  – Subgoals and meta-reasoning
  – Chunking
5. Recent extensions to Soar
  – Reinforcement Learning
  – Semantic Memory
  – Episodic Memory
  – Visual Imagery
How can we build a human-level AI?
Tasks: Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on cell phone
Learning
Neurons
Neural Circuits
Brain Structure
Programs
Computer Architecture
Logic Circuits
Electrical circuits
Figure: the Soar cognitive architecture. Labeled components: Symbolic Long-Term Memories (Procedural, learned by Chunking and Reinforcement Learning; Semantic, learned by Semantic Learning; Episodic, learned by Episodic Learning), Symbolic Short-Term Memory, Decision Procedure, Appraisals, Imagery, Perception, Action, Body.
Cognitive Architecture
• Fixed mechanisms underlying cognition
  – Memories, processing elements, control, interfaces
  – Representations of knowledge
  – Separation of fixed processes and variable knowledge
  – Complex behavior arises from composition of simple primitives
• Purpose:
  – Bring knowledge to bear to select actions to achieve goals
• Not just a framework such as BDI, NN, logic & probability, or rule-based systems
• Important constraints:
  – Continual performance
  – Real-time performance
  – Incremental, on-line learning
Figure: Architecture, Knowledge, Goals, and Task Environment.
Common Structures of Many Cognitive Architectures
Figure: Short-term Memory, Procedural Long-term Memory, Declarative Long-term Memory, Perception, Action, Action Selection, Procedure Learning, Declarative Learning, Goals.
Different Goals of Cognitive Architecture
• Biological plausibility: Does the architecture correspond to what we know about the brain?
• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?
• Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  – Building Human-level AI
Short History of Soar
Figure: timeline of Soar, 1980 to 2005, along Functionality and Modeling tracks:
  – Pre-Soar: Problem Spaces, Production Systems, Heuristic Search
  – Multi-method, multi-task problem solving; Subgoaling; Chunking
  – UTC (Unified Theories of Cognition); Natural Language; HCI; External Environment
  – Integration; Large bodies of knowledge; Teamwork; Real Applications
  – Virtual Agents; Learning from Experience, Observation, Instruction
  – New Capabilities
Distinctive Features of Soar
• Emphasis on functionality
  – Take engineering and scaling issues seriously
  – Interfaces to real-world systems
  – Can build very large systems in Soar that exist for a long time
• Integration with perception and action
  – Mental imagery and spatial reasoning
• Integrates reaction, deliberation, and meta-reasoning
  – Dynamically switching between them
• Integrated learning
  – Chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling
  – Expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments
  – SML efficiently supports many languages and inter-process communication
System Architecture
  – Soar Kernel: Soar 9.0 Kernel (C)
  – gSKI: Higher-level Interface (C++)
  – KernelSML: Encodes/decodes function calls and responses in XML (C++)
  – SML: Soar Markup Language
  – ClientSML: Encodes/decodes function calls and responses in XML (C++)
  – SWIG Language Layer: Wrapper for Java/Tcl (not needed if the application is in C++)
  – Application: any language
Soar Basics
• Operators: Deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
1. Input from environment
2. Elaborate current situation: parallel rules
3. Propose and evaluate operators via preferences: parallel rules
4. Select operator
5. Apply operator: Modify internal data structures: parallel rules
6. Output to motor system
Figure: an agent in a real or virtual world applies an operator and arrives in a new state, over and over.
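The six steps above can be sketched as one Python function. This is a minimal illustration under stated assumptions, not Soar's implementation: rules are plain Python callables run one after another (Soar fires matching rules in parallel), numeric scores stand in for preferences, and every name (decision_cycle, note_food, propose_moves, prefer_food, output) is hypothetical.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


def decision_cycle(
    state: State,
    percept: State,
    elaborate: List[Callable[[State], None]],
    propose: List[Callable[[State], List[str]]],
    evaluate: Callable[[State, List[str]], Dict[str, float]],
    apply: Dict[str, Callable[[State], None]],
) -> Optional[str]:
    """Run one pass of the cycle and return the selected operator (or None)."""
    state.update(percept)                                # 1. input from environment
    for rule in elaborate:                               # 2. elaborate current situation
        rule(state)
    candidates = [op for rule in propose for op in rule(state)]  # 3. propose operators
    if not candidates:
        return None                                      # Soar would reach an impasse here
    scores = evaluate(state, candidates)                 # 3. evaluate operators
    selected = max(candidates, key=lambda op: scores.get(op, 0.0))  # 4. select (first wins ties)
    apply[selected](state)                               # 5. apply: modify internal data
    return selected                                      # 6. caller sends state["output"] to the motors


# Hypothetical Eaters-like rules, for illustration only.
def note_food(s: State) -> None:
    s["food-north"] = s.get("north") == "food"


def propose_moves(s: State) -> List[str]:
    return ["move-north", "move-east"]


def prefer_food(s: State, ops: List[str]) -> Dict[str, float]:
    return {"move-north": 1.0 if s["food-north"] else 0.0, "move-east": 0.5}


def output(direction: str) -> Callable[[State], None]:
    def apply_move(s: State) -> None:
        s["output"] = direction
    return apply_move


state: State = {}
op = decision_cycle(state, {"north": "food"}, [note_food], [propose_moves], prefer_food,
                    {"move-north": output("north"), "move-east": output("east")})
```

Here `op` is "move-north" and `state["output"]` holds "north" for the motor system. Picking the first of several equally scored operators is a simplification; in Soar an unresolved choice becomes an impasse.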
Basic Soar Architecture
Figure: labeled components are Procedural Long-Term Memory (Production Memory, with Chunking), Symbolic Short-Term Memory (Working Memory), the Decision Procedure, Perception, Action, and the Body. The processing cycle runs Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator (Decide phase), Elaborate Operator and Apply Operator (Apply phase), and Output.
Soar 101: Eaters
Figure: an Eater choosing among moves North, South, and East, with the cycle Input, Propose Operator, Select Operator, Apply Operator, Output.
• Propose Operator: If cell in direction <d> is not a wall --> propose operator move <d>
• Select Operator:
  – If operator <o1> will move to a bonus food and operator <o2> will move to a normal food --> operator <o1> > <o2>
  – If operator <o1> will move to an empty cell --> operator <o1> <
  – Resulting preferences: North > East, South > East, North = South
• Apply Operator: If an operator is selected to move <d> --> create output move-direction <d>
• Output: move-direction North
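The Eaters rules above can be mimicked in Python to show how symbolic preferences narrow the candidates. Assumptions: the cell labels (bonusfood, normalfood, empty, wall) and the function names are hypothetical, and candidates left over after applying > and < are treated as mutually indifferent (=) and picked at random. Real Soar would instead raise a tie impasse unless indifferent preferences exist.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Cells = Dict[str, str]  # direction -> "wall" | "bonusfood" | "normalfood" | "empty" (hypothetical labels)


def propose(cells: Cells) -> List[str]:
    # If cell in direction <d> is not a wall --> propose operator move <d>
    return [d for d, content in cells.items() if content != "wall"]


def preferences(cells: Cells, ops: List[str]) -> Tuple[Set[Tuple[str, str]], Set[str]]:
    better: Set[Tuple[str, str]] = set()  # (o1, o2) encodes o1 > o2
    worst: Set[str] = set()               # unary "<" (worst) preference
    for o1 in ops:
        if cells[o1] == "empty":
            worst.add(o1)                 # move to an empty cell --> operator <o1> <
        for o2 in ops:
            if cells[o1] == "bonusfood" and cells[o2] == "normalfood":
                better.add((o1, o2))      # bonus beats normal --> operator <o1> > <o2>
    return better, worst


def select(ops: List[str], better: Set[Tuple[str, str]], worst: Set[str],
           rng: Optional[random.Random] = None) -> Optional[str]:
    # Drop candidates that some other candidate is better than.
    remaining = [o for o in ops if not any((p, o) in better for p in ops)]
    # Worst-preferred candidates survive only if nothing else is left.
    remaining = [o for o in remaining if o not in worst] or remaining
    if not remaining:
        return None
    # Sketch assumption: leftovers are indifferent (=), so pick one at random.
    return (rng or random.Random()).choice(remaining)


cells = {"north": "bonusfood", "south": "bonusfood", "east": "normalfood", "west": "wall"}
ops = propose(cells)
better, worst = preferences(cells, ops)
choice = select(ops, better, worst)  # "north" or "south": North = South, both > East
```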
Example Working Memory
Figure: block A sits on block B, which sits on a table.
(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)
Working memory is a graph. All working memory elements must be “linked” directly or indirectly to a state.
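Figure: the same working memory drawn as a graph rooted at S1, with ^block edges to b1 and b2 and a ^table edge to t1; b2's ^color, ^name, ^size, ^type, and ^weight edges lead to yellow, B, 1, block, and 14, and ^under and ^ontop edges link the objects to one another.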
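To make the linking requirement concrete, the blocks example can be written as identifier-attribute-value triples and checked for reachability from the state. This Python sketch is illustrative only: the triple list mirrors the working memory printed above (attributes without the ^), and the helper unlinked is hypothetical, not a Soar API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WME = Tuple[str, str, str]  # (identifier, attribute, value)

wm: List[WME] = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", "1"), ("b1", "type", "block"), ("b1", "weight", "14"),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", "1"), ("b2", "type", "block"), ("b2", "under", "b1"),
    ("b2", "weight", "14"),
    ("t1", "color", "gray"), ("t1", "shape", "square"), ("t1", "type", "table"),
    ("t1", "under", "b2"),
]


def unlinked(wm: List[WME], state: str) -> List[WME]:
    """Return the WMEs whose identifier cannot be reached from the state identifier."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for ident, _attr, value in wm:
        edges[ident].add(value)
    reachable, frontier = {state}, [state]
    while frontier:                      # depth-first walk over the graph
        node = frontier.pop()
        for nxt in edges[node]:
            if nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return [w for w in wm if w[0] not in reachable]
```

For the blocks example every element hangs off s1, so `unlinked(wm, "s1")` is empty; an element such as ("x1", "color", "red") with no path from s1 would be reported.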
Soar Processing Cycle
Figure: the processing cycle (Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator, Elaborate Operator, Apply Operator, Output). When the rules are insufficient to continue, an Impasse arises and a Subgoal runs the same cycle.
TankSoar
Figure: a TankSoar map, labeled with Red Tank’s Shield, Borders (stone), Walls (trees), Health charger, Missile pack, Blue tank (Ouch!), Energy charger, and Green tank’s radar.
Soar 103: Subgoals
Figure: the cycle Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output, repeated inside a subgoal.
• If enemy not sensed, then wander
  – Wander is applied in a subgoal using the Move and Turn operators
• If enemy is sensed, then attack
  – Attack is applied in a subgoal using operators such as Shoot
TacAir-Soar [1997]
Controls simulated aircraft in real-time training exercises (>3000 entities)
Flies all U.S. air missions
Dynamically changes missions as appropriate
Communicates and coordinates with computer and human controlled planes
Large knowledge base (8000 rules)
No learning
TacAir-Soar Task Decomposition
Figure: task decomposition (each operator and its subtasks):
  – Execute Mission: Fly-route, Ground Attack, Fly-Wing, Intercept
  – Intercept: Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram
  – Employ Weapons: Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile
  – Launch Missile: Lock Radar, Lock IR, Fire-Missile, Wait-for Missile-Clear
Proposal rules:
• If instructed to intercept an enemy, then propose intercept.
• If intercepting an enemy and the enemy is within range and ROE (rules of engagement) are met, then propose employ-weapons.
• If employing-weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.
• If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.
>250 goals, >600 operators, >8000 rules
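Each proposal rule above tests the operator currently being pursued, which is what drives the decomposition downward. A minimal Python sketch, assuming hypothetical situation flags (instructed-to-intercept, roe-met, and so on) and taking the first proposal at each level instead of using preferences; TacAir-Soar's real rules test far richer state.

```python
from typing import Callable, Dict, List, Optional

Situation = Dict[str, object]
Rule = Callable[[List[str], Situation], Optional[str]]


def propose_intercept(stack: List[str], s: Situation) -> Optional[str]:
    ok = stack[-1] == "execute-mission" and s.get("instructed-to-intercept")
    return "intercept" if ok else None


def propose_employ_weapons(stack: List[str], s: Situation) -> Optional[str]:
    ok = stack[-1] == "intercept" and s.get("enemy-in-range") and s.get("roe-met")
    return "employ-weapons" if ok else None


def propose_launch_missile(stack: List[str], s: Situation) -> Optional[str]:
    ok = (stack[-1] == "employ-weapons" and s.get("missile-selected")
          and s.get("in-steering-circle") and s.get("lar-achieved"))
    return "launch-missile" if ok else None


def propose_lock_ir(stack: List[str], s: Situation) -> Optional[str]:
    ok = stack[-1] == "launch-missile" and s.get("missile-type") == "IR" and not s.get("ir-lock")
    return "lock-ir" if ok else None


RULES: List[Rule] = [propose_intercept, propose_employ_weapons,
                     propose_launch_missile, propose_lock_ir]


def descend(s: Situation) -> List[str]:
    """Push one operator per level until no rule proposes a deeper one."""
    stack = ["execute-mission"]
    while True:
        proposals = [op for op in (rule(stack, s) for rule in RULES) if op]
        if not proposals:
            return stack
        stack.append(proposals[0])  # sketch: first proposal wins; Soar uses preferences
```

With every flag satisfied the stack grows to execute-mission, intercept, employ-weapons, launch-missile, lock-ir; if the ROE are not met it stops at intercept.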
Impasse/Substate Implications:
• Substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve impasse
  – Generate operator
  – Select operator (deliberate control)
  – Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
  – Operator creation, selection, application, state elaboration
• Substate is where knowledge to resolve the impasse can be found
• Hierarchy of substates/subgoals arises through recursive impasses
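The link between impasses and substate goals can be sketched in a few lines of Python. Hypothetical simplifications: only two decision-time impasses are detected (nothing proposed, or several candidates with no indifference), and an operator without direct apply knowledge is expanded through a made-up SUBOPERATORS table, standing in for the substate that resolves an operator no-change impasse. Soar has further impasse types not modeled here. Recursion produces the hierarchy of substates.

```python
from typing import Dict, List, Optional, Tuple


def decide(candidates: List[str], indifferent: bool) -> Tuple[str, Optional[str]]:
    """Return ("select", operator) or ("impasse", kind)."""
    if not candidates:
        return ("impasse", "state no-change")   # substate goal: generate an operator
    if len(candidates) > 1 and not indifferent:
        return ("impasse", "tie")               # substate goal: select an operator
    return ("select", candidates[0])


# Hypothetical knowledge: which operators have direct apply rules, and which
# suboperators a substate uses to apply the others (task decomposition).
APPLY_RULES = {"move", "turn", "shoot"}
SUBOPERATORS: Dict[str, List[str]] = {"wander": ["turn", "move"], "attack": ["shoot"]}


def apply_operator(op: str, depth: int = 0, trace: Optional[List[str]] = None) -> List[str]:
    """Apply op, opening nested substates (operator no-change impasses) as needed."""
    trace = [] if trace is None else trace
    trace.append("  " * depth + op)
    if op not in APPLY_RULES:                   # no apply knowledge: impasse
        for sub in SUBOPERATORS.get(op, []):    # substate goal: apply the operator
            apply_operator(sub, depth + 1, trace)
    return trace
```

`apply_operator("wander")` returns ["wander", "  turn", "  move"], where the indentation marks the substate level.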
Tie Subgoals and Chunking
Figure: the Eaters cycle (Input, Propose Operator, Evaluate Operators, Select Operator, Apply Operator, Output) with candidate moves North, South, and East and insufficient preferences to choose among them, so a Tie Impasse arises.
• In the tie subgoal, evaluate-operator is applied to each candidate: North = 10, South = 10, East = 5
• The evaluations yield the preferences North > East, South > East, North = South
• Chunking creates a rule that applies evaluate-operator (e.g., = 10)
• Chunking creates rules that create preferences based on what was tested
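A toy Python version of the tie subgoal and chunking shows why the learning transfers: the cached result tests only the cell content that the evaluation looked at. Assumptions: lookahead is a hypothetical stand-in for evaluate-operator that scores bonus food 10 and normal food 5, and the chunks dictionary stands in for learned rules. Real chunking builds productions from the working memory elements the subgoal actually tested.

```python
from typing import Dict, List, Tuple


def lookahead(cell: str) -> int:
    """Hypothetical deliberate evaluation performed inside the tie subgoal."""
    return {"bonusfood": 10, "normalfood": 5, "empty": 0}[cell]


chunks: Dict[str, int] = {}  # learned rules: tested cell content -> evaluation


def evaluate(cell: str) -> Tuple[int, bool]:
    """Return (value, needed_subgoal) for a move into a cell with this content."""
    if cell in chunks:
        return chunks[cell], False   # a chunk fires: no impasse, no deliberation
    value = lookahead(cell)          # tie impasse: evaluate in the subgoal
    chunks[cell] = value             # chunking: the new rule tests only the cell content
    return value, True


def choose(cells: Dict[str, str]) -> Tuple[List[str], int]:
    """Return the best directions and how many evaluations needed the subgoal."""
    results = {d: evaluate(c) for d, c in cells.items()}
    best = max(v for v, _ in results.values())
    chosen = sorted(d for d, (v, _) in results.items() if v == best)
    deliberations = sum(1 for _, slow in results.values() if slow)
    return chosen, deliberations
```

A first call with North and South on bonus food and East on normal food deliberates twice and picks North and South; a later call in a different location (say West on bonus food) needs no subgoal at all, because the chunk only tested "bonus food".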
Chunking Analysis
• Converts deliberate reasoning/planning to reaction
• Generality of learning based on generality of reasoning
  – Leads to many different types of learning
  – If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse-driven
  – Learning arises from a lack of knowledge
Extending Soar
• Learn from internal rewards
  – Reinforcement learning
• Learn facts
  – What you know
  – Semantic memory
• Learn events
  – What you remember
  – Episodic memory
• Basic drives and …
  – Emotions, feelings, mood
• Non-symbolic reasoning
  – Mental imagery
• Learn from regularities
  – Spatial and temporal clusters
Figure: the extended Soar architecture. Labeled components: Symbolic Long-Term Memories (Procedural, learned by Chunking and Reinforcement Learning; Semantic, learned by Semantic Learning; Episodic, learned by Episodic Learning), Symbolic Short-Term Memory, Decision Procedure, Appraisal Detector, Reinforcement Learning, Clustering, Visual Imagery, Perception, Action, Body.
Theoretical Commitments
Stayed the Same
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules
Changed
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
  – Symbol generation (clustering)
  – Control (numeric preferences)
  – Learning Control (reinforcement learning)
  – Intrinsic reward (appraisals)
  – Aid memory retrieval (WM activation)
  – Non-symbolic reasoning (visual imagery)
Reinforcement Learning
Shelly Nason
RL in Soar
1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust value of numeric preferences with experience.
Figure: Perception, Reward, Internal State, Value Function, Action Selection, Action, and Update Value Function.
The Q-function in Soar
The value function is stored in rules that test the state and operator, and create numeric preferences.
sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}
Operator Q-value = the sum of all numeric preferences.
Selection: epsilon-greedy or Boltzmann.
O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05
epsilon-greedy: With probability ε the agent selects an action at random. Otherwise the agent takes the action with the highest expected value. [Balance exploration/exploitation]
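The Q-value arithmetic and the two selection rules can be checked with a short Python sketch. The preference values are the ones listed for O1, O2, and O3 (sums .81, .48, and .05); the function names are hypothetical, and the Boltzmann form used here (probability proportional to exp(Q/T)) is the standard textbook version, assumed rather than taken from Soar's source.

```python
import math
import random
from typing import Dict, List, Optional

prefs: Dict[str, List[float]] = {
    "O1": [0.34, 0.45, 0.02],
    "O2": [0.25, 0.11, 0.12],
    "O3": [-0.04, 0.14, -0.05],
}


def q_values(prefs: Dict[str, List[float]]) -> Dict[str, float]:
    # Operator Q-value = sum of the numeric preferences asserted for it.
    return {op: sum(vals) for op, vals in prefs.items()}


def epsilon_greedy(q: Dict[str, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(sorted(q))   # explore: uniform random operator
    return max(q, key=q.get)           # exploit: highest Q-value


def boltzmann(q: Dict[str, float], temperature: float,
              rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    ops = sorted(q)
    weights = [math.exp(q[o] / temperature) for o in ops]  # P(o) proportional to exp(Q/T)
    return rng.choices(ops, weights=weights)[0]
```

Lower temperatures make Boltzmann selection behave almost greedily; higher ones spread the choice across operators in proportion to their values.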
Updating operator values
Sarsa update: $Q(s,O_1) \leftarrow Q(s,O_1) + \alpha\,[r + \gamma\,Q(s',O_2) - Q(s,O_1)]$
• Rules contributing to O1: R1(O1) = .20, R2(O1) = .15, R3(O1) = -.02
• Q(s,O1) = sum of numeric prefs. = .33
• r = reward = .2
• Q(s',O2) = sum of numeric prefs. of the selected operator (O2) = .11
• With α = .1 and γ = .9: .1 * [.2 + .9*.11 - .33] = .1 * (-.031) ≈ -.003
• The update is split evenly between the rules contributing to O1: ≈ -.001 each
• R1 ≈ .199, R2 ≈ .149, R3 ≈ -.021
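The update on this slide can be reproduced with a small Python function: a Sarsa step on Q(s,O1) whose change is split evenly across the rules that contributed to O1. Note that with α = .1 the full update is about -.003 (the bracketed TD error alone is -.031). The function and argument names are hypothetical.

```python
from typing import Dict


def sarsa_update(rule_values: Dict[str, float], reward: float, q_next: float,
                 alpha: float, gamma: float) -> Dict[str, float]:
    """Apply one Sarsa step to Q(s,O1) and split it evenly over the contributing rules."""
    q = sum(rule_values.values())                   # Q(s,O1) = sum of numeric preferences
    delta = alpha * (reward + gamma * q_next - q)   # alpha * TD error
    share = delta / len(rule_values)                # even split across the rules
    return {name: value + share for name, value in rule_values.items()}


new = sarsa_update({"R1": 0.20, "R2": 0.15, "R3": -0.02},
                   reward=0.2, q_next=0.11, alpha=0.1, gamma=0.9)
# new is approximately {"R1": 0.19897, "R2": 0.14897, "R3": -0.02103}
```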
Results with Eaters
Figure: Total Score (0 to 1200) by Move # (1 to 289) for Random, Figure 2a rule, After 5, After 10, After 15, and After 20.
RL TankSoar Agent
Figure: Average Margin of Victory (-20 to 60) over Successive Games (1 to 171).
Semantic Memory
Yongjia Wang
Memory Systems
Figure: taxonomy of memory systems. Memory divides into Long Term Memory (Declarative and Procedural) and Short Term Memory; the specific systems shown are Semantic Memory, Episodic Memory, Perceptual Representation System, Procedural Memory, and Working Memory.
Declarative Memory Alternatives
• Working Memory
  – Keep everything in working memory
• Retrieve dynamically with rules
  – Rules provide asymmetric access
  – Data chunking to learn (complex)
• Separate Declarative Memories
  – Semantic memory (facts)
  – Episodic memory (events)
Basic Semantic Memory Functionalities
• Encoding
  – What to save?
  – When to add a new declarative chunk?
  – How to update knowledge?
• Retrieval
  – How is the cue placed and matched?
  – What are the different types of retrieval?
• Storage
  – What are the storage structures?
  – How are they maintained?
Semantic Memory Functionalities
Figure: exchanges between Working Memory and Semantic Memory. A cue placed on the state is expanded by Feature Match Retrieval (NIL when nothing matches); structures are saved through Update with Complex Structure; other labeled options are AutoCommit and Remove-No-Change.
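Feature-match retrieval can be illustrated with a toy Python store. This is a sketch under stated assumptions, not Soar's semantic memory: chunks are flat attribute-value dictionaries, a match requires every cue feature, the newest matching chunk wins, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional

Chunk = Dict[str, str]  # attribute -> value


class SemanticMemory:
    """Toy declarative store with feature-match retrieval."""

    def __init__(self) -> None:
        self.chunks: List[Chunk] = []

    def save(self, chunk: Chunk) -> None:
        self.chunks.append(dict(chunk))        # store a copy, independent of working memory

    def retrieve(self, cue: Chunk) -> Optional[Chunk]:
        # Feature match: every attribute-value pair in the cue must appear in the chunk.
        for chunk in reversed(self.chunks):    # sketch assumption: newest chunk wins ties
            if all(chunk.get(attr) == value for attr, value in cue.items()):
                return dict(chunk)             # full chunk returned: the cue is "expanded"
        return None                            # corresponds to a NIL (failed) retrieval


memory = SemanticMemory()
memory.save({"name": "grand-canyon", "type": "landmark", "state": "arizona"})
fact = memory.retrieve({"name": "grand-canyon"})
```

A cue of just the name expands into the whole stored fact, including which state the Grand Canyon is in; a cue that matches nothing yields None.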
Episodic Memory
Andrew Nuxoll
Episodic vs. Semantic Memory
• Semantic Memory
  – Knowledge of what we “know”
  – Example: what state the Grand Canyon is in
• Episodic Memory
  – History of specific events
  – Example: a family vacation to the Grand Canyon
Characteristics of Episodic Memory: Tulving
• Architectural:
  – Does not compete with reasoning.
  – Task independent
• Automatic:
  – Memories created without deliberate decision.
• Autonoetic:
  – Retrieved memory is distinguished from sensing.
• Autobiographical:
  – Episode remembered from own perspective.
• Variable Duration:
  – The time period spanned by a memory is not fixed.
• Temporally Indexed:
  – Rememberer has a sense of when the episode occurred.
Current Implementation
Figure: Long-term Procedural Memory (Production Rules), Working Memory (Input, Output, Cue, Retrieved), Episodic Memory, and Episodic Learning.
• Encoding Initiation: when the agent takes an action.
• Encoding Content: the entire working memory is stored in the episode.
• Storage (Episode Structure): episodes are stored in a separate memory.
• Retrieval Initiation/Cue: the cue is placed in an architecture-specific buffer.
• Retrieval: the closest partial match is retrieved.
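The encoding and retrieval choices above can be sketched in Python: each action stores a full snapshot of working memory, and a cue retrieves the episode that shares the most elements with it. Assumptions: working memory is a set of (identifier, attribute, value) triples, match quality is a simple overlap count, ties go to the most recent episode, and the class is hypothetical rather than Soar's episodic memory.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

WME = Tuple[str, str, str]  # (identifier, attribute, value)


class EpisodicMemory:
    """Toy episodic store: one full working-memory snapshot per action."""

    def __init__(self) -> None:
        self.episodes: List[FrozenSet[WME]] = []

    def record(self, working_memory: Iterable[WME]) -> None:
        # Encoding: called when the agent acts; content is all of working memory.
        self.episodes.append(frozenset(working_memory))

    def retrieve(self, cue: Iterable[WME]) -> Optional[int]:
        """Return the index of the closest partial match, or None if nothing overlaps."""
        cue_set = set(cue)
        best: Optional[int] = None
        best_score = 0
        for index, episode in enumerate(self.episodes):
            score = len(cue_set & episode)          # cue elements present in the episode
            if score > 0 and score >= best_score:   # ">=": ties go to the more recent episode
                best, best_score = index, score
        return best
```

A partial cue still succeeds: a cue containing one element the agent saw and one it never saw retrieves the episode holding the element it did see.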
Cognitive Capability: Virtual Sensing
• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
  – Have I seen a charger from here?
  – Have I seen a place where I can see a charger?
Virtual Sensors Results
Figure: Average Number of Moves (0 to 250) over Subsequent Searches (1 to 19), comparing Average Random with Episodic Memory.
Cognitive Capability: Action Modeling
1. Agent attempts to choose a direction (North, South, East).
2. Agent’s knowledge is insufficient, so an impasse arises.
3. Evaluate moving in each available direction.
4. Create a memory cue.
5. Episodic Retrieval: retrieve the best matching memory.
6. Retrieve Next Memory: retrieve the next memory.
7. Use the change in score to evaluate the proposed action (e.g., Move North = 10 points).
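The action-modeling steps can be sketched in Python: find the remembered episode most like the current situation in which the proposed action was taken, then look at the episode that followed it and use the change in score as the action's evaluation. Assumptions: the episode log format (features, action, score) and both function names are hypothetical, and the scores in the example log are made up for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

Episode = Tuple[FrozenSet[str], str, int]  # (features, action taken, score at that time)


def best_match(log: List[Episode], cue: FrozenSet[str], action: str) -> Optional[int]:
    """Index of the most cue-like episode in which `action` was taken (most recent on ties)."""
    best: Optional[int] = None
    best_overlap = -1
    for index, (features, taken, _score) in enumerate(log):
        if taken == action and len(cue & features) >= best_overlap:
            best, best_overlap = index, len(cue & features)
    return best


def evaluate_action(log: List[Episode], cue: FrozenSet[str], action: str) -> Optional[int]:
    """Predict the score change of `action` from the episode that followed the best match."""
    index = best_match(log, cue, action)          # retrieve the best matching memory
    if index is None or index + 1 >= len(log):
        return None                               # nothing to project from
    return log[index + 1][2] - log[index][2]      # retrieve next memory; change in score


log: List[Episode] = [
    (frozenset({"food-north"}), "move-north", 0),
    (frozenset({"empty-east"}), "move-east", 10),
    (frozenset({"empty-east"}), "move-west", 10),
]
```

With this made-up log, moving North is evaluated at +10 points and moving East at 0, while an action never tried (or one with no following episode) yields no prediction.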
Episodic Memory: Multi-Step Action Projection
[Andrew Nuxoll]
• Learn tactics from prior success and failure
  – Fight/flight
  – Back away from enemy (and fire)
  – Dodging
Figure: Average Margin of Victory (-30 to 40) over Successive Games (1 to 174).
Episodic Memory Enables Cognitive Capabilities
• Sensing
  – Detect Changes
  – Detect Repetition
  – Virtual Sensing
• Reasoning
  – Model Actions
  – Use Previous Successes/Failures
  – Model the Environment
  – Manage Long Term Goals
  – Explain Behavior
• Learning
  – Retroactive Learning
  – Allows Reanalysis Given New Knowledge
  – “Boost” other Learning Mechanisms
Mental Imagery and Spatial Reasoning
Scott Lathrop, Sam Wintermute
See AGI Talks
What Is Visual Imagery?
• Visual-depictive
  – Shape, color, topology, spatial properties
  – Depictive, pixel-based representations
  – Image algebra algorithms
• Visual-spatial
  – Location, orientation
  – Sentential, quantitative representations
  – Linear algebra and computational geometry algorithms
Figure: visual imagery spans a range from Sentential/Algebraic algorithms to Depictive/Ordinal algorithms.
Where can you put A next to I?
Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]
Figure: Soar interacts with a Spatial Scene that models the Environment.
• Environment to Spatial Scene: quantitative descriptions of environmental objects
• Spatial Scene to Soar: qualitative descriptions of object relationships
• Soar to Spatial Scene: qualitative description of new objects in relation to existing objects
Example with objects A, I, and O, and the imagined object A′:
(on A I)
(imagine_left_of A I) gives (intersect A′ O)
(imagine_right_of A I) gives (no_intersect A′)
(move_right_of A I)
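The imagine-then-test loop in this example can be sketched with axis-aligned rectangles in Python. Assumptions: objects are boxes given by a lower-left corner plus width and height, touching edges do not count as intersecting, and the functions are hypothetical Python stand-ins named after the slide's predicates, not the interface of Soar's spatial system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: lower-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def intersects(a: Box, b: Box) -> bool:
    # Strict inequalities: boxes that merely touch do not intersect.
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def imagine_left_of(a: Box, ref: Box) -> Box:
    return Box(ref.x - a.w, ref.y, a.w, a.h)    # imagined A' flush against ref's left side


def imagine_right_of(a: Box, ref: Box) -> Box:
    return Box(ref.x + ref.w, ref.y, a.w, a.h)  # imagined A' flush against ref's right side


def place_next_to(a: Box, ref: Box, others: List[Box]) -> Box:
    """Try each side in turn; return the first imagined placement that hits nothing."""
    sides: List[Callable[[Box, Box], Box]] = [imagine_left_of, imagine_right_of]
    for imagine in sides:
        candidate = imagine(a, ref)
        if not any(intersects(candidate, o) for o in others):  # (no_intersect A')
            return candidate
    raise ValueError("no free side next to the reference object")


target = Box(5, 0, 1, 3)    # stands in for I
block = Box(0, 5, 2, 1)     # stands in for A
obstacle = Box(3, 0, 1, 1)  # stands in for O
placement = place_next_to(block, target, [obstacle])
```

Here the left-side image overlaps the obstacle, so the sketch settles on the right side, Box(6, 0, 2, 1), mirroring the slide's sequence of imagine_left_of, intersect, imagine_right_of, and move_right_of.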
Upcoming Challenges
• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
  + Language, Spatial, Temporal Reasoning, …
• Scaling up to large bodies of knowledge
  – Build up from instruction, experience, exploration, …
Soar Community
• Soar Website
  – http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
  – June 22-26, 2009
• Soar-group
  – http://lists.sourceforge.net/lists/listinfo/soar-group
  – Low traffic
Thanks to
Funding Agencies:
NSF, DARPA, ONR
Ph.D. students:
Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu
Research Programmers:
Karen Coulter, Jonathan Voigt
Continued inspiration:
Allen Newell
Challenges in Cognitive Architecture Research
• Dynamic taskability
  – Pursue novel tasks
• Learning
  – Always learning, learning in unexpected and unplanned ways (wild learning)
  – Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
  – Active area but much left to do.
• Social behavior
  – Interaction with humans and other entities
• Connect to the real world
  – Cognitive robotics with long-term existence
• Applications
  – Expand domains and problems
  – Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI.