TRANSCRIPT
Soar Rules
John E. Laird, John L. Tishman Professor of Engineering
University of Michigan
Research funded by TARDEC, ONR, AFOSR
The Role of Production Rules in a General Cognitive Architecture
Origins
“The production system was one of those happy events, though in minor key, that historians of science often talk about: a rather well-prepared formalism, sitting in wait for a scientific mission. Production systems have a long and diverse history. Their use of symbolic logic starts with Post (1943), from whom the name is taken. They also show up as Markov algorithms (1954). Their use in linguistics, where they are also called rewrite rules, dates from Chomsky (1957). As with so many other notions in computer science, they really entered into wide currency when they became operationalized in programming languages.”
(Newell and Simon, 1972, p. 889)
Emil Post 1943
• Post production systems
  – P1: $$ -> *
  – P2: *$ -> *
  – P3: *x -> x*
  – P4: * -> null & halt
  – P5: $xy -> y$x
  – P6: null -> $
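The rules above rewrite strings over an alphabet. A tiny interpreter can illustrate the idea; this is a simplified sketch that replaces the leftmost occurrence of the first matching left-hand side, ignoring Post's exact prefix conventions and the empty-string rules (P4, P6).

```python
# Minimal string-rewriting interpreter in the spirit of Post production rules.
# Simplification (assumption): ordinary substring rewriting, leftmost match,
# first matching rule wins.
def rewrite(s, rules, max_steps=1000):
    for _ in range(max_steps):
        for lhs, rhs in rules:
            i = s.find(lhs)
            if i != -1:
                s = s[:i] + rhs + s[i + len(lhs):]
                break
        else:
            return s        # no rule applies: halt
    return s

# Toy rule set: "ab" -> "b" repeatedly deletes a's that precede a b.
# rewrite("aabab", [("ab", "b")]) leaves only the b's.
```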
Newell & Simon (1965)
• Proposed Post production systems as a basis for programming languages
• Outgrowth of research on human problem solving
  – Logic, chess, cryptarithmetic: SEND + MORE = MONEY
• Strengths of rule-based systems
  1. Computational generality (Turing equivalent)
  2. Independent units/components of behavior – add/remove easily
  3. Models human short-term memory and human procedural memory
• PSG (Production System, version G), 1973
  – First "widely" used rule language
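The cryptarithmetic puzzle mentioned above can be solved by brute force; a short sketch (not part of the original research, which studied how humans solve it) makes the problem concrete:

```python
from itertools import permutations

# Brute-force solver for SEND + MORE = MONEY. M must be 1 (it is the
# carry out of a four-digit sum), which shrinks the search space.
def solve():
    m = 1
    digits = [d for d in range(10) if d != m]
    for s, e, n, d, o, r, y in permutations(digits, 7):
        if s == 0:
            continue  # leading digit cannot be zero
        send = 1000*s + 100*e + 10*n + d
        more = 1000*m + 100*o + 10*r + e
        money = 10000*m + 1000*o + 100*n + 10*e + y
        if send + more == money:
            return send, more, money

# solve() finds the unique assignment: 9567 + 1085 = 10652.
```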
Descendants of PSG
[Diagram: family tree of rule languages descended from PSG (Newell, '73).]

Used for cognitive modeling:
• PRISM (Langley, '80)
• CAPS (Just & Carpenter, '87)
• ACT (Anderson, '76)
• Soar (Laird et al., '82)

Application oriented:
• OPS1–5 (Forgy, McDermott, Newell, 1976–1981)
• OPS83 (Forgy)
• OPSJ (Forgy)
• CLIPS (Riley, '86)
• JESS (Friedman-Hill, '95)
• JRules (1996)
• Drools (2001)
• ART ('84)
Cognitive Architecture
• Fixed structures that support cognitive functions
  – Representations of information
  – Memories
  – Processing units
  – Interfaces to perception and motor system
• Functional requirements
  – Supports end-to-end behavior, reactivity, decision making, planning, learning, …
  – Relevant knowledge is brought to bear when needed
  – No "escape" to arbitrary programming
Different Goals of Cognitive Architecture Research
• Biological modeling: capture what we know about the brain
  – LEABRA
• Psychological modeling: capture what we know about the mind – the details of human performance in a wide range of cognitive tasks
  – ACT-R, CAPS, CLARION, EPIC, Soar, …
• Functionality: support the broad range of human cognitive capabilities across a wide range of tasks
  – Companions, ICARUS, LIDA, Soar, …
• Different requirements than in commercial rule applications!
Soar (Laird, Newell & Rosenbloom, 1982)
• Cognitive architecture for human-level intelligence
• Multi-method, multi-task problem solving
• Combined problem spaces & production systems
• Direct lineage: OPS5, XAPS2, GPS, LT
Soar 9 Structure
[Diagram: symbolic long-term memories (Procedural, Semantic, Episodic) with their learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); Symbolic Working Memory linked by the Decision Procedure; Perceptual STM with Mental Imagery, Visual LT Memory, and Appraisals; Perception and Action connecting to the Body.]
Original Soar Structure
[Diagram: symbolic long-term memory (Procedural) with Chunking; Symbolic Working Memory linked by the Decision Procedure; Perception and Action connecting to the Body.]
Rule-based System Processing Cycle
[Diagram: Rule Memory and Working Memory feed a cycle of Match Rules to WM → Select Rule (conflict resolution) → Execute Rule Actions, with Input and Output attached to Working Memory.]
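The match/select/execute cycle can be sketched in a few lines. This is an illustrative toy, not Soar's implementation: a rule is a (condition, action) pair over a working memory of facts, and "conflict resolution" here is simply "take the first matching rule".

```python
# Sketch of the match/select/execute cycle over a set-based working memory.
def run(rules, wm, max_cycles=100):
    for _ in range(max_cycles):
        conflict_set = [r for r in rules if r[0](wm)]   # match
        if not conflict_set:
            break                                        # quiescence: no rule fires
        condition, action = conflict_set[0]              # select (trivial resolution)
        action(wm)                                       # execute
    return wm

# Illustrative rules: note how the conditions must be engineered so each
# rule stops matching after it fires.
rules = [
    (lambda wm: "hungry" in wm and "food" in wm and "eat" not in wm,
     lambda wm: wm.add("eat")),
    (lambda wm: "eat" in wm and "hungry" in wm,
     lambda wm: wm.discard("hungry")),
]
```

Running `run(rules, {"hungry", "food"})` fires both rules in turn and reaches quiescence with `{"food", "eat"}`.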
Rule-based System Processing Cycle
[Diagram repeated: Match Rules to WM → Select Rule (conflict resolution) → Execute Rule Actions.]

1. Conditionality is limited to determining which rules match.
2. Cannot use additional task knowledge to select one rule.
   • Relies on fixed conflict-resolution schemes that use syntactic features
   • Works about 80% of the time
   • Forced to "engineer" conditions to avoid conflicts
3. Can only associate a fixed set of actions with each rule.
4. Intelligence is all about flexibility: being able to use knowledge for decision making and acting.

CHALLENGE: How to open up the processing cycle to more knowledge.
Alternative Processing Cycle
• Introduce OPERATORS as the basic unit of deliberation.
• Explicitly represent the current operator.
• Rules contain knowledge that:
  • Proposes operators: create acceptable preferences
  • Evaluates operators: create preferences
  • Applies the operator: modify working memory
• All rules that match fire in parallel.
[Diagram: Input → Fire rules to Propose operators → Fire rules to Evaluate operators → Decide → Fire rules to Apply selected operator → Output.]
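A minimal sketch of this operator-based cycle, with symbolic "better" preferences resolved by a decision step. All names and data here are illustrative (an Eaters-like grid world), not Soar's actual preference semantics:

```python
# Sketch of propose / evaluate / decide / apply over a set-based working memory.
def decide(candidates, better):
    # Select a candidate that no other candidate is preferred over.
    dominated = {worse for (_, worse) in better}
    remaining = [c for c in candidates if c not in dominated]
    return remaining[0] if remaining else None

def cycle(wm, propose, evaluate, apply_op):
    candidates = propose(wm)              # all proposal rules fire in parallel
    better = evaluate(wm, candidates)     # pairwise (preferred, worse) preferences
    selected = decide(candidates, better)
    apply_op(wm, selected)                # apply rules fire for the selection
    return selected

# Illustrative task knowledge: moves toward food beat moves to empty cells.
def propose(wm):
    return [d for d in ("north", "south", "east") if ("wall", d) not in wm]

def evaluate(wm, candidates):
    return [(a, b) for a in candidates for b in candidates
            if ("food", a) in wm and ("food", b) not in wm]

def apply_op(wm, selected):
    wm.add(("move-direction", selected))
```

With food to the north and south, the east move is dominated and a food-directed move is selected; the selection knowledge lives in ordinary rules rather than in a fixed conflict-resolution scheme.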
Soar 101: Eaters
[Diagram: Eaters grid world; cycle Input → Propose Operator → Evaluate Operators → Select Operator → Apply Operator → Output. Candidate moves North, South, East with preferences North > East, South > East, North = South; output: move-direction North.]

Example rules:
• Propose: If the cell in direction <d> is not a wall, then propose operator move <d>.
• Evaluate: If operator <o1> will move to a bonus food and operator <o2> will move to a normal food, then operator <o1> > <o2>.
• Evaluate: If operator <o1> will move to an empty cell, then operator <o1> < (worst).
• Evaluate: If operator <o1> will move to a cell with the same food as <o2>, then operator <o1> = <o2>.
• Apply: If an operator is selected to move <d>, then create output move-direction <d>.
Problem Search & Knowledge Search

Problem Search
• Propose legal operators
• Select the best for the situation
• Generates new structures

Knowledge Search
• Find relevant knowledge for operator proposal, selection, and application
• Searches through known structures (rule conditions)
• Inner loop of behavior: must be bounded to maintain reactivity

[Diagram: cycle of Input → Propose operators → Evaluate operators → Decide → Apply selected operator → Output.]
Two Related Questions
1. How to use complex reasoning/knowledge for operator selection and application?
   – Without introducing a new formalism
   – While still maintaining reactivity
2. What happens when knowledge search fails: insufficient knowledge (encoded as rules) to propose, evaluate, or apply operators?

[Diagram: cycle of Input → Propose operators → Evaluate operators → Decide → Apply selected operator → Output.]
Impasses and Substates
• If knowledge search fails (incomplete or conflicting knowledge is retrieved):
  – Generate a substate
  – Recursively use operators to reason about the impasse
• Leads to internal planning, hierarchical task decomposition, …
• One form of meta-cognitive processing

[Diagram: cycle of Input → Propose operators → Evaluate operators → Decide → Apply selected operator → Output.]
One-step Look-ahead
[Diagram: blocks-world example. Initial state (on A Table), (on B Table), (on C A), with the goal of A, B, and C all on the Table. Operators move(C, B), move(B, C), and move(C, Table) tie, causing a tie impasse. In the substate, each Evaluate(move(...)) operator copies the state and applies its move: move(C, Table) yields (on A Table), (on B Table), (on C Table), which matches the goal (Evaluation = 1); the other two yield Evaluation = 0. Result: prefer move(C, Table).]
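The look-ahead in the diagram can be sketched as follows. This is a toy blocks world (state maps each block to what it sits on), not Soar's substate machinery: each candidate operator is applied to a copy of the state and scored against the goal.

```python
# One-step look-ahead over a toy blocks world (illustrative representation).
def move(state, block, dest):
    s = dict(state)          # copy the state, as in Evaluate(move(...))
    s[block] = dest
    return s

def clear(state, x):
    return all(on != x for on in state.values())

def legal_moves(state):
    for block in state:
        if clear(state, block):
            for dest in set(state) | {"Table"}:
                if dest != block and (dest == "Table" or clear(state, dest)):
                    if state[block] != dest:
                        yield (block, dest)

def lookahead(state, goal):
    # Evaluate each proposed operator by simulating it one step.
    scored = []
    for block, dest in legal_moves(state):
        result = move(state, block, dest)
        evaluation = 1 if result == goal else 0
        scored.append(((block, dest), evaluation))
    return max(scored, key=lambda p: p[1])

state = {"A": "Table", "B": "Table", "C": "A"}
goal = {"A": "Table", "B": "Table", "C": "Table"}
```

On this state the three legal moves are exactly those in the diagram, and only move(C, Table) evaluates to 1.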
TacAir-Soar: 1997
• Controls simulated aircraft in real-time training exercises (>3,000 entities)
• Flies all U.S. air missions
• Dynamically changes missions as appropriate
• Communicates and coordinates with computer- and human-controlled planes
• >8,000 rules
TacAir-Soar Task Decomposition
[Diagram: operator hierarchy. Execute Mission → Fly-Route, Ground Attack, Fly-Wing, Intercept; Intercept → Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram; Employ Weapons → Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile; Launch Missile → Lock Radar, Lock IR, Fire Missile, Wait-for-Missile-Clear.]

Example rules:
• If instructed to intercept an enemy, then propose intercept.
• If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons.
• If employing weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.
• If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.
[Slide: sample Soar applications]
• Soar Quakebot – anticipation of enemy actions
• Urban Combat – transfer learning
• MOUTbot – team tactics and unpredictable behavior
• SORTS – spatial reasoning & real-time strategy
• TacAir-Soar – complex doctrine & tactics execution
• Haunt – actors and automated direction
• Splinter-Soar – robot control
• Simulated Scout – mental imagery
• Amber (EPIC-Soar) – modeling human-computer interaction
• NL-Soar – natural language processing ("The horse raced past the barn fell")
• R1-Soar – computer configuration
Beyond Rules: How to Use and Learn Other Forms of Knowledge?
• Statistical regularities related to success/failure – Reinforcement Learning
• Factual knowledge – Semantic Memory
• Memory of experiences – Episodic Memory
• Spatial and visual knowledge – Mental Imagery
Reinforcement Learning in Soar
• Get reward from the environment
  – Scalar value for good/bad results
• Operator evaluation rules encode the expected value of an operator
  – If operator A is selected in situation B, the expected value is X.
• Each time an operator is applied, update the evaluation rules based on the reward and the reward expected for the next state.
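The update described above can be sketched in the style of a temporal-difference rule. This is a generic TD/Q-learning-style sketch, not Soar-RL's exact algorithm; `alpha` (learning rate) and `gamma` (discount) are assumed parameters.

```python
# Sketch of updating an operator-evaluation value toward
# reward + discounted expected value of the next state.
def update_evaluation(q, state, op, reward, next_state_value,
                      alpha=0.1, gamma=0.9):
    old = q.get((state, op), 0.0)
    q[(state, op)] = old + alpha * (reward + gamma * next_state_value - old)
    return q[(state, op)]
```

In Soar the value lives in the numeric preference of an evaluation rule, so this update adjusts the rule's preference each time the operator is applied.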
Soar 9 Structural Diagram
[Diagram: Soar structure as before, with Chunking and Reinforcement Learning now attached to procedural memory.]
Infinite Mario World
Agent Design (Interactive Hierarchical Reinforcement Learning: Shiwali Mohan)
• Input: type of tiles in the visual scene (brick, coin, etc.); monsters – location, horizontal/vertical velocity, type, wings
• Output: primitive actions – left or right movement, jump, high/low speed
• Relational state representation: facts of the form "there exists a pit at a distance of three tiles in the horizontal direction"; qualitative attributes – 'isthreat' for monsters and pits, 'isreachable' for coins and blocks
• Operator hierarchy: functional-level operators (move right, grab coin, search block, avoid pit, tackle monster) over keystroke-level operators (move right/left/stay, jump yes/no, speed high/low)
[Plot: performance of the Soar RL agent and a sample agent over roughly 1,800 trials.]
Which card is higher?
There is a King on the card with the name of a movie in which this state's governor starred. The other is a six.
(Cards: Rambo, Terminator)
Semantic Memory
• Long-term declarative store of facts
  – Knowledge acquired by the agent
  – Preloaded from external databases (Cyc, WordNet, …)
• Why not just use working memory?
  – Matching slows down significantly as the amount of data increases
• Why not rules?
  – Requires a rule for every way data might be accessed
• Our implementation
  – Statistically-based query optimization
  – SQLite backend
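Retrieval from a semantic store is cue-based: a partial description placed in working memory pulls back a complete matching fact. A toy sketch (illustrative names and data, not Soar's SQLite-backed implementation):

```python
# Toy cue-based retrieval: each fact is a dict of attribute-value pairs,
# and a cue retrieves the first fact containing all of the cue's pairs.
def retrieve(store, cue):
    for fact in store:
        if all(fact.get(attr) == val for attr, val in cue.items()):
            return fact
    return None

store = [
    {"name": "robin", "isa": "bird", "can": "fly"},
    {"name": "penguin", "isa": "bird", "can": "swim"},
]
```

A linear scan like this is exactly what breaks down at scale, which is why the real implementation uses indexed queries over a database backend.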
Soar 9 Structure
[Diagram: Soar structure as before, with Semantic Memory and Semantic Learning now added.]
Soar Semantic Memory
[Diagram sequence: Symbolic Working Memory (holding Input, a Cue, and a Retrieved area) alongside long-term procedural memory (production rules). Semantic Learning stores working-memory structures into Semantic Memory; placing a cue in working memory retrieves the matching structure from Semantic Memory into the Retrieved area.]
WordNet Results (Nate Derbinsky)
• Soar (WN-Lexical, v3): 821K nodes, 3.8M edges, 398.1 MB on disk
• ACT-R (for comparison): 240K nodes, 2.2M edges, ~40 msec retrieval, cue sizes 1–4

Query Type      Cue Size   Avg. Retrieval Time (msec)   Std. Deviation
100 random      1          0.211                        0.0135
5 word senses   2          0.222                        0.0107
"soar"          7          0.277                        0.0017

*Intel 2.8GHz Core 2 Duo, 4GB RAM, Mac OS 10.6, SQLite v3
Episodic Memory
• Long-term declarative store of experiences
  – What you "remember"
• Why not just use working memory?
  – Significant slowdown as experiences accumulate
  – Difficult to support biased partial-match retrievals
• Our implementation
  – Separate memory
    • Store only working-memory changes to episodic memory
    • Reconstruct episodes on retrieval
  – B+ trees provide access to the SQLite backend
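Storing only working-memory changes and reconstructing episodes on demand can be sketched as follows. This is a simplified illustration (sets of facts, a list of deltas), not the B+ tree/SQLite implementation described above:

```python
# Delta-based episode storage: record only what was added to or removed
# from working memory, and rebuild an episode by replaying deltas.
class EpisodicMemory:
    def __init__(self):
        self.deltas = []                      # (time, added, removed) triples

    def record(self, t, added, removed):
        self.deltas.append((t, frozenset(added), frozenset(removed)))

    def reconstruct(self, t):
        """Rebuild the working-memory contents as of time t."""
        wm = set()
        for time, added, removed in self.deltas:
            if time > t:
                break
            wm |= added
            wm -= removed
        return wm
```

The space cost scales with how much working memory changes per cycle rather than with its full size, at the price of reconstruction work at retrieval time.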
Soar 9 Structure
[Diagram: Soar structure as before, with Episodic Memory and Episodic Learning now added.]
Soar Episodic Memory
[Diagram sequence: each episode stores the working-memory contents (Input, Output, Cue, Retrieved); episodes are stored in a separate Episodic Memory via Episodic Learning; given a cue, the closest partial match among stored episodes is retrieved into working memory. Long-term procedural memory (production rules) sits alongside.]
Episodic Memory: Multi-Step Action Projection (Andrew Nuxoll)
• Learn tactics from prior success and failure
  – Fight/flight
  – Back away from enemy (and fire)
  – Dodging
[Plot: average margin of victory (y-axis, -30 to 40) over successive games (x-axis, roughly 1–180).]
Episodic Memory Implementation (Nate Derbinsky)
• Practical performance for real-world tasks – 1 million episodes (>2,500 features/episode)

Operation   Storage            Cue Matching   Reconstruction
Result      2.68 ms, 1620 MB   57.6 ms        22.65 ms

*Intel 2.8GHz Core 2 Duo, 4GB RAM, Mac OS 10.6, SQLite v3
Spatial Problem Solving via Mental Imagery
• Non-symbolic processing of images and spatial representations
• Controlled by operators (and rules) that manipulate symbolic structures
Soar 9 Structure
[Diagram: the complete Soar 9 architecture: symbolic long-term memories (Procedural, Semantic, Episodic) with Chunking, Reinforcement Learning, Semantic Learning, and Episodic Learning; Symbolic Working Memory and the Decision Procedure; Perceptual STM with Mental Imagery, Visual LT Memory, and Appraisals; Perception and Action connecting to the Body.]
Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]
[Diagram: a spatial scene containing frog, fish1, and fish2, plus imagined projections frog′, fish1′, and fish2′. Working memory holds symbolic structures such as agent(frog), obstacle(fish1), obstacle(fish2); imagery operators imagine_move(fish1), imagine_move(fish2), imagine_action(right), and imagine_action(up) update the scene, and extracted predicates such as intersect(frog′, fish1′) and no_intersect(frog′) inform the choice of action(right).]
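The predicate extraction shown in the diagram can be illustrated with a toy geometric sketch: represent objects as axis-aligned bounding boxes, imagine a move, and test for intersection. Coordinates and box sizes here are made up; the real imagery system operates on richer spatial representations.

```python
# Illustrative predicate extraction from an imagined move.
# Boxes are (x1, y1, x2, y2) axis-aligned rectangles.
def intersects(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def imagine_move(box, dx, dy):
    # Project the object to its imagined position without changing the scene.
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

frog = (0, 0, 1, 1)
fish1 = (2, 0, 3, 1)
```

The symbolic result (intersect vs. no_intersect) is what gets placed back into working memory for the rules to reason over.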
Imagery in Frogger II
[Diagram: spatial scene with frog, fish1, and fish2 and imagined projections frog′, fish1′, and fish2′; working memory holds action(right).]
Frogger II Results [Sam Wintermute]
• Integrated with reinforcement learning, using one-step look-ahead with and without imagery
• Reward function: +10 for moving up a row, -10 for moving down a row, +1000 for reaching the top, -1000 for dying, -1 at each time step
• Starts knowing only the available operators
  – Up, down, left, right
  – Learns when to select them
• Final performance
  – Imagery agent won 70% of games
  – No-imagery agent won 45%
[Plot: learning curves with and without imagery.]
Summary of Knowledge/Structures
• Skills and procedural knowledge
  – How and when to do things
  – Rules (operators)
  – Learned by chunking and reinforcement learning
• Long-term facts
  – What you "know"
  – Semantic memory
• Memory of past experiences
  – What you "remember"
  – Episodic memory
• Spatial and visual information processing
  – What you experience in your "mind's eye"
  – Mental imagery
Acknowledgments
Nate Derbinsky, Nick Gorski, Scott Lathrop, Shiwali Mohan, Shelley Nason, Andrew Nuxoll, Miller Tinkerhess, Jon Voigt, Yongjia Wang, Sam Wintermute, Joseph Xu

Soar Availability
Free: open source, BSD license
– Manuals, tutorials, editors, debuggers, …
– Interfaces to C, C++, Java, Python, Tcl, …
Soar (C): http://sitemaker.umich.edu/soar/home – supported by the University of Michigan
Soar (Java): http://code.google.com/p/jsoar/ – translated from the C version and supported by David Ray