
Soar Rules

John E. Laird
John L. Tishman Professor of Engineering
University of Michigan

Research funded by TARDEC, ONR, AFOSR

The Role of Production Rules in a General Cognitive Architecture

Origins

“The production system was one of those happy events, though in minor key, that historians of science often talk about: a rather well-prepared formalism, sitting in wait for a scientific mission. Production systems have a long and diverse history. Their use of symbolic logic starts with Post (1943), from whom the name is taken. They also show up as Markov algorithms (1954). Their use in linguistics, where they are also called rewrite rules, dates from Chomsky (1957). As with so many other notions in computer science, they really entered into wide currency when they became operationalized in programming languages.”

(Newell and Simon, 1972, p. 889)

Emil Post 1943

• Post production systems
– P1: $$ -> *
– P2: *$ -> *
– P3: *x -> x*
– P4: * -> null & halt
– P5: $xy -> y$x
– P6: null -> $

Newell & Simon (1965)

• Proposed Post production systems as a basis for programming languages
• Outgrowth of research on human problem solving
– Logic, Chess, Cryptarithmetic: SEND + MORE = MONEY
• Strengths of rule-based systems
1. Computational generality (Turing equivalent)
2. Independent units/components of behavior – add/remove easily
3. Models human short-term memory and human procedural memory
• PSG (Production System, version G) - 1973
– First “widely” used rule language

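Descendants of PSG

[Figure: family tree of rule systems descended from PSG (Newell, ‘73), split into two groups.]

• Application Oriented
– OPS1-5 (Forgy, McDermott, Newell, 1976-1981)
– OPS83 (Forgy)
– OPSJ (Forgy)
– ART (‘84)
– CLIPS (Riley, ‘86)
– JESS (Friedman-Hill, ‘95)
– JRules (1996)
– Drools (2001)
• Used for Cognitive Modeling
– ACT (Anderson, ‘76)
– PRISM (Langley, ‘80)
– Soar (Laird et al., ‘82)
– CAPS (Just, Carpenter, ‘87)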

Cognitive Architecture

• Fixed structures that support cognitive functions
– Representations of information
– Memories
– Processing units
– Interfaces to perception and motor system
• Functional requirements
– Supports end-to-end behavior, reactivity, decision making, planning, learning, …
– Relevant knowledge is brought to bear when needed
– No “escape” to arbitrary programming

Different Goals of Cognitive Architecture Research

• Biological Modeling: Capture what we know about the brain
– LEABRA
• Psychological Modeling: Capture what we know about the mind – the details of human performance in a wide range of cognitive tasks
– ACT-R, CAPS, CLARION, EPIC, Soar, …
• Functionality: Support the broad range of human cognitive capabilities across a wide range of tasks
– Companions, ICARUS, LIDA, Soar, …
• Different requirements than in commercial rule applications!

Soar (Laird, Newell & Rosenbloom, 1982)

• Cognitive architecture for human-level intelligence
• Multi-method, multi-task problem solving
• Combined problem spaces & production systems
• Direct Lineage: OPS5, XAPS2, GPS, LT

Soar 9 Structure

[Figure: Soar 9 architecture diagram. Components shown: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); Symbolic Working Memory; Decision Procedure; Appraisals; Perceptual STM; Mental Imagery; Visual LT Memory; Perception and Action; Body.]

Original Soar Structure

[Figure: original Soar architecture diagram. Components shown: Symbolic Long-Term Memories (Procedural); Chunking; Symbolic Working Memory; Decision Procedure; Perception and Action; Body.]

Rule-based System Processing Cycle

[Figure: Rule Memory and Working Memory feed a repeating cycle: Input → Match Rules to WM → Select Rule (conflict resolution) → Execute Rule Actions → Output.]
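The match, select, execute cycle in the figure can be sketched in a few lines of Python. This is an illustrative toy, not the implementation of PSG, OPS5, or any other system named in these slides. The `Rule` type, the specificity-based conflict resolution (prefer the rule with the most conditions), and the choice to skip rules whose actions are already in working memory are all assumptions made for the sketch.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    conditions: frozenset  # facts that must all be in working memory
    additions: frozenset   # facts the rule adds when it fires


def run(rules, working_memory, max_cycles=10):
    """Run a toy match-select-execute loop and return (fired names, final WM)."""
    wm = set(working_memory)
    fired = []
    for _cycle in range(max_cycles):
        # Match: conditions hold, and firing would actually change WM
        # (assumption: this stands in for a refraction scheme).
        conflict_set = [r for r in rules
                        if r.conditions <= wm and not r.additions <= wm]
        if not conflict_set:
            break
        # Select: fixed, purely syntactic conflict resolution (assumption:
        # most conditions wins; ties go to the earlier rule).
        chosen = max(conflict_set, key=lambda r: len(r.conditions))
        # Execute: apply the fixed set of actions attached to the rule.
        wm |= chosen.additions
        fired.append(chosen.name)
    return fired, wm


rules = [
    Rule("general", frozenset({"a"}), frozenset({"x"})),
    Rule("specific", frozenset({"a", "b"}), frozenset({"y"})),
]
fired, wm = run(rules, {"a", "b"})
```

Only one rule fires per cycle, and the choice between `general` and `specific` is made by counting conditions rather than by any knowledge about the task.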

Rule-based System Processing Cycle

1. Conditionality is limited to determining which rules match.
2. Cannot use additional task knowledge to select one rule
• Rely on fixed conflict resolution schemes using syntactic features
• Works about 80% of the time
• Forced to “engineer” conditions to avoid conflicts
3. Can only associate a fixed set of actions with each rule
4. Intelligence is all about flexibility: being able to use knowledge for decision making and acting

CHALLENGE: How to open up the processing cycle to more knowledge

Alternative Processing Cycle

• Introduce OPERATORS as basic unit of deliberation.
• Explicitly represent current operator
• Rules contain knowledge that
– Propose Operators: create acceptable preferences
– Evaluate Operators: create preferences
– Apply Operator: modify working memory
• All rules that match fire in parallel

[Figure: the rule-based cycle (Input → Match Rules to WM → Select Rule (conflict resolution) → Execute Rule Actions → Output) is replaced by: Input → Fire rules to Propose operators → Fire rules to Evaluate operators → Decide → Fire rules to Apply selected operator → Output.]

Soar 101: Eaters

[Figure: Eaters board with candidate moves North, South, and East, shown alongside the cycle Input → Propose Operator → Evaluate Operators → Select Operator → Apply Operator → Output.]

Propose Operator
If cell in direction <d> is not a wall,
--> propose operator move <d>

Evaluate Operators
If operator <o1> will move to a bonus food and operator <o2> will move to a normal food,
--> operator <o1> > <o2>
If operator <o1> will move to a cell with same food as <o2>
--> operator <o1> = <o2>
If operator <o1> will move to an empty cell
--> operator <o1> <

Apply Operator
If an operator is selected to move <d>
--> create output move-direction <d>

Example preferences, first case: North > East, South > East, North = South
Example preferences, second case: North > East, South <; output is move-direction North
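The propose, evaluate, select, apply sequence for Eaters can be mimicked in plain Python to make the role of preferences concrete. This is a sketch under stated assumptions, not Soar rule syntax and not Soar's full decision procedure: the `cells` dictionary, the function names, and the simplified handling of only the better (>), indifferent (=), and worst (<) preferences are all introduced for illustration. Treating two empty cells as "the same" for the indifferent rule is also an assumption.

```python
import random


def propose(cells):
    """Propose a move for every direction whose cell is not a wall."""
    return [d for d, content in cells.items() if content != "wall"]


def evaluate(ops, cells):
    """Create better, indifferent, and worst preferences from the slide's rules."""
    better, indifferent, worst = set(), set(), set()
    for o1 in ops:
        if cells[o1] == "empty":
            worst.add(o1)                         # operator <o1> <
        for o2 in ops:
            if o1 == o2:
                continue
            if cells[o1] == "bonus" and cells[o2] == "normal":
                better.add((o1, o2))              # operator <o1> > <o2>
            if cells[o1] == cells[o2]:
                indifferent.add(frozenset((o1, o2)))  # operator <o1> = <o2>
    return better, indifferent, worst


def decide(ops, better, indifferent, worst, rng=random):
    """Pick one operator, or return None when the preferences leave a tie."""
    candidates = [o for o in ops if not any((b, o) in better for b in ops)]
    candidates = [o for o in candidates if o not in worst] or candidates
    if not candidates:
        return None
    all_indifferent = all(frozenset((a, b)) in indifferent
                          for a in candidates for b in candidates if a != b)
    return rng.choice(candidates) if all_indifferent else None


def apply_operator(direction):
    """Create the output command for the selected move."""
    return {"move-direction": direction}


def step(cells):
    ops = propose(cells)
    selected = decide(ops, *evaluate(ops, cells))
    return None if selected is None else apply_operator(selected)


# Second case from the slide: North > East, South is worst.
print(step({"north": "bonus", "east": "normal", "south": "empty", "west": "wall"}))
```

In the first case from the slide (North and South both lead to bonus food, East to normal food) the sketch picks North or South at random, because the two are marked indifferent.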

Problem Search & Knowledge Search

Problem Search (over states and operators)
• Propose legal operators
• Select best for situation
• Generates new structures

Knowledge Search
• Find relevant knowledge for operator proposal, selection, application
• Searches through known structures (rule conditions)
• Inner loop of behavior: must be bounded to maintain reactivity

Two Related Questions

1. How can complex reasoning and knowledge be used for operator selection and application?
– Without introducing a new formalism
– While still maintaining reactivity
2. What happens when knowledge search fails?
– Insufficient knowledge (encoded as rules) to propose, evaluate, or apply operators

Impasses and Substates

• If knowledge search fails (incomplete or conflicting knowledge is retrieved)
– Generate a substate
– Recursively use operators to reason about impasse
• Leads to internal planning, hierarchical task decomposition, …
• One form of meta-cognitive processing

One-step Look-ahead

[Figure: blocks-world example of resolving a tie impasse by look-ahead.]

• State: (on A Table) (on B Table) (on C A)
• Goal: A on B, B on C, C on Table
• Proposed operators: move(C, B), move(B, C), move(C, Table)
• Tie Impasse: the state is copied into a substate and each operator is evaluated there
– Evaluate(move(C, Table)): result (on A Table) (on B Table) (on C Table), Evaluation = 1
– Evaluate(move(C, B)): Evaluation = 0
– Evaluate(move(B, C)): Evaluation = 0
• Result: Prefer move(C, Table)
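The blocks-world look-ahead can be reproduced with a small Python sketch: when no preference separates the candidate moves, each one is tried on a copy of the state and scored. The slide does not say how the evaluation numbers are computed, so the scoring rule here (count the blocks that sit on their goal support, with everything beneath them also in place) is an assumption chosen because it reproduces the slide's values of 1, 0, and 0. The state encoding and all function names are likewise hypothetical.

```python
# A state maps each block to what it rests on.
GOAL = {"A": "B", "B": "C", "C": "Table"}


def in_place(block, state, goal):
    """True if the block is on its goal support and that support is in place too."""
    support = state[block]
    if support != goal[block]:
        return False
    return support == "Table" or in_place(support, state, goal)


def evaluate(state, goal=GOAL):
    """Assumed evaluation: number of blocks already in their final position."""
    return sum(in_place(b, state, goal) for b in state)


def is_clear(block, state):
    return all(support != block for support in state.values())


def legal_moves(state):
    """Propose move(block, dest) for every clear block and clear destination."""
    moves = []
    for block in sorted(state):
        if not is_clear(block, state):
            continue
        for dest in sorted(state) + ["Table"]:
            if dest == block or dest == state[block]:
                continue
            if dest != "Table" and not is_clear(dest, state):
                continue
            moves.append((block, dest))
    return moves


def apply_move(state, move):
    block, dest = move
    new_state = dict(state)  # the copy made in the substate
    new_state[block] = dest
    return new_state


def select_by_lookahead(state):
    """Resolve a tie by evaluating every candidate one step ahead."""
    scores = {m: evaluate(apply_move(state, m)) for m in legal_moves(state)}
    best = max(scores, key=scores.get)
    return best, scores


state = {"A": "Table", "B": "Table", "C": "A"}
best, scores = select_by_lookahead(state)
print(best, scores)
```

The original state is never modified, which mirrors the point of doing the evaluation in a substate rather than in the world.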

TacAir-Soar: 1997

Controls simulated aircraft in real-time training exercises (>3000 entities)

Flies all U.S. air missions

Dynamically changes missions as appropriate

Communicates and coordinates with computer and human controlled planes

>8000 rules


TacAir-Soar Task Decomposition

[Figure: operator hierarchy.]

• Execute Mission: Fly-route, Ground Attack, Fly-Wing, Intercept
• Intercept: Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram
• Employ Weapons: Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile
• Launch Missile: Lock Radar, Lock IR, Fire-Missile, Wait-for Missile-Clear

Example rules at each level:

Intercept
If instructed to intercept an enemy then propose intercept

Employ Weapons
If intercepting an enemy and the enemy is within range and ROE are met then propose employ-weapons

Launch Missile
If employing-weapons and missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile

Lock IR
If launching a missile and it is an IR missile and there is currently no IR lock then propose lock-IR
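The four proposal rules above chain together: each one tests the operator selected one level up, so a single situation yields a stack of nested operators. A rough Python sketch of that chaining follows. It is not TacAir-Soar code; the situation keys such as `roe_met` and `lar_achieved`, the `PROPOSALS` table, and the `operator_stack` function are hypothetical names invented for this illustration.

```python
# (parent operator, condition on the situation, proposed operator)
PROPOSALS = [
    ("execute-mission",
     lambda s: s["instructed_to_intercept"],
     "intercept"),
    ("intercept",
     lambda s: s["enemy_in_range"] and s["roe_met"],
     "employ-weapons"),
    ("employ-weapons",
     lambda s: s["missile_selected"] and s["enemy_in_steering_circle"]
     and s["lar_achieved"],
     "launch-missile"),
    ("launch-missile",
     lambda s: s["missile_type"] == "IR" and not s["ir_lock"],
     "lock-IR"),
]


def operator_stack(situation, top="execute-mission"):
    """Descend the hierarchy while some rule proposes a child operator."""
    stack = [top]
    while True:
        for parent, condition, proposed in PROPOSALS:
            if parent == stack[-1] and condition(situation):
                stack.append(proposed)
                break
        else:
            return stack  # no rule proposes anything below the current operator


situation = {
    "instructed_to_intercept": True,
    "enemy_in_range": True,
    "roe_met": True,
    "missile_selected": True,
    "enemy_in_steering_circle": True,
    "lar_achieved": True,
    "missile_type": "IR",
    "ir_lock": False,
}
print(operator_stack(situation))
```

Changing one fact, for example setting `roe_met` to False, shortens the stack, which is how the decomposition reacts to a changing situation without rewriting the rules.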

• Soar Quakebot: Anticipation of Enemy Actions
• Urban Combat: Transfer Learning
• MOUTbot: Team Tactics and Unpredictable Behavior
• SORTS: Spatial Reasoning & Real-time Strategy
• TacAir-Soar: Complex Doctrine & Tactics Execution
• Haunt: Actors and Automated Direction
• Splinter-Soar: Robot Control
• Simulated Scout: Mental Imagery
• Amber EPIC-Soar: Modeling Human-Computer Interaction
• NL-Soar: Natural Language Processing (“The horse raced past the barn fell”)
• R1-Soar: Computer Configuration

Beyond Rules: How to Use and Learn Other Forms of Knowledge?

• Statistical regularities related to success/failure
– Reinforcement Learning
• Factual Knowledge
– Semantic Memory
• Memory of Experiences
– Episodic Memory
• Spatial and Visual Knowledge
– Mental Imagery

Reinforcement Learning in Soar

• Get reward from environment
– Scalar value for good/bad results
• Operator evaluation rules encode expected value of operator
– If operator A is selected in situation B, expected value is X.
• Each time an operator is applied, update evaluation rules based on reward and reward expected for next state.
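The update described in the last bullet has the shape of a standard temporal-difference rule: move the stored value for (situation, operator) toward the received reward plus the value expected for the next state. The sketch below uses a plain dictionary in place of evaluation rules. The learning rate `alpha`, the discount `gamma`, and the on-policy (SARSA-style) target are assumptions, and Soar's actual update may differ in its details.

```python
from collections import defaultdict


def td_update(values, state, op, reward, next_state, next_op,
              alpha=0.1, gamma=0.9):
    """Nudge the expected value of (state, op) toward reward + discounted next value."""
    target = reward + gamma * values[(next_state, next_op)]
    values[(state, op)] += alpha * (target - values[(state, op)])
    return values[(state, op)]


# Expected values start at 0 for operator/situation pairs never seen before.
values = defaultdict(float)
values[("s1", "right")] = 2.0
new_value = td_update(values, "s0", "up", reward=1.0,
                      next_state="s1", next_op="right", alpha=0.5)
```

Each entry of `values` plays the role of one evaluation rule of the form "if operator A is selected in situation B, expected value is X".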

Soar 9 Structural Diagram

[Figure: the original Soar structure with Reinforcement Learning added. Components shown: Symbolic Long-Term Memories (Procedural); Chunking; Reinforcement Learning; Symbolic Working Memory; Decision Procedure; Perception and Action; Body.]

Infinite Mario World

Interactive Hierarchical Reinforcement Learning: Shiwali Mohan

Agent Design

Input
• type of tiles in the visual scene (brick, coin etc.)
• monsters - location, horizontal/vertical velocity, type, wings

Output
• primitive actions: left or right movement, jump, high/low speed

Relational State Representation
• of the form ‘there exists a pit at a distance of three tiles in horizontal direction’
• qualitative attributes – ‘isthreat’ for monsters and pits, ‘isreachable’ for coins and blocks

Sample Agent Operator Hierarchy
• Functional Level Operators: move right, grab coin, search block, avoid pit, tackle monster
• Key-stroke Level Operators: move (right/left/stay), jump (yes/no), speed (high/low)

[Figure: Performance of Soar RL agent. Vertical axis ticks run from 0 to 160, horizontal axis ticks from 1 to 1801; axis labels are not recoverable.]

Which card is higher?

There is a King on the card with the name of a movie in which this state’s governor starred. The other is a six.

[Figure: two cards labeled Rambo and Terminator.]

Semantic Memory

• Long-term declarative store of facts
– Knowledge acquired by agent
– Preload from external databases (Cyc, WordNet, …)
• Why not just use working memory?
– Slows down significantly as amount of data increases
• Why not rules?
– Requires a rule for every way data might be accessed
• Our implementation
– Statistically-based query optimization
– SQLite backend
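To make cue-based retrieval from a SQLite-backed fact store concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not Soar's semantic memory implementation and includes none of its query optimization. The single `facts` table, the column names, the `build_store` and `retrieve` functions, and the sample facts are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical facts as (object, attribute, value) triples.
FACTS = [
    ("m1", "isa", "movie"), ("m1", "name", "Terminator"), ("m1", "star", "p1"),
    ("m2", "isa", "movie"), ("m2", "name", "Rambo"), ("m2", "star", "p2"),
    ("p1", "role", "governor"),
]


def build_store(facts):
    """Load triples into an in-memory SQLite table indexed by attribute and value."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (obj TEXT, attr TEXT, val TEXT)")
    db.execute("CREATE INDEX idx_attr_val ON facts (attr, val)")
    db.executemany("INSERT INTO facts VALUES (?, ?, ?)", facts)
    return db


def retrieve(db, cue):
    """Return the ids of objects that match every attribute/value pair in the cue."""
    matches = None
    for attr, val in cue.items():
        rows = db.execute(
            "SELECT obj FROM facts WHERE attr = ? AND val = ?", (attr, val))
        ids = {row[0] for row in rows}
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


db = build_store(FACTS)
governor = retrieve(db, {"role": "governor"})[0]
movie = retrieve(db, {"isa": "movie", "star": governor})[0]
name = db.execute(
    "SELECT val FROM facts WHERE obj = ? AND attr = 'name'", (movie,)
).fetchone()[0]
print(name)
```

One generic `retrieve` function serves any cue, whereas encoding the same facts as rules would need a separate rule for each access pattern.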

Soar 9 Structure

[Figure: the structure diagram with Semantic memory added. Components shown: Symbolic Long-Term Memories (Procedural, Semantic); Chunking; Reinforcement Learning; Semantic Learning; Symbolic Working Memory; Decision Procedure; Perception and Action; Body.]

Soar Semantic Memory

[Figure, three builds: Long-term Procedural Memory (Production Rules) sits above Working Memory, which receives Input. Working Memory holds a Cue structure and a Retrieved structure. Later builds add Semantic Memory and Semantic Learning alongside Working Memory.]

WordNet Results (Nate Derbinsky)

• Soar (WN-Lexical, v3)
– 821K nodes
– 3.8M edges
– 398.1MB on disk
• ACT-R
– 240k nodes
– 2.2M edges
– ~40 msec. retrieval
• 1-4 cue size

Query Type       Cue Size   Avg. Retrieval Time (msec.)   Std. Deviation
100 random       1          0.211                         0.0135
5 word senses    2          0.222                         0.0107
“soar”           7          0.277                         0.0017

*Intel 2.8GHz Core 2 Duo, 4GB RAM, Mac OS 10.6, SQLite v3

Which card should I pick to get the highest value?

The cards are the same as last time.

Episodic Memory

• Long-term declarative store of experiences
– What you “remember”
• Why not just use working memory?
– Significant slowdown as experiences accumulate
– Difficult to support biased partial match retrievals
• Our implementation
– Separate Memory
• Store only WM changes to episodic memory
• Reconstruct episodes
– B+ trees provide access to SQLite backend
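The two implementation ideas in the list, storing only working-memory changes and reconstructing episodes on demand, can be sketched together with a simple partial-match retrieval. This is an in-memory toy, not Soar's episodic memory: the `EpisodicMemory` class, the overlap-count match score, and the rule that the most recent episode wins ties are assumptions standing in for the biased partial match the slide mentions.

```python
class EpisodicMemory:
    """Toy episodic store that records only the changes between WM snapshots."""

    def __init__(self):
        self.deltas = []            # one (added, removed) pair per episode
        self._last = frozenset()    # previous working-memory snapshot

    def store(self, working_memory):
        """Record a new episode as the difference from the previous snapshot."""
        wm = frozenset(working_memory)
        self.deltas.append((wm - self._last, self._last - wm))
        self._last = wm

    def reconstruct(self, index):
        """Rebuild episode `index` by replaying the deltas up to it."""
        state = set()
        for added, removed in self.deltas[: index + 1]:
            state |= added
            state -= removed
        return state

    def retrieve(self, cue):
        """Return (index, episode) sharing the most elements with the cue.

        Ties go to the most recent episode (an assumed recency bias).
        Returns None if nothing has been stored.
        """
        cue = set(cue)
        best, best_score = None, -1
        state = set()
        for index, (added, removed) in enumerate(self.deltas):
            state |= added
            state -= removed
            score = len(cue & state)
            if score >= best_score:
                best, best_score = (index, set(state)), score
        return best


em = EpisodicMemory()
em.store({("enemy", "near"), ("health", "high")})
em.store({("enemy", "near"), ("health", "low")})
em.store({("enemy", "far"), ("health", "low")})
index, episode = em.retrieve({("enemy", "near"), ("health", "low")})
```

Only the changed elements are stored for the second and third episodes, so storage grows with how much working memory changes rather than with its total size.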

Soar 9 Structure

[Figure: the structure diagram with Episodic memory added. Components shown: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Symbolic Working Memory; Decision Procedure; Perception and Action; Body.]

Soar Episodic Memory

[Figure, three builds: Long-term Procedural Memory (Production Rules) sits above Working Memory, which has Input, Output, Cue, and Retrieved structures; Episodic Learning and Episodic Memory appear alongside. The builds are captioned in turn:]

1. Working memory is stored in the episode
2. Episodes are stored in a separate memory
3. The closest partial match is retrieved.

Episodic Memory: Multi-Step Action Projection (Andrew Nuxoll)

• Learn tactics from prior success and failure
– Fight/flight
– Back away from enemy (and fire)
– Dodging

[Figure: Average Margin of Victory. Vertical axis: Margin of Victory (-30 to 40); horizontal axis: Successive Games (1 to 171).]

Episodic Memory Implementation (Nate Derbinsky)

• Practical performance for real-world tasks
– 1 million episodes (>2,500 features/episode)

Storage            Cue Matching   Reconstruction
2.68ms, 1620MB     57.6ms         22.65ms

*Intel 2.8GHz Core 2 Duo, 4GB RAM, Mac OS 10.6, SQLite v3

Which ball should I drop to get one in the goal?

[Figure: three balls labeled A, B, and C, and a Goal.]

Spatial Problem Solving via Mental Imagery

• Non-symbolic processing of images and spatial representations

• Controlled by operators (and rules) that manipulate symbolic structures


Soar 9 Structure

[Figure: the full Soar 9 architecture diagram with Mental Imagery added. Components shown: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Symbolic Working Memory; Decision Procedure; Appraisals; Perceptual STM; Mental Imagery; Visual LT Memory; Perception and Action; Body.]

Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]

[Figure: a Spatial Scene containing fish1, fish2, and frog, together with their imagined counterparts fish1’, fish2’, and frog’.]

Structures shown in Working Memory:
• obstacle(fish1), obstacle(fish2), agent(frog)
• imagine_move(fish1), imagine_move(fish2)
• imagine_action(up), imagine_action(right)
• intersect(frog’, fish1’), no_intersect(frog’)
• action(right)

Imagery in Frogger II

[Figure: Spatial Scene with fish1, fish2, and frog and their imagined counterparts fish1’, fish2’, and frog’; Working Memory contains action(right).]

Frogger II Results [Sam Wintermute]

• Integrated with reinforcement learning using one-step look-ahead with and without imagery
• Reward function:
– +10 for moving up a row
– -10 for moving down a row
– 1000 for reaching top
– -1000 for dying
– -1 at each time step
• Starts knowing only available operators
– Up, down, left, right
– Learns when to select them
• Final performance
– Imagery agent won 70%
– No-imagery won 45%

[Figure: learning curves labeled “with imagery” and “without imagery”.]

Summary of Knowledge/Structures

• Skills and Procedural Knowledge
– How and when to do things
– Rules (operators)
– Learned by chunking and reinforcement learning
• Long-term Facts
– What you “know”
– Semantic memory
• Memory of past experiences
– What you “remember”
– Episodic memory
• Spatial and visual information processing
– What you experience in your “mind’s eye”
– Mental Imagery

Acknowledgments

Nate Derbinsky, Nick Gorski, Scott Lathrop, Shiwali Mohan, Shelley Nason, Andrew Nuxoll, Miller Tinkerhess, Jon Voigt, Yongjia Wang, Sam Wintermute, Joseph Xu

Soar Availability

Free: open source, BSD license
– Manuals, Tutorials, Editors, Debuggers, …
– Interfaces to C, C++, Java, Python, TCL, …

Soar (C): http://sitemaker.umich.edu/soar/home
– Supported by University of Michigan

Soar (Java): http://code.google.com/p/jsoar/
– Translated from C version and supported by David Ray