
Page 1: Learning Procedural Planning Knowledge in Complex Environments Douglas Pearson douglas.pearson@threepenny.net March 2004

Learning Procedural Planning Knowledge in Complex Environments

Douglas Pearson

douglas.pearson@threepenny.net

March 2004

Page 2:

Characterizing the Learner

[Figure: learners arranged along two axes, Method (Implicit vs. Deliberate) and Knowledge Representation (Procedural vs. Declarative). Reinforcement learning occupies the implicit/procedural corner: simpler agents with weak, slower learning. Symbolic learners occupy the deliberate/declarative corner: complex agents with strong, faster learning, but only in simple environments. IMPROV targets deliberate learning over procedural knowledge, in complex environments: actions with duration and conditional effects; limited, noisy, delayed sensing; tasks demanding timely response; domains that change over time and have large state spaces.]

Page 3:

Why Limit Knowledge Access?

• Procedural – can only be accessed by executing it
• Declarative – the agent can answer when a rule will execute and what it will do

Declarative problems:
• Availability
  – If (x^5 + 3x^3 – 5x^2 + 2) > 7 then Action
  – Chains of rules: A -> B -> C -> Action
• Efficiency
  – O(size of knowledge base) or worse
  – The agent slows down as it learns more

IMPROV representation:
– Sets of production rules for operator preconditions and actions
– Assume the learner can only execute rules
– But allow declarative knowledge to be added when it is efficient to do so
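The procedural restriction above can be made concrete with a small sketch. This is an illustrative Python model, not IMPROV's actual representation: preconditions and actions are opaque callables, so the only way to learn anything about an operator is to run it.

```python
# Hypothetical sketch of a purely procedural operator: the agent can only
# *execute* its rules, never inspect them. The names (Operator, proposes,
# execute) are illustrative, not IMPROV's API.

class Operator:
    def __init__(self, name, precondition, action):
        self.name = name
        self._precondition = precondition  # opaque callable: state -> bool
        self._action = action              # opaque callable: state -> state

    def proposes(self, state):
        # The only way to discover whether the operator applies is to run
        # its precondition rules against the current state.
        return self._precondition(state)

    def execute(self, state):
        # Likewise, its effects are only visible by executing it.
        return self._action(state)

# Example: a braking operator for the driving domain used in these slides.
brake = Operator(
    "brake",
    precondition=lambda s: s["speed"] > 0,
    action=lambda s: {**s, "speed": max(0, s["speed"] - 10)},
)

state = {"speed": 30, "ice": False}
if brake.proposes(state):
    state = brake.execute(state)
print(state["speed"])  # 20
```

With this access model the agent cannot ask "when will brake fire?"; it can only observe that it did (or did not) fire in a given state, which is exactly what makes error detection and correction hard.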

Page 4:

Focusing on Part of the Problem

[Figure: task performance, from 0% to 100%, as a function of knowledge. The knowledge representation and the initial rule base are given; the domain knowledge beyond them is what must be learned.]

Page 5:

The Problem

• Cast the learning problem as
  – Error detection (incomplete/incorrect knowledge)
  – Error correction (fixing or adding knowledge)

• But with only limited, procedural access

• The aim is to support learning in complex, scalable agents/environments.

Page 6:

Error Detection Problem

[Figure: a plan produced from existing (possibly incorrect) knowledge, as a sequence of states: S1 (Speed 30) -> S2 (Speed 10) -> S3 (Speed 0) -> S4 (Speed 30).]

How can the agent monitor the plan during execution without direct knowledge access?

Page 7:

Error Detection Solution

• Direct monitoring – not possible
• Instead, detect lack of progress toward the goal
  – No rules matching, or conflicting rules

[Figure: executing the plan S1 (Speed 30) -> S2 (Speed 10) -> S3 (Speed 0), the engine stalls at S4 and no operator is proposed.]

• This does not require predicting the behavior of the world (useful in stochastic environments)
• But there is no implicit notion of solution quality
• Domain-specific error conditions can be added, but are not required
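The "lack of progress" test might be sketched as follows. This is a minimal illustrative Python version (not IMPROV's code): an impasse is signalled when no operator is proposed for the current state, or when several operators conflict with no way to choose.

```python
# Hedged sketch of progress-based error detection: flag an error when
# nothing matches the current state, or when conflicting rules match.

def detect_impasse(operators, state):
    proposed = [op for op in operators if op["precondition"](state)]
    if not proposed:
        return "no-proposal"   # e.g. the engine stalls and nothing fires
    if len(proposed) > 1:
        return "conflict"      # conflicting rules with no preference
    return None                # exactly one proposal: progress continues

# Toy driving-domain operators (illustrative conditions).
ops = [
    {"name": "change-gear",
     "precondition": lambda s: s["speed"] == 0 and not s["stalled"]},
    {"name": "accelerate",
     "precondition": lambda s: 0 < s["speed"] < 30},
]

print(detect_impasse(ops, {"speed": 10, "stalled": False}))  # None
print(detect_impasse(ops, {"speed": 0,  "stalled": True}))   # no-proposal
```

Note that this test says nothing about *how good* the chosen path is, matching the caveat above that there is no implicit notion of solution quality.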

Page 8:

IMPROV's Recovery Method

[Flowchart: Execute, recording [State, Op -> Result] at each step, repeating until the goal is found. On failure, Search: identify the incorrect operator(s). Then Learning: train the inductive learner and change the domain knowledge. Replan and Execute again until the goal is reached.]
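The execute/record/correct/replan cycle in the flowchart can be sketched in miniature. Everything here is an illustrative stand-in (a toy braking domain and a one-line "inductive learner"), not IMPROV's actual planner or learner:

```python
import math

# The agent starts with incorrect knowledge about braking.
knowledge = {"brake": -10}   # believed effect: slows by 10
world = {"brake": -5}        # true effect (e.g. on ice): slows by 5

def plan_fn(state):
    # Plan enough brake steps to reach speed 0, by current knowledge.
    return ["brake"] * math.ceil(state["speed"] / -knowledge["brake"])

def execute_fn(op, state):
    return {"speed": max(0, state["speed"] + world[op])}

def correct_fn(trace):
    # Crude stand-in for the inductive learner: infer the operator's
    # effect from the most recently observed transition.
    before, op, after = trace[-1]
    knowledge[op] = after["speed"] - before["speed"]

def goal_test(state):
    return state["speed"] == 0

def improv_loop(state, max_tries=10):
    trace = []                            # record [State, Op -> Result]
    for _ in range(max_tries):
        for op in plan_fn(state):         # (re)plan with current knowledge
            result = execute_fn(op, state)
            trace.append((state, op, result))
            state = result
        if goal_test(state):
            return state, trace           # reached the goal
        correct_fn(trace)                 # fail: fix the domain knowledge
    return state, trace

final, trace = improv_loop({"speed": 20})
print(final["speed"], knowledge["brake"])  # 0 -5
```

The first plan under-brakes because the believed effect is wrong; the recorded trace lets the corrector revise the effect, after which replanning succeeds.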

Page 9:

Finding the Incorrect Operator(s)

[Figure: two plan traces over the states Speed-30 -> Speed-10 -> Speed-0 -> Speed-30; the second trace inserts Change-Gear before reaching Speed-30.]

Change-Gear is over-specific; Speed-0 is over-general.
By waiting, the learner can do better credit assignment.

Page 10:

Learning to Correct the Operator

• Collect a set of training instances
  – [State, Operator -> Result]
  – Differences between states can be identified

[Example: two states, (Speed = 40, Light = green, Self = car, Other = car) versus (Speed = 40, Light = green, Self = car, Other = ambulance), differ only in the Other feature.]

• These differences are used as a default bias when training the inductive learner
• Preconditions are learned as a classification problem (predict the operator from the state)
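The setup above can be illustrated with a short sketch. The state-diff and the trivial nearest-match "learner" below are illustrative stand-ins, not IMPROV's inductive learner:

```python
# Sketch of precondition learning as classification: instances pair a
# state with the operator that applied, and differences between
# near-identical states bias the learner toward the features that matter.

def state_diff(a, b):
    """Features whose values differ between two states."""
    return {k for k in a if a.get(k) != b.get(k)}

# The two states from the slide, differing only in the 'other' feature.
s1 = {"speed": 40, "light": "green", "self": "car", "other": "car"}
s2 = {"speed": 40, "light": "green", "self": "car", "other": "ambulance"}
print(state_diff(s1, s2))  # {'other'}

# Training instances: predict the operator from the state.
training = [
    (s1, "go"),      # proceed when the other vehicle is a car
    (s2, "yield"),   # yield when it is an ambulance
]

def predict(state, instances, bias):
    # Trivial matcher over the biased features only (illustrative).
    for inst_state, op in instances:
        if all(state.get(f) == inst_state.get(f) for f in bias):
            return op
    return None

print(predict({"other": "ambulance"}, training, bias={"other"}))  # yield
```

Restricting attention to the differing features is the "default bias": it keeps the learner from generalizing over features (Speed, Light, Self) that the contrasting instances give no evidence about.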

Page 11:

K-Incremental Learning

• Collect a set of k instances
• Then train the inductive learner

[Figure: instance set sizes from 1 to n. Reinforcement learners train after every instance (k = 1); IMPROV trains after each correction (k1); EXPO trains once a unique cause is found (k2); non-incremental learners train on all n instances.]

K-incremental learner:
– k does not grow over time => incremental behavior
– Better decisions about what to discard when generalizing
– When doing "active learning", bad early learning can really hurt
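The buffering scheme might look like the following minimal sketch (names illustrative; the batch learner is stubbed out):

```python
# k-incremental training: buffer instances and retrain only when k have
# accumulated, so the training-set size stays fixed over time (unlike a
# non-incremental learner, whose set grows to n).

class KIncrementalLearner:
    def __init__(self, k, train_fn):
        self.k = k
        self.train_fn = train_fn   # batch learner invoked on each window
        self.buffer = []
        self.model = None

    def observe(self, instance):
        self.buffer.append(instance)
        if len(self.buffer) >= self.k:
            self.model = self.train_fn(self.buffer)  # retrain on k instances
            self.buffer = []                         # discard: k never grows
        return self.model

# Stub train_fn just reports the batch size, to show when training fires.
learner = KIncrementalLearner(k=3, train_fn=lambda batch: len(batch))
results = [learner.observe(i) for i in range(5)]
print(results)  # [None, None, 3, 3, 3]
```

Setting k = 1 degenerates to a reinforcement-learning-style update after every instance; letting k grow to n recovers a non-incremental batch learner, which locates the spectrum shown in the figure.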

Page 12:

Extending to Operator Actions

[Figure: the action's effect, Speed 30 -> Speed 0 -> Speed 20, is decomposed into an operator hierarchy: Brake, then Release, whose steps are Slow -5, Slow -10, Slow -10, Slow 0.]

The decomposition terminates with operators that modify a single symbol.

Page 13:

Correcting Actions

Expected effects of braking: Slow -5, Slow -10, Slow -10
Observed effects of braking on ice: Slow -2, Slow -4, Slow -6
=> Failure

Use the same correction method to change the preconditions of these sub-operators.

Page 14:

Change Procedural Actions

[Figure: changing the effects of Brake.]

Specialize Slow -5: Braking & slow=0 & ice => reject Slow -5
Generalize Slow -2: Braking & slow=0 & ice => propose Slow -2

This supports complex actions:
– Actions with durations (a sequence of operators)
– Conditional actions (branches in the sequence of operators)
– Multiple simultaneous effects
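The specialize/generalize pair above can be sketched as propose and reject rules over a condition. The rule structure here is an illustrative Python rendering, not IMPROV's rule syntax:

```python
# Sketch of correcting a procedural effect: specialize the faulty effect
# (reject slow -5 on ice) and generalize the observed one (propose
# slow -2 on ice). Effects are collected by firing all matching rules.

def braking_effects(state, rules):
    proposals = set()
    for rule in rules:
        if rule["condition"](state):
            if rule["type"] == "propose":
                proposals.add(rule["effect"])
            elif rule["type"] == "reject":
                proposals.discard(rule["effect"])
    return proposals

rules = [
    # Original knowledge: braking slows the car by 5.
    {"type": "propose", "effect": -5,
     "condition": lambda s: s["braking"]},
    # Specialization learned from the failure: reject -5 on ice.
    {"type": "reject", "effect": -5,
     "condition": lambda s: s["braking"] and s["ice"]},
    # Generalization learned from observation: propose -2 on ice.
    {"type": "propose", "effect": -2,
     "condition": lambda s: s["braking"] and s["ice"]},
]

print(braking_effects({"braking": True, "ice": False}, rules))  # {-5}
print(braking_effects({"braking": True, "ice": True}, rules))   # {-2}
```

Because each sub-operator modifies a single symbol, the same precondition-correction machinery applies unchanged: only the conditions under which an effect is proposed or rejected are being learned.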

Page 15:

IMPROV Summary

[Figure: the learner space again, with axes Method (Implicit vs. Deliberate) and KR (Procedural/Incremental vs. Declarative/Non-Incremental). Reinforcement learning: implicit, procedural, incremental. Symbolic learners: deliberate, declarative, non-incremental. IMPROV: deliberate, procedural, incremental.]

IMPROV supports:
• Powerful agents – multiple goals; faster, deliberate learning
• Complex environments – noise; complex actions; dynamic environments

k-Incremental learning – improved credit assignment: which operator, which feature

A general, weak, deliberate learner that assumes only procedural access:
– General-purpose error detection
– A general correction method applied to both preconditions and actions
– Nice re-use of the precondition learner to learn actions
– Easy to add domain-specific knowledge to make the method stronger

Page 16:

Redux: Diagram-based Example-driven Knowledge Acquisition

Douglas Pearson

douglas.pearson@threepenny.net

March 2004

Page 17:

1. User specifies desired behavior

Page 18:

2. User selects features – define rules

Later we’ll use ML to guess this initial feature set

Page 19:

3. Compare desired behavior with the rules

Desired: Move-through(door1), Turn-to-face(threat1), Shoot(threat1)
Actual: Move-through(door1), Turn-to-face(neutral1), Shoot(neutral1)
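The comparison in this step amounts to aligning two operator traces and reporting where they diverge. A minimal sketch (function name and trace format are illustrative, not Redux's internals):

```python
# Sketch of comparing a desired behavior trace with the trace the
# current rules actually produce, reporting the first divergence.

def first_divergence(desired, actual):
    for i, (d, a) in enumerate(zip(desired, actual)):
        if d != a:
            return i, d, a
    if len(desired) != len(actual):
        i = min(len(desired), len(actual))
        return i, desired[i:], actual[i:]   # one trace is a prefix
    return None                             # behaviors match

desired = ["Move-through(door1)", "Turn-to-face(threat1)", "Shoot(threat1)"]
actual  = ["Move-through(door1)", "Turn-to-face(neutral1)", "Shoot(neutral1)"]

print(first_divergence(desired, actual))
# (1, 'Turn-to-face(threat1)', 'Turn-to-face(neutral1)')
```

Here the agent turned to face the neutral instead of the threat at step 1, which is exactly the kind of mismatch the next step diagnoses and corrects.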

Page 20:

4. Identify and correct problems

• Detect differences between the desired behavior and the rules
  – Detect over-general preconditions
  – Detect conflicts within a scenario
  – Detect conflicts between scenarios
  – Detect choice points where there is no guidance
  – etc.

• All of these errors are detected automatically when a rule is created

Page 21:

5. Fast rule creation by expert

[Figure: workflow. The expert defines behavior with diagram-based examples, building a library of validated behavior examples (A -> B, C -> D; E, J -> F; G, A, C -> H; E, G -> I; J, K -> L). Analysis and generation tools detect inconsistencies, generalize, generate rules, and simulate execution in the simulation environment. The engineer turns the generated rules into executable code.]