Post on 27-Mar-2015
Pat Langley
School of Computing and Informatics
Arizona State University
Tempe, Arizona USA
Institute for the Study of Learning and Expertise
Palo Alto, California USA
Challenges in Learning Plan Knowledge
Thanks to D. Choi, T. Konik, U. Kutur, N. Li, D. Nau, N. Nejati, and D. Shapiro for their many contributions. This talk reports research funded by grants from DARPA IPTO, which is not responsible for its contents.
Outline of the Talk
1. Brief review of learning plan knowledge
2. Learning from different sources
3. Learning for new performance tasks
4. Learning in different scenarios
5. Learning with novel representations
6. Some responses to these challenges
7. Concluding remarks
The Problem: Learning Plan Knowledge
Given: Basic knowledge about some action-oriented domain. (e.g., state/goal representation, operators)
Given: A set of training problems (e.g., initial states, goals, and possibly more)
Given: Some performance task that the system must carry out.
Given: A performance mechanism that can use knowledge to carry out that task.
Learn: Knowledge that will let the system improve its ability to perform new tasks from the same or similar domain.
Topics Not Covered
This talk will range widely, but I will not cover issues related to:
Learning with impoverished representations
Interested in human-like, intelligent behavior
Most work on reinforcement learning is irrelevant
Acquiring basic knowledge about domain
Interested in building on such knowledge
Most work on learning action models is too basic
Nonincremental learning from large data sets
Interested in human-like incremental learning
This rules out most data-mining approaches
Historical Topics
There has been a long history of work on learning plan knowledge:
Forming macro-operators: Fikes et al. (1972), Iba (1988), Mooney (1989), Botea et al. (2005)
Inducing forward-chaining control rules: Anzai & Simon (1978), Mitchell et al. (1981), Langley (1982)
Learning control rules analytically: Laird et al. (1986), Mitchell et al. (1986), Minton (1988)
Problem solving by analogy: Veloso (1994), Jones & Langley (1995), VanLehn & Jones (1994)
Inducing control rules for partial-order plans: Katukam & Kambhampati (1994), Estlin & Mooney (1997)
Historical Trends
Work on learning plan knowledge has seen many shifts in fashion:
Early hope for improving problem solvers/planners (1978–1985)
Excitement/confusion introduced by EBL movement (1986–1992)
Some doubts raised by the “utility problem” (1988–1993)
Mass migration to reinforcement learning paradigm (1993–2003)
Resurgence of interest in learning plan knowledge (2004–present)
Throughout these changes, the problems and potential of learning plan knowledge have remained.
Traditional Sources of Information
Most research on learning for planning has assumed the system uses search to generate:
Successful paths that achieve the goals (positive instances)
Failed paths that do not achieve the goals (negative instances)
Alternative paths of different desirability (preferred instances)
But humans learn from other sources of information and our AI systems should as well.
Challenge: Learn from Many Sources
There has been relatively little research on plan learning from:
Demonstrations of solved problems (Nejati et al., 2006)
Explicit instruction from teacher (Blythe et al., 2007)
Advice or hints from teacher (Mostow, 1983)
Mental simulations or daydreaming (Mueller, 1985)
Undesirable side effects during execution
Humans learn from all of these sources, and our learning systems should support the same capabilities.
Moreover, we should develop single systems that integrate plan knowledge learned from all of them (Oblinger, 2006).
Traditional Performance Tasks
Most research on learning for planning has assumed the system aims to improve:
The efficiency of plan generation (nodes expanded, time)
The quality of generated plans (path length, utility)
The coverage of plan knowledge (problems solved)
But humans learn and use plan knowledge for other purposes that are just as valid.
Challenge: Learn for Plan Execution
Many important domains require executing plan knowledge in some environment that includes:
operators with likely but nonguaranteed effects
external events not directly under the agent’s control
other agents that are pursuing their own goals
Urban driving is one setting that raises all three of these issues.
Complex board games like chess, although deterministic, still require interleaving of planning and execution.
We need more research on plan learning in contexts of this sort (e.g., Benson, 1995; Fern et al., 2004).
Challenge: Learn for Plan Understanding
Another understudied problem is learning for plan understanding.
Given: A partially observed sequence of states influenced by another agent’s actions.
Given: Learned knowledge about how to achieve goals.
Find: The other agent’s goals and the plans it is pursuing to achieve them.
Plan understanding is important not only in complex games, but in military planning, politics, and other settings.
This performance task suggests new learning problems, methods, and evaluation criteria.
Traditional Learning Scenarios
Most research on learning for planning has assumed the system:
Trains on problems from a given distribution / domain
Tests on problems from the same distribution / domain
Success depends on the extent to which the learner generalizes well to new problems from the same domain.
But humans also use their learned plan knowledge in other, more flexible ways to improve performance.
Challenge: Cumulative Learning
In complex domains, humans learn plan knowledge gradually:
Starting with small, relatively easy problems
Moving to complex problems after mastering simpler ones
Later acquisitions build naturally on earlier experience, leading to cumulative learning.
Our education system depends heavily on such “vertical transfer” of learned knowledge.
We need more learning systems that demonstrate this form of cumulative improvement (e.g., Reddy & Tadepalli, 1997).
Challenge: Cross-Domain Transfer
In other cases, humans exhibit a form of transfer that involves:
Learning to solve problems in one domain
Reusing this knowledge to solve problems in another domain that is superficially quite different
Such cross-domain transfer is related to within-domain analogical reasoning, but it is far more challenging.
In its extreme form, the two domains support similar solutions but have no shared symbols or predicates.
We need more learning systems that demonstrate this radical form of knowledge reuse.
Traditional Learned Representations
Most research on learning for planning has focused on learning:
Control rules that reduce effective branching factor
Macro-operators that reduce effective solution depth
These grew naturally from representations used to create hand-crafted expert problem solvers.
But now we have other representations of plan knowledge that suggest new learning tasks and methods.
Nor does this refer to POMDPs, workflows, or other highly constrained formalisms.
Challenge: Learn HTNs
Hierarchical task networks (HTNs) offer the most effective planning available, but they are expensive to build manually.
HTNs provide an ideal target for learning because they have:
the modularity and flexibility of search-control rules
the large-scale structure of macro-operators
Machine learning has automated the creation of expert classifiers.
We should do the same for HTNs, which are effectively expert planning systems.
Challenge: Learn HTNs
We can define the task of learning hierarchical task networks as:
Given: Basic knowledge about some action-oriented domain
Given: A set of training problems (initial states and goals)
Given: Some performance task the system must carry out
Given: Some module that uses HTNs to perform this task
Learn: An HTN that lets the system improve its performance on new tasks from the same or similar domain.
We need more research on this important topic (e.g., Reddy & Tadepalli, 1997; Ilghami et al., 2005).
Some Responses
Our recent research attempts to respond to these challenges by developing methods that:
acquire a constrained but important class of HTNs
that one can use for both planning and reactive control
from both successful problem solving and expert traces
that extends naturally to support cross-domain transfer
Moreover, these ideas are embedded in an integrated architecture that supports many capabilities: ICARUS (Langley, 2006).
Conceptual Knowledge in ICARUS
Conceptual knowledge is cast as Horn clauses that specify relevant relations in the environment
Memory is organized hierarchically
Divided into primitive and nonprimitive predicates
Primitive concept: (assigned-mission ?patient ?mission)
Nonprimitive concept: (patient-form-filled ?patient)
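The idea of concepts as Horn clauses organized in a hierarchy can be illustrated with a small bottom-up inference routine. This is only a sketch in Python, not ICARUS code: the slide gives the heads of two concepts but not their bodies, so the clause bodies and the percepts (`patient`, `mission`, `form`) below are hypothetical, chosen only so that a nonprimitive concept builds on a primitive one.

```python
def is_var(term):
    """Variables follow the slide's convention of a leading question mark."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Match one pattern literal against one ground fact.

    Returns the extended bindings, or None if the match fails.
    """
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def match_all(body, facts, bindings):
    """Yield every set of bindings that satisfies all literals in the body."""
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from match_all(body[1:], facts, extended)

def infer(clauses, percepts):
    """Forward-chain over the clauses until no new beliefs appear."""
    beliefs = set(percepts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            # list() finishes matching before the belief set is modified
            for bindings in list(match_all(body, beliefs, {})):
                belief = tuple(bindings.get(t, t) for t in head)
                if belief not in beliefs:
                    beliefs.add(belief)
                    changed = True
    return beliefs

# Hypothetical clause bodies; only the heads come from the slide.
clauses = [
    (("assigned-mission", "?patient", "?mission"),          # primitive: percepts only
     [("patient", "?patient"), ("mission", "?mission", "?patient")]),
    (("patient-form-filled", "?patient"),                   # nonprimitive: uses a concept
     [("assigned-mission", "?patient", "?mission"), ("form", "?patient")]),
]
percepts = {("patient", "P1"), ("mission", "M1", "P1"), ("form", "P1")}
beliefs = infer(clauses, percepts)
```

Running `infer` on these percepts adds the primitive belief first and then the nonprimitive one that depends on it, which mirrors the hierarchical organization the slide describes.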
HTN Methods in ICARUS
Similar to SHOP2, but methods are indexed by the goals they achieve
Each method decomposes a goal into subgoals
If a method’s goal is active and its precondition is satisfied, then try to achieve its subgoals or apply its operators
[Figure labels: HTN goal concept, precondition concept, HTN method, HTN method subgoal, operator]
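The selection rule on this slide (goal active, precondition satisfied) can be sketched as a lookup in a goal-indexed table. The code below is a simplified illustration with ground literals written as strings; the `Method` class, the helper names, and the particular pairing of goal, precondition, and subgoals are assumptions made for the example rather than ICARUS structures.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Method:
    goal: str                 # the concept this method achieves (its head)
    precondition: frozenset   # literals that must hold before it applies
    subgoals: tuple           # ordered subgoals or operators to pursue

def index_by_goal(methods):
    """Build the goal-indexed memory: goal literal -> candidate methods."""
    index = defaultdict(list)
    for method in methods:
        index[method.goal].append(method)
    return index

def select_method(index, active_goal, beliefs):
    """Return the first method for the active goal whose precondition holds."""
    for method in index.get(active_goal, []):
        if method.precondition <= beliefs:
            return method
    return None

# Hypothetical method; the literals echo the evacuation examples in the talk.
methods = [
    Method(goal="(assigned patient1 NW32)",
           precondition=frozenset({"(flight-available)"}),
           subgoals=("(assign patient1 NW32)",)),
]
index = index_by_goal(methods)
chosen = select_method(index, "(assigned patient1 NW32)", {"(flight-available)"})
```

Indexing by goal means the executor only considers methods relevant to the goal it is currently pursuing, which is the main difference from task-indexed methods that the slide points out.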
Operators in ICARUS
Operators describe low-level actions that agents can execute directly in the environment
Preconditions: legal conditions for action execution
Effects: expected changes when action is executed
Action: (get-arrival-time ?patient ?from ?to)
Precondition concept: (patient ?p) and (travel-from ?p ?from) and (travel-to ?p ?to)
Effects concept: (arrival-time ?patient)
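A minimal sketch of how such an operator might be applied to a belief state follows. It assumes the `?p` in the slide's precondition denotes the same patient as `?patient` in the action, and it treats effects as literals added to the state; the constant `LAX` and the dictionary layout are invented for illustration.

```python
def substitute(literal, bindings):
    """Replace variables in a literal with their bound values."""
    return tuple(bindings.get(term, term) for term in literal)

def apply_operator(operator, args, state):
    """Return the successor state, or None if the preconditions do not hold."""
    params = operator["action"][1:]
    if len(params) != len(args):
        raise ValueError("wrong number of arguments for operator")
    bindings = dict(zip(params, args))
    required = {substitute(lit, bindings) for lit in operator["preconditions"]}
    if not required <= state:
        return None
    expected = {substitute(lit, bindings) for lit in operator["effects"]}
    return frozenset(state) | expected

# The slide's operator, with ?p written as ?patient throughout (an assumption).
get_arrival_time = {
    "action": ("get-arrival-time", "?patient", "?from", "?to"),
    "preconditions": [("patient", "?patient"),
                      ("travel-from", "?patient", "?from"),
                      ("travel-to", "?patient", "?to")],
    "effects": [("arrival-time", "?patient")],
}
state = {("patient", "P2"), ("travel-from", "P2", "SFO"), ("travel-to", "P2", "LAX")}
next_state = apply_operator(get_arrival_time, ("P2", "SFO", "LAX"), state)
```

Because effects are described as expected changes, an executing agent would still need to check the observed state afterward; this sketch only models the expectation.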
Training Input: Expert Traces and Goals
Expert demonstration traces: operators the expert uses and the resulting belief state
State: set of concept instances
Goal: a concept instance in the final state
ICARUS learns generalized skills that achieve similar goals
Operator instance: (get-arrival-time P2)
Concept instance: (assigned-flight P1 M1)
Goal concept: (all-patients-arranged)
Learning Plan Knowledge from Demonstration
[Figure labels: Problem (initial state, goal); Expert; Demonstration Traces (states and actions); Background knowledge (concept definitions, operators); LIGHT; Learned plan knowledge (HTNs); Reactive Executor; Plan Knowledge; If Impasse]
Learning HTNs by Trace Analysis
[Figure: Operator Chaining (labels: concepts, actions)]
Learning HTNs by Trace Analysis
[Figure: Concept Chaining (labels: concepts, actions)]
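The slides name the two chaining steps without spelling them out, so the following is one plausible reading rather than the authors' algorithm. It assumes that operator chaining explains a goal by the latest trace action whose effects include it, recursing on that action's preconditions, and that concept chaining explains a goal through its concept definition, ordering the subconcepts by when they first became true in the trace. All domain literals are hypothetical and ground.

```python
def explain(goal, trace, operators, concepts, end=None):
    """Build a nested (goal, kind, children) explanation from a solution trace.

    trace:     list of (action, state_after) pairs in time order
    operators: action -> (preconditions, effects), both tuples of literals
    concepts:  goal literal -> list of subconcept literals (assumed acyclic)
    """
    end = len(trace) if end is None else end

    # Operator chaining: look backward for an action that produced the goal.
    for i in range(end - 1, -1, -1):
        action, _ = trace[i]
        preconditions, effects = operators[action]
        if goal in effects:
            children = [explain(p, trace, operators, concepts, i)
                        for p in preconditions]
            return (goal, "operator-chaining", children + [action])

    # Concept chaining: decompose the goal using its definition.
    if goal in concepts:
        def first_true(literal):
            return next((i for i, (_, s) in enumerate(trace) if literal in s),
                        len(trace))
        ordered = sorted(concepts[goal], key=first_true)
        children = [explain(c, trace, operators, concepts, end) for c in ordered]
        return (goal, "concept-chaining", children)

    return (goal, "given", [])  # held initially or cannot be explained

# Hypothetical ground domain for illustration.
operators = {
    "query": ((), ("arrival-known",)),
    "assign": (("arrival-known",), ("assigned",)),
    "arrange": (("assigned",), ("transport-arranged",)),
}
concepts = {"patient-arranged": ["transport-arranged", "assigned"]}
trace = [
    ("query", {"arrival-known"}),
    ("assign", {"arrival-known", "assigned"}),
    ("arrange", {"arrival-known", "assigned", "transport-arranged"}),
]
explanation = explain("patient-arranged", trace, operators, concepts)
```

Each node of the resulting structure corresponds to a candidate method: the node's goal becomes the method head and its children become ordered subgoals.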
Learning HTNs by Trace Analysis
Explanation Structure for Trace
(dest-airport patient1 SFO)
(arrival-time NW32 1pm)
(query-arrival-time)
(scheduled NW32)
(location patient1 SFO 1pm)
(assigned patient1 NW32)
(flight-available)
(assign patient1 NW32)
(transfer-hospital patient1 hospital2)
(arrange-ground-transportation SFO hospital2 1pm)
(close-airport hospital2 SFO)
Time: 1, Time: 2, Time: 3
Hierarchical Task Network Structure
(dest-airport ?patient ?loc)
(arrival-time ?flight ?time)
(query-arrival-time)
(scheduled ?flight)
(location ?patient ?loc ?time)
(assigned ?patient ?flight)
(flight-available)
(assign ?patient ?flight)
(transfer-hospital ?patient ?hospital)
(arrange-ground-transportation ?loc ?hospital ?time)
(close-airport ?hospital ?loc)
[Figure labels: concepts, actions]
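The step from the ground explanation structure to the generalized HTN structure amounts to replacing each constant with a variable, consistently across all literals. A minimal sketch in Python follows; the literals come from the two slides, while the constant-to-variable table is supplied by hand here as an assumption, since the slides do not say how variable names are chosen.

```python
def variablize(literals, variable_for):
    """Replace each known constant with its variable in every literal."""
    return [tuple(variable_for.get(term, term) for term in literal)
            for literal in literals]

# Ground literals from the explanation structure.
ground = [
    ("dest-airport", "patient1", "SFO"),
    ("arrival-time", "NW32", "1pm"),
    ("query-arrival-time",),
    ("scheduled", "NW32"),
    ("location", "patient1", "SFO", "1pm"),
    ("assigned", "patient1", "NW32"),
    ("flight-available",),
    ("assign", "patient1", "NW32"),
    ("transfer-hospital", "patient1", "hospital2"),
    ("arrange-ground-transportation", "SFO", "hospital2", "1pm"),
    ("close-airport", "hospital2", "SFO"),
]

# Assumed mapping from constants to variables.
variable_for = {
    "patient1": "?patient",
    "NW32": "?flight",
    "SFO": "?loc",
    "hospital2": "?hospital",
    "1pm": "?time",
}

generalized = variablize(ground, variable_for)
```

Using one shared table for the whole structure is what keeps co-references intact: the `?loc` in the destination airport literal stays the same `?loc` in the ground-transportation literal.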
Transfer by Representation Mapping
[Figure labels: Source domain, Predicate mappings, Target domain]
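One way to picture transfer by representation mapping is as rewriting a learned method with a table that maps source predicates to target predicates. The sketch below assumes the mapping is already known and one-to-one; finding such a mapping is the hard part and is not shown. The target-domain predicates (`loaded`, `truck-available`, `load`) are invented for illustration.

```python
def translate_literal(literal, predicate_map):
    """Rename the predicate of a literal; arguments are left unchanged."""
    predicate, *args = literal
    return (predicate_map.get(predicate, predicate), *args)

def translate_method(method, predicate_map):
    """Rewrite every literal of a method into the target vocabulary."""
    return {
        "goal": translate_literal(method["goal"], predicate_map),
        "precondition": [translate_literal(lit, predicate_map)
                         for lit in method["precondition"]],
        "subgoals": [translate_literal(lit, predicate_map)
                     for lit in method["subgoals"]],
    }

# Source-domain method using literals from the evacuation example.
source_method = {
    "goal": ("assigned", "?x", "?y"),
    "precondition": [("flight-available",)],
    "subgoals": [("assign", "?x", "?y")],
}

# Hypothetical predicate mappings into a logistics-style target domain.
predicate_map = {
    "assigned": "loaded",
    "flight-available": "truck-available",
    "assign": "load",
}

target_method = translate_method(source_method, predicate_map)
```

The structure of the method carries over untouched, which matches the case described earlier in the talk where two domains support similar solutions but share no symbols or predicates.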
Challenge: Learn with Richer Goals
HTNs are more expressive than classical plans (Erol et al., 1994).
Our approach loses this advantage because it assumes the head of each method is a goal it achieves, but we can:
Extend goal concepts to describe temporal behavior
Revise the execution module to handle these structures
Augment trace analysis to reason about temporal goals
Learn new methods with temporal goals in their heads
This scheme should acquire the full class of HTNs while still retaining the tractability of goal-directed learning.
Challenge: Extend Conceptual Vocabularies
Our approach to learning HTNs relies on the concept hierarchy used to explain solution traces.
The method would be less dependent if it extended this hierarchy:
Given: A set of concepts used in goals, states, and methods
Given: New methods acquired from sample solution traces
Find: New concepts that produce improved performance as the result of future method learning.
This would support a bootstrapped learner that invents predicates to describe states, goals, and methods.
Challenge: Extend Conceptual Vocabularies
Our approach to utilizing predicate invention has three steps:
Define a new concept for the precondition of each method learned by chaining off a concept definition.
Check traces for states in which this concept becomes true and learn methods to achieve it.
During performance, treat each method’s precondition as its first subgoal, which it can achieve if submethods are known.
This technique would make an HTN more complete by growing it downward, introducing nonterminal symbols as necessary.
We have partially implemented this scheme and hope to report results at the next meeting.
Concluding Remarks: Research Style
Clearly, there remain many open problems to address in learning plan knowledge.
These involve new abilities, not improvements on existing ones, which suggests that we:
Look at human behavior for ideas on how to proceed
Develop integrated systems rather than component algorithms
Demonstrate their behavior on challenging domains
These strategies will help us extend the reach of our learning systems, not just strengthen their grasp.
Concluding Remarks: Evaluation
We must evaluate our new plan learners, but this does not mean:
Measuring their speed in generating plans
Showing they run faster than existing systems
Entering them in planning competitions
More appropriate experiments would revolve around:
Demonstrating entirely new functionalities
Running lesion studies to show new features are required
Using performance measures appropriate to the task
These steps will produce conceptual advances and scientific understanding far more than will mindless bake-offs.
Concluding Remarks: Summary
Learning plan knowledge is a key area with many open problems:
Learning from traces, advice, and other sources
Transferring knowledge within and across domains
Learning and extending rich structures like HTNs
These challenges will benefit from earlier work on plan learning, but they also require new ideas.
Together, they should lead us toward learning systems that rival humans in their flexibility and power.
End of Presentation
ICARUS Concepts for In-City Driving
((in-rightmost-lane ?self ?clane)
 :percepts  ((self ?self) (segment ?seg) (line ?clane segment ?seg))
 :relations ((driving-well-in-segment ?self ?seg ?clane)
             (last-lane ?clane)
             (not (lane-to-right ?clane ?anylane))))

((driving-well-in-segment ?self ?seg ?lane)
 :percepts  ((self ?self) (segment ?seg) (line ?lane segment ?seg))
 :relations ((in-segment ?self ?seg)
             (in-lane ?self ?lane)
             (aligned-with-lane-in-segment ?self ?seg ?lane)
             (centered-in-lane ?self ?seg ?lane)
             (steering-wheel-straight ?self)))

((in-lane ?self ?lane)
 :percepts ((self ?self segment ?seg) (line ?lane segment ?seg dist ?dist))
 :tests    ((> ?dist -10) (<= ?dist 0)))
Representing Short-Term Beliefs/Goals
(current-street me A) (current-segment me g550)
(lane-to-right g599 g601) (first-lane g599)
(last-lane g599) (last-lane g601)
(at-speed-for-u-turn me) (slow-for-right-turn me)
(steering-wheel-not-straight me) (centered-in-lane me g550 g599)
(in-lane me g599) (in-segment me g550)
(on-right-side-in-segment me) (intersection-behind g550 g522)
(building-on-left g288) (building-on-left g425)
(building-on-left g427) (building-on-left g429)
(building-on-left g431) (building-on-left g433)
(building-on-right g287) (building-on-right g279)
(increasing-direction me) (buildings-on-right g287 g279)
ICARUS Skills for In-City Driving
((in-rightmost-lane ?self ?line)
 :percepts ((self ?self) (line ?line))
 :start    ((last-lane ?line))
 :subgoals ((driving-well-in-segment ?self ?seg ?line)))

((driving-well-in-segment ?self ?seg ?line)
 :percepts ((segment ?seg) (line ?line) (self ?self))
 :start    ((steering-wheel-straight ?self))
 :subgoals ((in-segment ?self ?seg)
            (centered-in-lane ?self ?seg ?line)
            (aligned-with-lane-in-segment ?self ?seg ?line)
            (steering-wheel-straight ?self)))

((in-segment ?self ?endsg)
 :percepts ((self ?self speed ?speed)
            (intersection ?int cross ?cross)
            (segment ?endsg street ?cross angle ?angle))
 :start    ((in-intersection-for-right-turn ?self ?int))
 :actions  ((steer 1)))
ICARUS Interleaves Execution and Problem Solving
[Figure labels: Problem; Skill Hierarchy; Reactive Execution; impasse? (yes / no); Problem Solving; Primitive Skills; Executed plan]
This organization reflects the psychological distinction between automatized and controlled behavior.
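The control loop suggested by this slide can be sketched very roughly as follows. This is a simplification under stated assumptions, not the ICARUS interpreter: skills are reduced to ground operators with preconditions and effects, reactive execution means applying a skill that achieves the current goal and is applicable now, and an impasse triggers one step of backward chaining that pushes an unmet precondition as a subgoal. The skill names and literals are hypothetical.

```python
def run(goal, state, skills, max_cycles=50):
    """Interleave reactive execution with problem solving on impasses.

    skills: name -> (preconditions, effects), both frozensets of literals.
    Returns the executed plan and the final state.
    """
    state = frozenset(state)
    stack = [goal]           # goal stack; the top is the current goal
    executed = []
    for _ in range(max_cycles):
        if not stack:
            break
        current = stack[-1]
        if current in state:
            stack.pop()      # goal already satisfied
            continue
        # Reactive execution: a skill for this goal that applies right now.
        ready = [name for name, (pre, eff) in skills.items()
                 if current in eff and pre <= state]
        if ready:
            name = ready[0]
            state = state | skills[name][1]
            executed.append(name)
            continue
        # Impasse: problem solving selects a skill and subgoals on its needs.
        candidates = [(name, pre) for name, (pre, eff) in skills.items()
                      if current in eff]
        if not candidates:
            raise ValueError(f"no skill achieves {current}")
        _, pre = candidates[0]
        stack.append(sorted(pre - state)[0])
    return executed, state

# Hypothetical primitive skills.
skills = {
    "assign": (frozenset({"flight-available"}), frozenset({"assigned"})),
    "query": (frozenset(), frozenset({"flight-available"})),
}
plan, final_state = run("assigned", set(), skills)
```

In this toy run the first cycle hits an impasse because `assign` is not yet applicable, so the problem solver subgoals on `flight-available`; execution then proceeds reactively. A learning system would store the result of resolving the impasse as a new skill so the same situation is handled without problem solving next time.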