Learning Control Knowledge for Planning
Yi-Cheng Huang
Outline
I. Brief overview of planning
II. Planning with control knowledge
III. Learning control knowledge
IV. Conclusion
I. Overview of Planning
Planning - a very general framework for many applications: robot control; airline scheduling; Hubble Space Telescope control.
Planning – find a sequence of actions that leads from an initial state to a goal state.
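As a concrete (toy) illustration of this definition, not any particular planner's algorithm, a breadth-first forward search over STRIPS-style states can be sketched as follows; the set-based state and action encoding is an assumption for illustration:

```python
from collections import deque

def plan(initial, goal, actions):
    """Breadth-first forward search: return a list of action names leading
    from `initial` to a state containing every fact in `goal`.
    Each action is (name, preconditions, add_list, delete_list), with
    facts represented as strings and fact sets as frozensets."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, pre, add, delete in actions:
            if pre <= state:                    # action is applicable
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None                                 # no plan found
```

Breadth-first search returns a shortest plan but enumerates the reachable state space, which is exactly why the complexity results on the next slide bite on larger problems.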
Planning Is Difficult - an Abundance of Negative Complexity Results
Domain-independent planning: PSPACE-complete or worse (Chapman 1987; Bylander 1991; Backstrom 1993).
Domain-dependent planning: NP-complete or worse (Chenoweth 1991; Gupta and Nau 1992).
Approximate planning: NP-complete or worse (Selman 1994).
Recent State-of-the-art Planners
Constraint-based Planners – Graphplan, Blackbox.
Heuristic Search Planners – HSP, FF.
Both kinds of planners can solve problems in seconds or minutes that would take traditional planners hours or days.
Graphplan (Blum & Furst, 1995)
[Diagram: planning graph with alternating fact and action layers between time i and time i+1.]
Search on the planning graph to find a plan.
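The layer-expansion step that builds the graph can be sketched as follows (a minimal illustration that omits Graphplan's mutex reasoning; the tuple encoding of actions is an assumption):

```python
def expand_layer(facts, actions):
    """One Graphplan-style expansion step: compute the actions applicable
    at the current fact layer and the fact layer one time step later.
    Actions are (name, preconditions, add_effects) tuples of frozensets."""
    applicable = [a for a in actions if a[1] <= facts]
    next_facts = set(facts)            # no-op actions carry every fact forward
    for _name, _pre, adds in applicable:
        next_facts |= adds
    return applicable, next_facts
```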
Blackbox (Kautz & Selman, 1999)
problem → CNF formula, e.g. (a ∨ b) ∧ (c ∨ d) ∧ (s ∨ t ∨ x) ∧ (u ∨ y ∨ z) →
Satisfiability Tester (Chaff, WalkSat, Satz, RelSat, ...) →
plan
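Blackbox compiles the bounded planning problem into CNF and hands it to an off-the-shelf SAT solver. As a stand-in for those solvers, a brute-force satisfiability test over DIMACS-style clauses looks like this (real solvers such as Chaff use far more sophisticated search):

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force SAT test for CNF over variables 1..n_vars.
    Each clause is a list of signed ints: [1, -2] means (x1 or not x2).
    Returns a satisfying assignment as a dict, or None."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None
```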
Heuristic Search Based Planning (Bonet & Geffner, 1997)
Use various heuristic functions to approximate the distance from the current state to the goal state based on the planning graph.
Use Best-First Search or A* search to find plans.
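A generic sketch of greedy best-first search driven by such a heuristic (not HSP's or FF's actual implementation; swapping the priority to h(s) plus depth gives A*):

```python
import heapq
from itertools import count

def best_first(start, is_goal, successors, h):
    """Greedy best-first search: always expand the frontier state with the
    smallest heuristic value. `successors(s)` yields (action, next_state);
    `h(s)` estimates the distance from s to the goal."""
    tie = count()          # unique tie-breaker so states are never compared
    frontier = [(h(start), next(tie), start, [])]
    seen = {start}
    while frontier:
        _, _, state, steps = heapq.heappop(frontier)
        if is_goal(state):
            return steps
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), next(tie), nxt, steps + [action]))
    return None
```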
II. Planning With Control
General focus on planning: avoid search as much as possible.
Many real-world applications can be tailored and simplified with domain-specific knowledge.
TLPlan is an efficient planner that uses control knowledge to guide a forward-chaining search (Bacchus & Kabanza 2000).
TLPlan
Temporal Logic Control Formula
A Simple Control Rule Example
Goal
Do NOT move an object at the goal location
□ (goal(at(obj, loc)) ∧ at(obj, loc) → ○ at(obj, loc))
Temporal logic operators: □ “always”, ○ “next”.
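The rule can be checked against a candidate sequence of states; the dict-based state encoding below is an illustration, not TLPlan's representation:

```python
def respects_rule(states, goals):
    """Check the control rule always(goal(at(obj, loc)) and at(obj, loc)
    -> next(at(obj, loc))): an object already at its goal location must
    still be there in the next state.  `states` is a list of dicts mapping
    object -> location; `goals` maps object -> goal location."""
    for cur, nxt in zip(states, states[1:]):
        for obj, loc in goals.items():
            if cur.get(obj) == loc and nxt.get(obj) != loc:
                return False       # the plan moved an object off its goal
    return True
```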
Question:
Can the same level of control be effectively incorporated into a constraint-based planner?
Control Rule Categories
I. Rules involving only static information.
II. Rules depending on the current state.
III. Rules depending on the current state and requiring dynamic user-defined predicates.
Category I Control Rules (depend only on the goal; toy example)
Do NOT unload a package from an airplane if the current location is not the package’s goal location.
[Diagram: airplane a at location L carrying a package whose goal is a different location.]
□ (in(p, a) ∧ at(a, l) ∧ ¬goal(at(p, l)) → ○ in(p, a))
Pruning the Planning Graph: Category I Rules
[Diagram: planning graph with fact and action layers; nodes excluded by Category I rules are pruned.]
Effect of Graph Pruning
[Bar chart: number of nodes (0–10,000) on problems log-a through log-d, original vs. pruned planning graph.]
Category II Control Rules
[Diagram: airplane a at location L carrying a package whose goal location is L.]
Do NOT move an airplane if there is an object in the airplane that needs to be unloaded at that location.
∀a, l: □ (at(a, l) ∧ ∃p (in(p, a) ∧ goal(at(p, l))) → ○ at(a, l))
Control by Adding Constraints
Temporal Logic Control Rules
Planning Formula + Control Constraint Clauses, e.g. clauses of the form (x_i ∧ y_i → y_{i+1}) linking successive time steps.
∀a, l: □ (at(a, l) ∧ ∃p (in(p, a) ∧ goal(at(p, l))) → ○ at(a, l))
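A sketch of grounding such a Category II rule into clauses, one per time step; the `var` numbering function and the argument conventions are assumptions for illustration:

```python
def compile_rule(packages, airplanes, goal_at, var, horizon):
    """Ground the rule  in(p, a, i) & at(a, l, i) & goal(at(p, l))
    ->  at(a, l, i+1)  into CNF clauses for every time step i.
    `var(pred, args, t)` maps a ground literal to a DIMACS variable id;
    `goal_at` maps each package to its goal location."""
    clauses = []
    for i in range(horizon):
        for p in packages:
            l = goal_at[p]             # goal predicates are static
            for a in airplanes:
                clauses.append([-var("in", (p, a), i),
                                -var("at", (a, l), i),
                                 var("at", (a, l), i + 1)])
    return clauses
```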
Rules Without Compact Encoding
Do NOT move a vehicle unless (a) there is an object that needs to be picked up, or (b) there is an object in the vehicle that needs to be unloaded.
[Diagram: packages a and b to be routed among NYC, SFO, ORL, and DC.]
Complex Encoding for Category III Rules
Need to define extra predicates:
need_to_move_by_airplane, need_to_unload_by_airplane.
These introduce extra literals and clauses: O(mn) ground literals and O(mn + km²) clauses at each time step, where m = #cities, n = #objects, k = #airports.
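Plugging small numbers into these bounds makes the per-step cost concrete (constant factors ignored):

```python
def encoding_size(m, n, k):
    """Per-time-step cost of encoding Category III rules, up to constant
    factors: m*n extra ground literals and m*n + k*m**2 extra clauses,
    for m cities, n objects, and k airports."""
    return m * n, m * n + k * m ** 2

# e.g. 10 cities, 20 objects, 3 airports: 200 literals, 500 clauses per step
```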
No easy encoding exists for Category III rules. However, it appears that Category I & II rules do most of the work.
Blackbox with Control Knowledge (Logistics domain with hand-coded rules)
[Bar chart, logarithmic time scale: run time (1–10,000 sec) on problems log-a through log-e for blackbox, blackbox(I), blackbox(II), and blackbox(I&II).]
Comparison of Blackbox and TLPlan (Run Time)
[Bar chart: run time (0–80 sec) on problems log-a through log-e, TLPlan vs. Blackbox(I&II).]
Comparison of Blackbox and TLPlan (parallel plan length; “plan quality”)
[Bar chart: parallel plan length (0–35) on problems log-c, log-d, log-e, log-1, log-2 for TLPlan, TLPlan-R, and Blackbox.]
Summary: Adding Control Knowledge
We have shown how to add declarative control knowledge to a constraint-based planner by using temporal logic statements.
Adding such knowledge gives significant speedups (up to two orders of magnitude).
Pure heuristic search with control can still be faster, but with much lower plan quality.
III. Can we learn domain knowledge from example plans?
Motivation
Control Rules used in TLPlan and Blackbox are hand-coded.
Idea: learn control rules from a sequence of small problems solved by the planner.
Learning System Framework
Problem → Blackbox Planner → Plan Justification / Type Inference → ILP Learning Module / Verification → Control Rules
Target Concepts for Actions
Action Select Rule: indicates conditions under which the action can be performed immediately.
Action Reject Rule: indicates conditions under which the action must not be performed.
Basic Assumption on Learning Control
Plans found by the planner on simple problems are optimal or near-optimal.
Actions that appear in an optimal plan must be selected.
Actions that could be executed but do not appear in the plan must be rejected.
Definitions
Real action: an action that appears in the plan.
Virtual action: an action whose preconditions hold but which does not appear in the plan.
A Toy Planning Example
[Diagram: initial and goal states over locations BOS, NYC, and SFO, with packages a and b.]
Real & Virtual Actions for UnloadAirplane
Time 1: LoadAirplane (P a BOS) [real]
Time 2: FlyAirplane (P BOS NYC) [real]; UnloadAirplane (P a BOS) [virtual]
Time 3: LoadAirplane (P b NYC) [real]; UnloadAirplane (P a NYC) [virtual]
Time 4: FlyAirplane (P NYC SFO) [real]; UnloadAirplane (P a NYC), UnloadAirplane (P b NYC) [virtual]
Time 5: UnloadAirplane (P a SFO), UnloadAirplane (P b SFO) [real]
Heuristics for Extracting Examples
Select Rule: + examples = real actions, − examples = virtual actions.
Reject Rule: + examples = virtual actions, − examples = real actions.
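This extraction heuristic is mechanical once real and virtual actions are identified; a sketch:

```python
def training_examples(real, virtual):
    """Label examples per the heuristic above: for select rules, real
    actions are positive and virtual actions negative; for reject rules
    the labels are flipped."""
    select = [(a, True) for a in real] + [(a, False) for a in virtual]
    reject = [(a, True) for a in virtual] + [(a, False) for a in real]
    return select, reject
```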
Rule Induction
Candidate literals:
  Xi = Xj, e.g. loc1 = loc2
  P(X1, …, Xn), e.g. at(pkg, loc)
  goal(P(X1, …, Xn)), e.g. goal(at(pkg, loc))
  negations of the above
Rules have the form action(X1, …, Xn) ← literals.
Based on Quinlan’s FOIL (Quinlan 1990; 1996).
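FOIL grows a clause by greedily adding the candidate literal with the highest information gain; the gain computation, following Quinlan's formula, is:

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding a literal to a clause.
    p0/n0: positive/negative examples covered before the literal is added;
    p1/n1: the same counts after.  Gain = p1 * (I_before - I_after),
    where I = -log2(p / (p + n)) is the information of a covered example."""
    before = -log2(p0 / (p0 + n0))
    after = -log2(p1 / (p1 + n1))
    return p1 * (before - after)
```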
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt)
time pln pkg apt
+ 2 P a BOS
+ 3 P a NYC
+ 4 P a NYC
+ 4 P b NYC
- 5 P a SFO
- 5 P b SFO
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt) ← goal(at (pkg loc))
time pln pkg apt loc
+ 2 P a BOS SFO
+ 3 P a NYC SFO
+ 4 P a NYC SFO
+ 4 P b NYC SFO
- 5 P a SFO SFO
- 5 P b SFO SFO
Reject Rule: UnloadAirplane
UnloadAirplane (pln pkg apt) ← goal(at (pkg loc)) ∧ (apt ≠ loc)
time pln pkg apt loc
+ 2 P a BOS SFO
+ 3 P a NYC SFO
+ 4 P a NYC SFO
+ 4 P b NYC SFO
- 5 P a SFO SFO
- 5 P b SFO SFO
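The finished rule can be applied directly as a filter on candidate actions; the dict-based goal encoding is illustrative:

```python
def reject_unload(pkg, apt, goal_at):
    """Learned reject rule: UnloadAirplane(pln, pkg, apt) is rejected when
    goal(at(pkg, loc)) holds for some loc with apt != loc, i.e. the
    package's goal location is known and is not the current airport."""
    return pkg in goal_at and goal_at[pkg] != apt
```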
Learning Time
[Bar chart: learning time (0–50 sec) per domain, number of training problems in parentheses: logistics (10), briefcase (3), grid (6), gripper (2), mystery (6), tireworld (5).]
Logistics Domain
[Plot, logarithmic scale: run time (1–100,000 sec) across Logistics problems, with vs. without learned control.]
Learned Logistics Control Rules
If an object’s goal location is in a different city, do NOT unload the object from the airplane:
□ (in(o, p) ∧ at(p, m) ∧ goal(at(o, l)) ∧ incity(m, c) ∧ ¬incity(l, c) → ○ in(o, p))
Unload an object from a truck if the current location is an airport and it is not in the same city as the object’s goal location:
□ (in(o, t) ∧ at(t, a) ∧ airport(a) ∧ goal(at(o, l)) ∧ incity(a, c) ∧ ¬incity(l, c) → ○ at(o, a))
Briefcase Domain
[Plot, logarithmic scale: run time (0.01–10,000 sec) across Briefcase problems, with vs. without learned control.]
Grid Domain
[Plot, logarithmic scale: run time (1–100,000 sec) across Grid problems, with vs. without learned control.]
Gripper Domain
[Plot, logarithmic scale: run time (0.1–100,000 sec) across Gripper problems, with vs. without learned control.]
Mystery Domain
[Plot, logarithmic scale: run time (0.1–1,000 sec) across Mystery problems, with vs. without learned control.]
Tireworld Domain
[Plot, logarithmic scale: run time (0.1–10,000 sec) across Tireworld problems, with vs. without learned control.]
Summary of Learning for Planning
Introduced an inductive logic programming methodology into a constraint-based planning framework to obtain a “trainable planner”.
Demonstrated clear practical speedups on a range of benchmark problems.
IV. Single-agent vs. Multi-agent planning
Observation: heuristic planners degrade rapidly in multi-agent settings; they tend to assign all the work to a single agent.
We studied this phenomenon by exploring different work-load distributions.
Forcing the Planners
There is no easy way to modify heuristic search planners to find better-quality plans.
Instead, limit the number of actions each agent can perform, forcing the planners to find plans with the same level of participation from all agents.
Sokoban Domain
Restricted Sokoban Domain
[Bar chart, logarithmic scale: run time (1–1,000,000) on sokoban-1 (4,4,4), sokoban-2 (3,3,3), and sokoban-3 (5,5,5) for Blackbox, HSP, and FF.]
Complexity Analysis on Restricted Domain
Domain     Complexity                                     C.B.P.   H.S.P.
Sokoban    PSPACE-complete (Culberson, 1997)                ✓
Rocket     NP-complete (reduction from vertex feedback)     ✓
Grid       polynomial-time solvable                                  ✓
Elevator   polynomial-time solvable                                  ✓
(C.B.P. = constraint-based planner; H.S.P. = heuristic search planner; ✓ marks the better-suited planner.)
Conclusions (a)
Demonstrated how the performance of state-of-the-art general-purpose planning systems can be boosted by incorporating control knowledge.
Knowledge is encoded in purely declarative form using temporal logic formulas.
Obtained up to two orders of magnitude speedup on a series of benchmarks.
Conclusions (b)
Demonstrated the feasibility of a “trainable” planning system: the system learns domain/control knowledge from many small example plans.
Based on concepts from inductive logic programming. Learned knowledge in temporal logic form.
First demonstration of practical speedups using learning in a planning system on realistic benchmarks.
The approach avoids learning “accidental truths” that can hurt system performance (a problem in earlier systems).
Conclusions (c)
Uncovered a link between planner performance and the inherent complexity of the planning task.
Heuristic search planners work well on problems solvable in polynomial time by specialized algorithms.
Constraint-based planners dominate on NP-complete planning tasks.
Conclusion
Comparison of the constraint-based and heuristic search planners shows that they complement each other on different domains.
Hand-coded control knowledge can be effectively applied in constraint-based planners.
Conclusion (cont.)
Our learning system is simple and modular, and learning time is short.
Learned rules are on par with hand-coded ones and were shown to improve performance by over two orders of magnitude.
Learned rules are in logical form and can be used in other planning systems.
Demonstrated a way to effectively learn domain knowledge from small example plans; the learned control knowledge boosts performance on larger problems. This is the first clear demonstration of boosting planning system performance through learning.
The declarative, logic-based approach is general and fits a wide range of planning applications.
The End