Trade-off Between Exploration and Exploitation in Satisficing Planning
Fan Xie
Outline
What is Satisficing Planning
Heuristic Search in Planning
Why do we need exploration?
Analysis of Arvand
Arvand-LTS: Arvand with Local MCTS
Experiments
Satisficing Planning
Deterministic environment
Sub-optimal solutions are acceptable
Domain-independent planning
Implicit representation of the search space. Why not an explicit representation? It is impossible in most cases because the state space is huge.
Example of a planning task:
  An initial state: s0
  A set of actions: A
  A set of requirements for a goal state: G
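To make the implicit representation concrete, here is a minimal sketch, assuming a STRIPS-style encoding in which a state is a set of facts. The names `Action`, `apply_action`, `successors`, and the toy truck-and-package task are illustrative assumptions, not taken from the slides. Successor states are generated on demand from s0 and A, so the state space is never enumerated.

```python
from typing import FrozenSet, Iterator, List, NamedTuple, Tuple

State = FrozenSet[str]

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts made true by the action
    delete: FrozenSet[str]  # facts made false by the action

def apply_action(state: State, action: Action) -> State:
    """Return the state reached by applying an applicable action."""
    return (state - action.delete) | action.add

def successors(state: State, actions: List[Action]) -> Iterator[Tuple[Action, State]]:
    """Generate successor states lazily; nothing is stored up front."""
    for action in actions:
        if action.pre <= state:
            yield action, apply_action(state, action)

def is_goal(state: State, goal: FrozenSet[str]) -> bool:
    return goal <= state

# Hypothetical toy task: move a package from A to B with a truck.
s0 = frozenset({"truck-at-A", "pkg-at-A"})
A = [
    Action("load", frozenset({"truck-at-A", "pkg-at-A"}),
           frozenset({"pkg-in-truck"}), frozenset({"pkg-at-A"})),
    Action("drive-A-B", frozenset({"truck-at-A"}),
           frozenset({"truck-at-B"}), frozenset({"truck-at-A"})),
    Action("unload", frozenset({"truck-at-B", "pkg-in-truck"}),
           frozenset({"pkg-at-B"}), frozenset({"pkg-in-truck"})),
]
G = frozenset({"pkg-at-B"})
```

Only s0, A, and G are written down; every other state exists only when `successors` produces it.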
Some Background
What is a heuristic? Here, a function h(n) that estimates how close node n is to the goal.
Greedy Best-First Search: when expanding node n, take each successor n' and place it on a single open list ordered by h(n').
Hill Climbing Search: check the neighbors of the current node and move to one with a lower h-value than the current node (if there are several, the lowest).
  Terminates when no neighbor has a lower h-value.
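The two searches described above can be sketched as follows. This is a minimal illustration, not any planner's implementation; the small graph and its h-values are invented so that hill climbing stops in a dead end while greedy best-first search still reaches the goal.

```python
import heapq
from itertools import count

def greedy_best_first(start, successors, h, is_goal):
    """Expand the open-list node with the lowest h-value first."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_list = [(h(start), next(tie), start, [start])]
    seen = {start}
    while open_list:
        _, _, node, path = heapq.heappop(open_list)
        if is_goal(node):
            return path
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(open_list, (h(nxt), next(tie), nxt, path + [nxt]))
    return None

def hill_climbing(start, successors, h):
    """Move to the lowest-h neighbor until no neighbor improves on the current node."""
    current, path = start, [start]
    while True:
        better = [n for n in successors(current) if h(n) < h(current)]
        if not better:
            return path  # may end in a local minimum instead of a goal
        current = min(better, key=h)
        path.append(current)

# Hypothetical toy graph: A looks best from S but is a dead end.
graph = {"S": ["A", "B"], "A": [], "B": ["C"], "C": ["G"], "G": []}
h_values = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}

gbfs_path = greedy_best_first("S", graph.__getitem__, h_values.__getitem__,
                              lambda n: n == "G")
hc_path = hill_climbing("S", graph.__getitem__, h_values.__getitem__)
```

Hill climbing commits to A and cannot recover, whereas the open list lets greedy best-first search fall back to B.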
Heuristic Search As Planning
FF Planner
  Hill climbing
  FF heuristic: not admissible
  Enforced hill climbing: more exploration in hill climbing to escape from local minima
LAMA Planner
  Greedy Best-First Search (WA*)
  Mixed heuristic: FF + Landmark
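Enforced hill climbing is commonly described as hill climbing in which each step is a breadth-first search for any state with a strictly lower h-value. The sketch below follows that description and is not FF's actual code; the plateau graph and h-values are assumptions made for illustration.

```python
from collections import deque

def enforced_hill_climbing(start, successors, h, is_goal):
    """Repeat: breadth-first search from the current state until a strictly
    better state is found, then jump to it."""
    current, plan = start, [start]
    while not is_goal(current):
        queue = deque([(current, [])])
        seen = {current}
        found = None
        while queue and found is None:
            node, trail = queue.popleft()
            for nxt in successors(node):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if h(nxt) < h(current):
                    found = trail + [nxt]  # first improving state wins
                    break
                queue.append((nxt, trail + [nxt]))
        if found is None:
            return None  # the breadth-first search ran out of states
        plan += found
        current = found[-1]
    return plan

# Hypothetical plateau: S, P1, and P2 all have h = 2.
graph = {"S": ["P1"], "P1": ["P2"], "P2": ["X"], "X": ["G"], "G": []}
h_values = {"S": 2, "P1": 2, "P2": 2, "X": 1, "G": 0}

ehc_plan = enforced_hill_climbing("S", graph.__getitem__, h_values.__getitem__,
                                  lambda n: n == "G")
```

Plain hill climbing would stop at S here because no direct neighbor improves on it; the breadth-first step crosses the plateau.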
Why do we need exploration?
Best-first search and hill climbing mostly do greedy exploitation.
Problem: Local Minima and Plateaus
Local Minima and Plateaus
Local minima: states with the locally best h-value that are not goal states
Plateaus: areas in which all nodes have the same h-value
More Exploration
Current algorithms and planners that directly address the trade-off between exploration and exploitation:
  RRT (not for satisficing planning)
  Identidem (stochastic hill climbing)
  Diverse best-first search (not published yet)
  Arvand (Monte-Carlo random walk)
Rapidly-Exploring Random Tree (RRT)
RRT gradually builds a tree in the search space until a path to the goal state is found. At each step the tree is expanded either towards the goal, which corresponds to exploitation, or towards a randomly selected point in the search space, which corresponds to exploration.
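A minimal sketch of this exploit-or-explore step, assuming an obstacle-free 2D square so that every sampled point is valid. The parameters `step`, `goal_bias`, `bounds`, and `max_iters` are illustrative choices, not values from the slides.

```python
import math
import random

def rrt(start, goal, step=1.0, goal_bias=0.2, bounds=(0.0, 10.0),
        max_iters=2000, seed=0):
    """Grow a tree from start; return a path to goal, or None if none is found."""
    rng = random.Random(seed)
    parent = {start: None}
    for _ in range(max_iters):
        if rng.random() < goal_bias:
            target = goal  # exploitation: extend towards the goal
        else:
            # exploration: extend towards a random point
            target = (rng.uniform(*bounds), rng.uniform(*bounds))
        nearest = min(parent, key=lambda p: math.dist(p, target))
        d = math.dist(nearest, target)
        if d == 0:
            continue
        if d <= step:
            new = target
        else:
            t = step / d
            new = (nearest[0] + t * (target[0] - nearest[0]),
                   nearest[1] + t * (target[1] - nearest[1]))
        if new in parent:
            continue
        parent[new] = nearest
        if new == goal:
            path, node = [], new
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
    return None

rrt_path = rrt((0.0, 0.0), (9.0, 9.0))
```

Sampling `target` directly is exactly what needs a complete model of the environment: in a planning domain there is no cheap way to draw a random valid state.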
RRT
RRT requires a complete model of the environment to generate random points for exploration.
However, current planning domains mostly provide only an implicit representation of the search space.
  Random points might be invalid (one possible workaround is to assume they are valid).
  The distribution of random points is not uniform.
Identidem
Coles and Smith's Identidem introduces exploration through stochastic local search (SLS).
Algorithm:
  Local search: action sequences are chosen probabilistically from the set of all possible actions in each state.
  It evaluates the FF heuristic after each action and immediately jumps to the first state that improves on the start state.
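The probing loop described above can be sketched as follows. This is a simplified illustration and not Identidem itself: actions are sampled uniformly, a table lookup stands in for the FF heuristic, h = 0 is treated as the goal, and `max_depth` and `max_probes` are assumed parameters.

```python
import random

def sls_probe(state, actions, apply_action, h, max_depth, rng):
    """One probe: sample actions, evaluate h after every action, and stop at
    the first state that improves on the probe's start state."""
    h_start = h(state)
    current, trail = state, []
    for _ in range(max_depth):
        options = actions(current)
        if not options:
            return None
        current = apply_action(current, rng.choice(options))
        trail.append(current)
        if h(current) < h_start:  # heuristic evaluated after each action
            return trail
    return None

def identidem_like(start, actions, apply_action, h,
                   max_depth=10, max_probes=1000, seed=0):
    rng = random.Random(seed)
    current, plan = start, [start]
    while h(current) > 0:  # assumption: h = 0 exactly at goal states
        for _ in range(max_probes):
            trail = sls_probe(current, actions, apply_action, h, max_depth, rng)
            if trail is not None:
                plan += trail
                current = trail[-1]  # jump to the first improving state
                break
        else:
            return None
    return plan

# Hypothetical toy domain: positions 0..6 on a line, with a plateau at 0..3.
H = [3, 3, 3, 3, 2, 1, 0]
def toy_actions(n): return [d for d in (-1, 1) if 0 <= n + d < len(H)]
def toy_apply(n, d): return n + d
def toy_h(n): return H[n]

sls_plan = identidem_like(0, toy_actions, toy_apply, toy_h)
```

Because every intermediate state is evaluated, a probe costs up to `max_depth` heuristic computations.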
Diverse best-first search (DBFS)
Diversifies search directions by probabilistically selecting a node that does not have the best heuristic estimate (not published yet).
                    DBFS         GBFS         KBFS
# Solved (of 1612)  1451 (161)   1209 (403)   1288 (324)
Arvand
Exploration using random walks helps to overcome the problem of local minima and plateaus.
Jumping greedily exploits the knowledge gained by the random walks.
Difference from Identidem: only the end states of random walks are evaluated.
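A minimal sketch of the walk-then-jump loop, assuming fixed-length walks, uniform action choice, and h = 0 at goal states. The parameters `walk_length`, `num_walks`, and `max_jumps` and the toy line domain are illustrative assumptions; this is not Arvand's implementation.

```python
import random

def random_walk(state, actions, apply_action, length, rng):
    """Follow random actions without computing any heuristic values."""
    trail = []
    for _ in range(length):
        options = actions(state)
        if not options:
            break
        state = apply_action(state, rng.choice(options))
        trail.append(state)
    return trail

def monte_carlo_random_walk(start, actions, apply_action, h,
                            walk_length=4, num_walks=20, max_jumps=1000, seed=0):
    rng = random.Random(seed)
    current, h_cur = start, h(start)
    plan = [start]
    for _ in range(max_jumps):
        if h_cur == 0:  # assumption: h = 0 exactly at goal states
            return plan
        best_trail, best_h = None, h_cur
        for _ in range(num_walks):
            trail = random_walk(current, actions, apply_action, walk_length, rng)
            if not trail:
                continue
            h_end = h(trail[-1])  # only the end state is evaluated
            if h_end < best_h:
                best_trail, best_h = trail, h_end
        if best_trail is not None:
            plan += best_trail  # greedy jump to the best end state
            current, h_cur = best_trail[-1], best_h
    return plan if h_cur == 0 else None

# Hypothetical toy domain: positions 0..6 on a line, with a plateau at 0..3.
H = [3, 3, 3, 3, 2, 1, 0]
def toy_actions(n): return [d for d in (-1, 1) if 0 <= n + d < len(H)]
def toy_apply(n, d): return n + d
def toy_h(n): return H[n]

mrw_plan = monte_carlo_random_walk(0, toy_actions, toy_apply, toy_h)
```

Each walk costs a single heuristic evaluation regardless of its length, in contrast to the per-action evaluation in Identidem.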
Analysis of Arvand
Fast exploration:
  Exploration using random walks
  Evaluating only end states makes exploration faster (computing heuristic values takes 90% of the time)
Greedy exploitation: jump to the best node obtained
Coverage of Arvand (current IPC problems are not hard enough)
                    Arvand       LAMA         FF           Fast Downward
# Solved (of 1782)  1641 (92%)   1581 (89%)   1389 (78%)   1374 (77%)
Arvand-LTS: Arvand with Local MCTS
Motivation:
  Can we use more of the knowledge we get from random walks?
  Selectively grow a search tree while running random walks
MRW-LTS
Every local search builds a local search tree.
Random walks must start from leaf nodes of the search tree.
Nodes in the tree store the minimum h-value obtained by random walks starting from their subtrees (not the node's own h-value).
A leaf node is selected by following an ε-greedy strategy at each node.
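One local search of this kind can be sketched as below. Several details are assumptions made only to get a runnable example, since the slides do not specify them: a leaf is expanded into all of its successors after each walk, new children inherit the parent's current minimum, and `walk_length`, `budget`, and `epsilon` are arbitrary.

```python
import random

class Node:
    def __init__(self, state, parent=None, min_h=float("inf")):
        self.state = state
        self.parent = parent
        self.children = []
        self.min_h = min_h  # best end-state h seen from this subtree

def select_leaf(root, epsilon, rng):
    """Descend from the root, choosing epsilon-greedily at each node."""
    node = root
    while node.children:
        if rng.random() < epsilon:
            node = rng.choice(node.children)                  # explore
        else:
            node = min(node.children, key=lambda c: c.min_h)  # exploit
    return node

def local_tree_search(start, successors, h, walk_length=4, budget=50,
                      epsilon=0.1, seed=0):
    rng = random.Random(seed)
    root = Node(start)
    best_state, best_h = start, h(start)
    for _ in range(budget):
        leaf = select_leaf(root, epsilon, rng)
        state = leaf.state
        for _ in range(walk_length):  # random walk starting from the leaf
            options = successors(state)
            if not options:
                break
            state = rng.choice(options)
        h_end = h(state)              # only the end state is evaluated
        if h_end < best_h:
            best_state, best_h = state, h_end
        node = leaf
        while node is not None:       # back up the minimum h-value
            node.min_h = min(node.min_h, h_end)
            node = node.parent
        # Assumed expansion policy: add all successors of the leaf.
        leaf.children = [Node(s, leaf, leaf.min_h) for s in successors(leaf.state)]
    return best_state, best_h, root

# Hypothetical toy domain: positions 0..6 on a line, with a plateau at 0..3.
H = [3, 3, 3, 3, 2, 1, 0]
def toy_successors(n): return [m for m in (n - 1, n + 1) if 0 <= m < len(H)]
def toy_h(n): return H[n]

best_state, best_h, root = local_tree_search(0, toy_successors, toy_h)
```

The stored minima steer later walks towards subtrees that have already produced good end states, which is the extra knowledge that plain random walks discard.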
Coverage on IPC-6
Domains       LAMA   Arvand   Arvand-LTS
Cyber         100%   100%     100%
Elevator      87%    100%     100%
Openstacks    100%   100%     100%
Parcprinter   77%    100%     100%
Pegsols       100%   100%     100%
Scanalyzer    100%   90%      90%
Transport     100%   100%     100%
Woodworking   100%   100%     100%
Total         96%    99%      99%