
Page 1:

CSE 574: Planning & Learning Subbarao Kambhampati

1/17: State Space and Plan-space Planning

Office hours: 4:30—5:30pm T/Th

Page 2:

Do you know..

Factored vs. explicit state models
Plan vs. policy
STRIPS assumption
Conditional effects
– Why is the conditional effect P=>Q allowed, but the disjunction P V Q not allowed, in deterministic planning?
– And the connection to executability
Multi-valued fluents
Durative vs. non-durative actions
Partial vs. complete state
Useful analogies
– “preconditions” are like “goals”
– “effects” are like “init state literals”

Page 3:

Some notes on action representation

STRIPS Assumption: Actions must specify all the state variables whose values they change...

No disjunction allowed in effects
– Conditional effects are NOT disjunctive
  » (the antecedent refers to the previous state & the consequent refers to the next state)

Quantification is over finite universes
– essentially syntactic sugaring

All actions can be compiled down to a canonical representation where preconditions and effects are propositional
– Exponential blow-up may occur (e.g., removing conditional effects)
  » We will assume the canonical representation

Action A
  Eff: If P then R
       If Q then W

compiles into:

Action A1   Prec: P, Q     Eff: R, W
Action A2   Prec: P, ~Q    Eff: R
Action A3   Prec: ~P, Q    Eff: W
Action A4   Prec: ~P, ~Q   Eff: (none)

Review
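The compilation illustrated above can be made concrete with a small Python sketch (not course code; the CondEffect/FlatAction tuples are assumptions of this sketch): each truth assignment to the antecedents becomes one canonical action whose effects are exactly the triggered consequents.

```python
from itertools import product
from typing import NamedTuple

class CondEffect(NamedTuple):
    antecedent: str   # a single literal, e.g. "P" or "~P"
    consequent: str   # the literal added when the antecedent holds

class Action(NamedTuple):
    name: str
    preconds: tuple
    cond_effects: tuple

class FlatAction(NamedTuple):
    name: str
    preconds: tuple
    effects: tuple

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def compile_conditional_effects(action):
    """Expand conditional effects into 2^k unconditional actions: one per
    truth assignment to the antecedents; only the triggered consequents
    become effects (untouched variables keep their old values)."""
    compiled = []
    choices = product([True, False], repeat=len(action.cond_effects))
    for i, choice in enumerate(choices, 1):
        prec, eff = list(action.preconds), []
        for fires, ce in zip(choice, action.cond_effects):
            prec.append(ce.antecedent if fires else negate(ce.antecedent))
            if fires:
                eff.append(ce.consequent)
        compiled.append(FlatAction(f"{action.name}{i}", tuple(prec), tuple(eff)))
    return compiled

# The action A from the slide: Eff: If P then R, If Q then W
A = Action("A", preconds=(), cond_effects=(CondEffect("P", "R"), CondEffect("Q", "W")))
for a in compile_conditional_effects(A):
    print(a.name, "Prec:", a.preconds, "Eff:", a.effects)
```

Running the sketch prints the four canonical actions A1..A4 shown above, which is exactly the exponential blow-up the slide warns about.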

Page 4:

Pros & Cons of Compiling to Canonical Action Representation (Added)

As mentioned, it is possible to compile down ADL actions into STRIPS actions
– Quantification is written out as conjunctions/disjunctions over finite universes
– Actions with conditional effects are compiled into multiple (exponentially more) actions without conditional effects
– Actions with disjunctive preconditions are compiled into multiple actions, each of which takes one of the disjuncts as its precondition
– (Domain axioms can be compiled down into the individual effects of the actions, so all actions satisfy the STRIPS assumption)

Compilation is not always a win-win.
– By compiling down to canonical form, we can concentrate on highly efficient planning for canonical actions
  » However, compilation often leads to an exponential blowup and makes it harder to exploit the structure of the domain
– By leaving actions in non-canonical form, we can often get more compact encodings of the domains as well as more efficient search
  » However, we will have to continually extend planning algorithms to handle these representations

The basic tradeoff here is akin to the RISC vs. CISC tradeoff..

And we will re-visit it again when we consider compiling planning problems themselves down into other combinatorial substrates such as CSP, ILP, SAT etc..

Review

Page 5:

Boolean vs. Multi-valued fluents

The state variables (“fluents”) in “factored” representations can be either boolean or multi-valued
– Most planners have conventionally used boolean fluents

Many domains are more compactly and naturally represented in terms of multi-valued variables.

Given a multi-valued state-variable representation, it is easy to compile it down to a boolean state-variable representation.
– Each multi-valued fluent with a domain of D values gets translated to D boolean variables of the form “fluent-has-the-value-v”
– A complete conversion should also put in a domain axiom to the effect that exactly one of those D boolean variables is true in any state
  » Unfortunately, since the ordinary STRIPS representation doesn’t allow domain axioms, this piece of information is omitted during conversion (forcing planners to figure it out through costly search failures)

Conversion from boolean to multi-valued representation is trickier.
– We need to find “cliques” of boolean variables such that no more than one variable in the clique can be true at the same time, and convert each such clique into a multi-valued state variable.
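As a concrete illustration, here is a tiny Python sketch (an assumption of this writeup, not course code) of the multi-valued-to-boolean compilation: each value v of a multi-valued fluent f becomes a boolean fluent "f-has-the-value-v", plus the exactly-one axiom (written as clauses) that STRIPS itself cannot express.

```python
def booleanize(fluent, domain):
    """Return the boolean fluents and the exactly-one axiom (as a clause list)
    for one multi-valued fluent."""
    bools = [f"{fluent}-has-the-value-{v}" for v in domain]
    at_least_one = [bools]                               # f=v1 or f=v2 or ...
    at_most_one = [[f"~{a}", f"~{b}"]                    # pairwise mutexes
                   for i, a in enumerate(bools) for b in bools[i + 1:]]
    return bools, at_least_one + at_most_one

bools, axiom = booleanize("loc-of-robot", ["room1", "room2", "room3"])
print(bools)
for clause in axiom:
    print("clause:", " v ".join(clause))
```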

Page 6:

Page 7:

Blocks world

State variables: Ontable(x), On(x,y), Clear(x), hand-empty, holding(x)

Stack(x,y)
  Prec: holding(x), clear(y)
  Eff: on(x,y), clear(x), ~clear(y), ~holding(x), hand-empty

Unstack(x,y)
  Prec: on(x,y), hand-empty, clear(x)
  Eff: holding(x), ~on(x,y), ~clear(x), clear(y), ~hand-empty

Pickup(x)
  Prec: hand-empty, clear(x), ontable(x)
  Eff: holding(x), ~ontable(x), ~hand-empty, ~clear(x)

Putdown(x)
  Prec: holding(x)
  Eff: ontable(x), hand-empty, clear(x), ~holding(x)

Initial state: Complete specification of T/F values to state variables

--By convention, variables with F values are omitted

Goal state: A partial specification of the desired state variable/value combinations

Init: Ontable(A),Ontable(B), Clear(A), Clear(B), hand-empty

Goal: ~clear(B), hand-empty
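For reference, here is one way (a sketch, not the course code) to write the ground two-block version of this domain in Python, using precondition/add/delete sets per action and the convention that a state lists only the facts that are true; the negative goal ~clear(B) is kept in a separate set.

```python
def pickup(x):
    return {"name": f"Pickup({x})",
            "pre": {f"clear({x})", f"ontable({x})", "hand-empty"},
            "add": {f"holding({x})"},
            "del": {f"ontable({x})", f"clear({x})", "hand-empty"}}

def putdown(x):
    return {"name": f"Putdown({x})",
            "pre": {f"holding({x})"},
            "add": {f"ontable({x})", f"clear({x})", "hand-empty"},
            "del": {f"holding({x})"}}

def stack(x, y):
    return {"name": f"Stack({x},{y})",
            "pre": {f"holding({x})", f"clear({y})"},
            "add": {f"on({x},{y})", f"clear({x})", "hand-empty"},
            "del": {f"clear({y})", f"holding({x})"}}

def unstack(x, y):
    return {"name": f"Unstack({x},{y})",
            "pre": {f"on({x},{y})", f"clear({x})", "hand-empty"},
            "add": {f"holding({x})", f"clear({y})"},
            "del": {f"on({x},{y})", f"clear({x})", "hand-empty"}}

BLOCKS = ["A", "B"]
ACTIONS = ([pickup(b) for b in BLOCKS] + [putdown(b) for b in BLOCKS] +
           [stack(x, y) for x in BLOCKS for y in BLOCKS if x != y] +
           [unstack(x, y) for x in BLOCKS for y in BLOCKS if x != y])

INIT = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "hand-empty"}
GOAL_POS, GOAL_NEG = {"hand-empty"}, {"clear(B)"}   # goal: ~clear(B), hand-empty
print(len(ACTIONS), "ground actions for blocks", BLOCKS)
```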

Page 8:

PDDL—a standard for representing actions

Page 9:

PDDL Domains

Page 10:

Problems

Page 11:

Gripper World

Page 12:

Gripper Actions

Page 13:

How do we do planning?

Obvious idea
– Think of planning as search in the space of states of the transition graph (which, in the deterministic case, is the same as the search graph)
  » Go “forward” in the graph (progression)
  » Go “backward” in the graph (regression)

More general idea
– Think of planning as search in the space of “partial plans”
  » Progression corresponds to searching in the space of “prefix” plans
  » Regression corresponds to searching in the space of “suffix” plans
  » We can also search in the space of “precedence-constrained” plans.. (plan-space refinement)

“Refinement planning” is my way of trying to think of all of this from one unified perspective

Page 14:

Progression:

An action A can be applied to a state S iff A’s preconditions are satisfied in S.
The resulting state S’ is computed as follows:
-- every variable that occurs in A’s effects gets the value that the effect says it should have
-- every other variable keeps the value it had in the state S where the action was applied

[Figure: progression from the initial state]
{Ontable(A), Ontable(B), Clear(A), Clear(B), hand-empty}
  --Pickup(A)--> {holding(A), ~Clear(A), ~Ontable(A), Ontable(B), Clear(B), ~hand-empty}
  --Pickup(B)--> {holding(B), ~Clear(B), ~Ontable(B), Ontable(A), Clear(A), ~hand-empty}
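The progression rule translates almost directly into code. A minimal Python sketch (using the same set-based blocks-world encoding as the earlier sketch; not course code):

```python
def applicable(action, state):
    return action["pre"] <= state          # all preconditions hold in S

def progress(state, action):
    """Compute S' from S: effects overwrite their variables, everything
    else is copied over unchanged."""
    assert applicable(action, state), f"{action['name']} not applicable"
    return (state - action["del"]) | action["add"]

# Example mirroring the figure: Pickup(A) applied to the initial state.
pickup_A = {"name": "Pickup(A)",
            "pre": {"clear(A)", "ontable(A)", "hand-empty"},
            "add": {"holding(A)"},
            "del": {"ontable(A)", "clear(A)", "hand-empty"}}
init = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "hand-empty"}
print(progress(init, pickup_A))
# -> {'holding(A)', 'ontable(B)', 'clear(B)'} (in some order);
#    the deleted facts simply drop out of the true-fact set
```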

Page 15:

Regression:

A state S can be regressed over an action A (or, A is applied in the backward direction to S) iff:
-- there is no variable v such that v is given different values by the effects of A and by the state S
-- there is at least one variable v’ such that v’ is given the same value by the effects of A and by the state S
The resulting state S’ is computed as follows:
-- every variable that occurs in S and does not occur in the effects of A is copied over to S’ with its value as in S
-- every variable that occurs in the precondition list of A is copied over to S’ with the value it has in the precondition list

[Figure: regressing the goal {~clear(B), hand-empty}]
  over Stack(A,B):  gives {holding(A), clear(B)}
  over Putdown(A):  gives {~clear(B), holding(A)}
  over Putdown(B)??  (not useful: Putdown(B) gives clear(B), which conflicts with the goal ~clear(B))
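A Python sketch of the regression rule (same action encoding as the earlier sketches; goal states are sets of literals with "~" marking a negative literal, an assumption of this sketch):

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def regress(goal, action):
    """Regress a (partial) goal state over an action, or return None if the
    action conflicts with the goal or is irrelevant to it."""
    gives = action["add"] | {negate(d) for d in action["del"]}
    if any(negate(g) in gives for g in goal):
        return None                       # some goal literal is violated by A
    if not any(g in gives for g in goal):
        return None                       # A gives no goal literal: irrelevant
    survive = {g for g in goal if g not in gives}
    return survive | action["pre"]        # preconditions become subgoals

stack_A_B = {"name": "Stack(A,B)",
             "pre": {"holding(A)", "clear(B)"},
             "add": {"on(A,B)", "clear(A)", "hand-empty"},
             "del": {"clear(B)", "holding(A)"}}
print(regress({"~clear(B)", "hand-empty"}, stack_A_B))
# -> {'holding(A)', 'clear(B)'}, matching the middle branch of the figure
```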

Page 16:

Page 17:

Means-ends Analysis Planning (think backward, move forward; this is how the original STRIPS worked)

Reduce the difference between the current state and the goal state recursively one difference at a time

Let “D” be a dummy action whose only effect is “done” and whose preconditions are the top-level goals of the problem

Initialize the goal stack GS with “done”
Initialize I to the initial state
Call STRIPS(I, GS)

STRIPS(I, GS)
– If GS is empty, Success!
– ga ← first(GS)
– If ga is an action,
  » If ga is applicable in I, then I ← result of doing ga in I, and call STRIPS(I, rest(GS))
  » Else backtrack
– If ga is a goal and is in I
  » STRIPS(I, rest(GS))
– Else (ga not in I)
  » Pick an action a which has an effect ga  {Choice—all such actions need to be considered}
  » Push a onto the top of rest(GS)
  » Push the preconditions of a onto the top of rest(GS)  {Choice—all permutations of the goals need to be considered}
  » Call STRIPS(I, GS)

Shakey

http://www.ai.sri.com/movies/Shakey.ram
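A runnable Python sketch of this goal-stack procedure (an illustration under the set-based encoding used earlier, not the original STRIPS code); for brevity it pushes the goals directly rather than using the dummy “done” action, and it fixes the goal ordering, so it inherits the incompleteness discussed on the next slide.

```python
def strips(state, stack, actions, plan, depth=30):
    if depth == 0:
        return None
    if not stack:
        return plan                                   # Success!
    ga, rest = stack[0], stack[1:]
    if isinstance(ga, dict):                          # ga is an action
        if ga["pre"] <= state:                        # applicable: do it
            new_state = (state - ga["del"]) | ga["add"]
            return strips(new_state, rest, actions, plan + [ga["name"]], depth - 1)
        return None                                   # backtrack
    if ga in state:                                   # ga is a goal already true
        return strips(state, rest, actions, plan, depth - 1)
    for a in actions:                                 # choice: any achiever of ga
        if ga in a["add"]:
            new_stack = sorted(a["pre"]) + [a] + rest # push preconds, then the action
            result = strips(state, new_stack, actions, plan, depth - 1)
            if result is not None:
                return result
    return None

def act(name, pre, add, dele):
    return {"name": name, "pre": set(pre), "add": set(add), "del": set(dele)}

ACTIONS = [
    act("Pickup(A)", ["clear(A)", "ontable(A)", "hand-empty"],
        ["holding(A)"], ["clear(A)", "ontable(A)", "hand-empty"]),
    act("Stack(A,B)", ["holding(A)", "clear(B)"],
        ["on(A,B)", "clear(A)", "hand-empty"], ["clear(B)", "holding(A)"]),
]
init = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "hand-empty"}
print(strips(init, sorted({"on(A,B)"}), ACTIONS, []))
# -> ['Pickup(A)', 'Stack(A,B)']
```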

Page 18:

STRIPS and “nonlinearity”

STRIPS is incomplete
– If the plans for the goals have to be interleaved, then STRIPS will never find the solution
– Famous example: the Sussman Anomaly

What is the class of problems for which STRIPS is provably complete?
– If the subgoals are “serializable”—i.e. if there is a way of solving the subgoals one after the other while concatenating their plans
– Easy way to check if subgoals are serializable?
  » See if STRIPS solves the problem

Why this problem?
– STRIPS cannot separate planning (thinking) order from execution (doing) order

[Figure: Sussman anomaly. Initial state: C is on A; A and B are on the table. Goal: A on B, B on C.]

The anomaly disappears if you describe the goal state completely (include on(C,Table))

Page 19:

Checking correctness of a plan: The State-based Approaches

Progression Proof: Progress the initial state over the action sequence, and see if the goals are present in the result

[Figure: progression proof for the plan Load(A), Load(B)]
{At(A,E), At(R,E), At(B,E)} --Load(A)--> {In(A), At(B,E), At(R,E)} --Load(B)--> {In(A), In(B), At(R,E)}

Regression Proof: Regress the goal state over the action sequence, and see if the initial state subsumes the result

[Figure: regression proof for the same plan]
{In(A), In(B)} --regress over Load(B)--> {In(A), At(B,E), At(R,E)} --regress over Load(A)--> {At(A,E), At(B,E), At(R,E)}
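Both state-based proofs are easy to mechanize. A small Python sketch (same encoding as before; positive goal literals only, and the relevance check used during search is not needed when merely verifying a given plan):

```python
def progress(state, a):
    assert a["pre"] <= state, f"{a['name']} not applicable"
    return (state - a["del"]) | a["add"]

def regress(goal, a):
    assert not (goal & a["del"]), f"{a['name']} deletes a goal"
    return (goal - a["add"]) | a["pre"]

def progression_proof(init, plan, goals):
    state = init
    for a in plan:
        state = progress(state, a)
    return goals <= state                 # goals present in the result?

def regression_proof(init, plan, goals):
    g = goals
    for a in reversed(plan):
        g = regress(g, a)
    return g <= init                      # initial state subsumes the result?

def load(x):
    return {"name": f"Load({x})", "pre": {f"At({x},E)", "At(R,E)"},
            "add": {f"In({x})"}, "del": {f"At({x},E)"}}

init = {"At(A,E)", "At(B,E)", "At(R,E)"}
plan = [load("A"), load("B")]
print(progression_proof(init, plan, {"In(A)", "In(B)"}))   # True
print(regression_proof(init, plan, {"In(A)", "In(B)"}))    # True
```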

Page 20:

Checking correctness of a plan: The Causal Approach

Causal Proof: Check that each of the goals, and each precondition of each action, is
  » “established”: there is a preceding step that gives it
  » “unclobbered”: no possibly intervening step deletes it
    – Or, for every preceding step that deletes it, there exists another step that follows the deleter, precedes the condition, and adds it back.

Causal proof is
– “local” (checks correctness one condition at a time)
– “state-less” (does not need to know the states preceding actions)
  » Easy to extend to durative actions
– “incremental” with respect to action insertion
  » Great for replanning

Contd..

[Figure: causal proof for the plan Load(A), Load(B). The initial state supplies At(A,E), At(B,E), At(R,E); Load(A) consumes At(A,E), At(R,E) and produces In(A), ~At(A,E); Load(B) consumes At(B,E), At(R,E) and produces In(B), ~At(B,E); the goals In(A) and In(B) are supported by causal links from Load(A) and Load(B), and no intervening step deletes a supported condition.]
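For a totally ordered plan, the causal proof amounts to checking, for every goal and precondition, that its latest establisher is not followed by a deleter before the consuming step. A sketch under the same encoding (the general partial-order case adds ordering reasoning, which the POP sketch later in these notes handles):

```python
def causally_correct(init, plan, goals):
    # Treat the initial state as step 0 and the goals as the preconditions
    # of a final dummy step.
    steps = ([{"name": "start", "pre": set(), "add": set(init), "del": set()}]
             + list(plan)
             + [{"name": "finish", "pre": set(goals), "add": set(), "del": set()}])
    for j, consumer in enumerate(steps):
        for p in consumer["pre"]:
            producers = [i for i in range(j) if p in steps[i]["add"]]
            if not producers:
                return False                       # p is never established
            i = max(producers)                     # latest establisher
            if any(p in steps[k]["del"] for k in range(i + 1, j)):
                return False                       # p is clobbered in between
    return True

def load(x):
    return {"name": f"Load({x})", "pre": {f"At({x},E)", "At(R,E)"},
            "add": {f"In({x})"}, "del": {f"At({x},E)"}}

init = {"At(A,E)", "At(B,E)", "At(R,E)"}
print(causally_correct(init, [load("A"), load("B")], {"In(A)", "In(B)"}))   # True
```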

Page 21:

Page 22:

Plan Space Planning: Terminology

Step: a step in the partial plan—which is bound to a specific action

Orderings: s1 < s2 means s1 must precede s2
Open Conditions: preconditions of the steps (including the goal step)
Causal Link (s1—p—s2): a commitment that the condition p, needed at s2, will be made true by s1
– Requires s1 to “cause” p
  » Either have an effect p
  » Or have a conditional effect p which is FORCED to happen (by adding a secondary precondition to s1)
Unsafe Link: (s1—p—s2; s3), if s3 can come between s1 and s2 and undo p (has an effect that deletes p)
Empty Plan: { S:{I,G}; O:{I<G}; OC:{g1@G, g2@G, ..}; CL:{}; US:{} }

Page 23:

Partial plan representation

P = (A, O, L, OC, UL)
A: set of action steps in the plan: S0, S1, S2, …, Sinf
O: set of action orderings: Si < Sj, …
L: set of causal links: Si --p--> Sj
OC: set of open conditions (subgoals that remain to be satisfied)
UL: set of unsafe links Si --p--> Sj where p is deleted by some action Sk

[Figure: an example partial plan with steps S0 (initial step, I = {q1, q2}), S1, S2, S3, and Sinf (goal step, G = {g1, g2}); causal links supplying q1, g1, and g2; an unsafe link threatened by a step with effect ~p; and open conditions oc1 and oc2.]

Flaw: an open condition OR an unsafe link
Solution plan: a partial plan with no remaining flaws
• Every open condition must be satisfied by some action
• No unsafe links should exist (i.e., the plan is consistent)

POP background
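One way (an assumption of this sketch, not the course data structures) to write the P = (A, O, L, OC, UL) tuple in Python, together with the empty plan for a problem with initial literals q1, q2 and goals g1, g2:

```python
from collections import namedtuple

PartialPlan = namedtuple("PartialPlan", ["steps", "orderings", "links",
                                         "open_conds", "unsafe_links"])

# Steps S0 (initial state, encoded as effects) and Sinf (goal step, encoded
# as preconditions); the empty plan only orders S0 before Sinf and leaves the
# goals as open conditions.
steps = {"S0":   {"pre": set(),        "eff": {"q1", "q2"}},
         "Sinf": {"pre": {"g1", "g2"}, "eff": set()}}
empty_plan = PartialPlan(steps=steps,
                         orderings={("S0", "Sinf")},
                         links=set(),
                         open_conds={("g1", "Sinf"), ("g2", "Sinf")},
                         unsafe_links=set())
print(empty_plan)
```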

Page 24:

Algorithm

1. Let P be an initial plan
2. Flaw Selection: choose a flaw f (either an open condition or an unsafe link)
3. Flaw Resolution:
   • If f is an open condition, choose an action S that achieves f
   • If f is an unsafe link, choose promotion or demotion
   • Update P
   • Return NULL if no resolution exists
4. If there is no flaw left, return P; else go to 2.

[Figure: the example partial plan from the previous slide, with its open conditions and unsafe link as the flaws to be resolved.]

Choice points
• Flaw selection (open condition? unsafe link?)
• Flaw resolution (how to select (rank) the partial plan?)
• Establishment (action selection) (backtrack point)
• Unsafe link resolution (backtrack point)

1. Initial plan: [Figure: just S0 and Sinf, ordered S0 < Sinf, with the goals g1 and g2 as open conditions at Sinf]
2. Plan refinement (flaw selection and resolution): [Figure: the refined partial plan shown above]

POP background
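Below is a compact, runnable Python sketch of this loop for ground actions (an illustrative backtracking implementation, not the course code); the (name, preconditions, effects-dict) tuple format is an assumption of the sketch. It is exercised on the example problem of the next slide (goals p and q; A1 needs m and gives p, ~n; A2 needs n and gives q), where unsafe-link resolution forces A2 before A1.

```python
START, END = 0, 1          # step 0 is the initial-state step, step 1 is the goal step

def entails(order, a, b):
    """True if a < b follows (transitively) from the ordering constraints."""
    seen, frontier = set(), [a]
    while frontier:
        x = frontier.pop()
        for (p, q) in order:
            if p == x and q not in seen:
                seen.add(q)
                frontier.append(q)
    return b in seen

def consistent(order, n_steps):
    return not any(entails(order, s, s) for s in range(n_steps))

def threats(steps, order, links):
    """Unsafe links: step k deletes the condition of a link it can intervene in."""
    out = []
    for (i, (var, val), j) in links:
        for k, (_, _, eff) in enumerate(steps):
            if (k not in (i, j) and eff.get(var) == (not val)
                    and not entails(order, k, i) and not entails(order, j, k)):
                out.append((k, i, j))
    return out

def refine(steps, order, opens, links, library, depth):
    if depth < 0 or not consistent(order, len(steps)):
        return None
    unsafe = threats(steps, order, links)
    if unsafe:                                    # flaw: unsafe link
        k, i, j = unsafe[0]
        for extra in ((k, i), (j, k)):            # demotion, then promotion
            plan = refine(steps, order | {extra}, opens, links, library, depth - 1)
            if plan:
                return plan
        return None
    if not opens:                                 # no flaws left: solution plan
        return steps, order
    cond, consumer = opens[0]                     # flaw: open condition
    var, val = cond
    # establishers: an existing step, or a fresh step taken from the library
    choices = [(idx, None) for idx, (_, _, eff) in enumerate(steps)
               if eff.get(var) == val]
    choices += [(None, act) for act in library if act[2].get(var) == val]
    for idx, new in choices:
        s, o, oc = steps, order, opens[1:]
        if new is not None:
            s = steps + [new]
            idx = len(s) - 1
            o = o | {(START, idx), (idx, END)}
            oc = oc + [(pre, idx) for pre in new[1]]
        plan = refine(s, o | {(idx, consumer)}, oc,
                      links + [(idx, cond, consumer)], library, depth - 1)
        if plan:
            return plan
    return None

def pop(init, goals, library):
    steps = [("start", [], {v: True for v in init}),        # S0
             ("finish", [(g, True) for g in goals], {})]    # Sinf
    result = refine(steps, {(START, END)},
                    [((g, True), END) for g in goals], [], library, depth=25)
    if result is None:
        return None
    steps, order = result
    return [s[0] for s in steps[2:]], sorted(order)

A1 = ("A1", [("m", True)], {"p": True, "n": False})   # needs m, gives p and ~n
A2 = ("A2", [("n", True)], {"q": True})               # needs n, gives q
print(pop(["m", "n"], ["p", "q"], [A1, A2]))
# -> (['A1', 'A2'], [(0, 1), (0, 2), (0, 3), (2, 1), (3, 1), (3, 2)])
#    the ordering (3, 2) puts A2 (step 3) before A1 (step 2), resolving the
#    unsafe link on n that A1 would otherwise clobber
```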

Page 25:

Example Problem
Goals: p, q
Actions: A1 takes m and gives p and ~n
         A2 takes n and gives q
Init: m, n

Page 26:

Page 27:

Page 28:

Page 29:

Handling Conditional Effects

Conditional effects don’t change progression much at all
– Why? (Because the state in which the operator is being applied is known, so you know whether or not each conditional effect actually happens.)

Handling conditional effects in regression planning introduces “secondary” preconditions
– Consider regressing the goals {P,Q} over an action A with two conditional effects: R=>P and J=>~Q (worked through in the sketch below)
– What happens if A has two more effects: U=>P and N=>~Q?
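A small sketch of this regression (following the standard causation/preservation split; single-literal antecedents and consequents are an assumption of this sketch, not the course notation). For each goal literal we either force some conditional effect that gives it, taking its antecedent as a secondary precondition, or require the literal to already hold and prevent every conditional effect that deletes it; the result is a disjunction of regressed condition sets.

```python
from itertools import product

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def regress_over_cond_effects(goal, cond_effects):
    """Return the alternative regressed condition sets for regressing `goal`
    over an action whose effects are (antecedent -> consequent) pairs."""
    per_literal = []
    for g in goal:
        achieve = [frozenset({ant}) for ant, cons in cond_effects if cons == g]
        persist = frozenset({g} | {negate(ant) for ant, cons in cond_effects
                                   if cons == negate(g)})
        per_literal.append(achieve + [persist])
    results = set()
    for combo in product(*per_literal):
        merged = frozenset().union(*combo)
        if not any(negate(l) in merged for l in merged):   # drop inconsistent sets
            results.add(merged)
    return [sorted(r) for r in results]

# The slide's example: regress {P, Q} over A with R=>P and J=>~Q ...
print(regress_over_cond_effects({"P", "Q"}, [("R", "P"), ("J", "~Q")]))
# ... and then with the two extra effects U=>P and N=>~Q
print(regress_over_cond_effects({"P", "Q"},
                                [("R", "P"), ("J", "~Q"), ("U", "P"), ("N", "~Q")]))
```

The first call yields {R, Q, ~J} (force R=>P, preserve Q against J=>~Q) plus the persist branch {P, Q, ~J}; the second call adds the branch that forces U=>P and must additionally preserve Q against N=>~Q.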

Page 30:

Page 31:

Page 32:

Page 33:

Handling “lifted” actions (action schemas)

Progression doesn’t change much!
– You can generate all the applicable groundings of the operator

Regression changes—it can be less committed!
– Consider regressing a goal state {P(a), Q(b)} over an action schema A with effects P(x) and ~Q(y)
– What happens if the effects were U(x)=>P(x) and M(y)=>~Q(y)?

Page 34:

Spare Tire Example

Page 35:

Spare Tire Example

Page 36:

Plan-space Planning

Page 37:

Plan-space planning: Example