
Continuous formulation of the irrigation network problem cannot be solved exactly by any MDP solver

Evaluation of solution quality (mean and standard deviation) and running time (in seconds):

Irrigation network is a network of irrigation channels connected by regulation devices

Transition functions represent water flows between channels given actions at regulation devices

Objective is the operation of valves to maintain optimal water levels

Reward function characterizes preferred water levels

Solving Factored MDPs with Continuous and Discrete Variables

Introduction

Approximate LP for HMDPs

Factored ε-HALP Algorithm

Experimental Results

Linear Value Function Approximation

Value function represented as a linear combination of k basis functions:

$$V(\mathbf{x}) = \sum_{i=1}^{k} w_i f_i(\mathbf{x})$$

Basis functions $f_i(\mathbf{x})$ depend on continuous and discrete variables. Optimization is performed over weights $\mathbf{w}$
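A minimal sketch of the linear value function above, evaluated on a hybrid state. The particular basis functions, weights, and state layout (one continuous variable in [0, 1] followed by one binary variable) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]
Basis = Callable[[State], float]


def linear_value(weights: Sequence[float], basis: Sequence[Basis], x: State) -> float:
    """Evaluate V(x) = sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, basis))


# Hypothetical basis: x[0] is a continuous variable in [0, 1], x[1] is binary.
basis = [
    lambda x: 1.0,               # constant basis function
    lambda x: x[0],              # linear in the continuous variable
    lambda x: x[0] ** 2,         # quadratic in the continuous variable
    lambda x: float(x[1] == 1),  # indicator on the discrete variable
]
weights = [0.5, 1.0, -2.0, 0.25]  # illustrative weights, not fitted

value = linear_value(weights, basis, (0.5, 1))
```

Only the weights are free parameters here, which is what makes the optimization over $\mathbf{w}$ a linear program once the basis is fixed.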

HALP Formulation

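Hybrid approximate LP (HALP) formulation:

$$\begin{aligned}
\text{minimize}_{\mathbf{w}} \quad & \sum_i w_i \alpha_i \\
\text{subject to:} \quad & \sum_i w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \ge 0 \quad \forall\, \mathbf{x} \in \mathbf{X},\ \mathbf{a} \in \mathbf{A}
\end{aligned}$$

where $\alpha_i$ is the state relevance weight and $F_i(\mathbf{x}, \mathbf{a})$ is the difference between basis function $f_i(\mathbf{x})$ and its discounted backprojection:

$$\alpha_i = \sum_{\mathbf{x}_D} \int_{\mathbf{x}_C} \psi(\mathbf{x})\, f_i(\mathbf{x})\, d\mathbf{x}_C$$

$$F_i(\mathbf{x}, \mathbf{a}) = f_i(\mathbf{x}) - \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, f_i(\mathbf{x}')\, d\mathbf{x}'_C$$

Here $\gamma$ is the discount factor and $\psi$ is the state relevance density.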

Quality of HALP Approximation

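Proposition 1 Let $\tilde{\mathbf{w}}$ be an optimal solution of the HALP. Then, for any Lyapunov function L(x) with contraction factor $\kappa$:

$$\left\| V^* - H\tilde{\mathbf{w}} \right\|_{1,\psi} \le \frac{2\, \psi^T L}{1 - \kappa} \min_{\mathbf{w}} \left\| V^* - H\mathbf{w} \right\|_{\infty, 1/L}$$

Analogous to the de Farias and Van Roy (2001) result for approximate LP for discrete MDPs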

Choice of Representation

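Continuous basis functions defined as polynomials:

$$f_i(\mathbf{x}_i) = \prod_{x_j \in \mathbf{X}_i} x_j^{m_{j,i}}$$

Basis function decomposition along continuous and discrete factors

Closed-form representation of the objective function

Mixture of betas transition model for continuous factors

Decomposition of the constraints along continuous and discrete functions and closed-form representation, with $f_{i_D}$ and $f_{i_C}$ the discrete and continuous parts of $f_i$:

$$\sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, f_i(\mathbf{x}')\, d\mathbf{x}'_C = \left( \sum_{\mathbf{x}'_{i_D}} p(\mathbf{x}'_{i_D} \mid \mathbf{x}, \mathbf{a})\, f_{i_D}(\mathbf{x}'_{i_D}) \right) \left( \int_{\mathbf{x}'_{i_C}} p(\mathbf{x}'_{i_C} \mid \mathbf{x}, \mathbf{a})\, f_{i_C}(\mathbf{x}'_{i_C})\, d\mathbf{x}'_{i_C} \right)$$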
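The closed form mentioned above rests on a standard identity: for X ~ Beta(a, b), the raw moment E[X^m] equals the product of (a + k) / (a + b + k) for k = 0, ..., m - 1. The sketch below uses it to backproject a monomial basis function of a single continuous variable and to form the difference between the basis function and its discounted backprojection. The mixture weights, the Beta parameters, and the one-variable setting are assumptions made for illustration; in the model the parameters would come from functions of the parent variables.

```python
from typing import Sequence, Tuple

# One mixture component: (mixing weight, a, b). The weights are hypothetical.
BetaMixture = Sequence[Tuple[float, float, float]]


def beta_raw_moment(a: float, b: float, m: int) -> float:
    """E[X^m] for X ~ Beta(a, b), via prod_{k<m} (a + k) / (a + b + k)."""
    moment = 1.0
    for k in range(m):
        moment *= (a + k) / (a + b + k)
    return moment


def backprojection(mixture: BetaMixture, m: int) -> float:
    """Expected value of the monomial x'^m under a mixture of Betas."""
    return sum(w * beta_raw_moment(a, b, m) for w, a, b in mixture)


def basis_difference(x: float, m: int, mixture: BetaMixture, gamma: float) -> float:
    """f(x) minus the discounted backprojection, for the monomial f(x) = x^m."""
    return x ** m - gamma * backprojection(mixture, m)


# Illustrative values only: a single Beta(2, 3) component, discount 0.9.
example = basis_difference(0.5, 1, [(1.0, 2.0, 3.0)], 0.9)
```

No numerical integration appears anywhere, which is the practical payoff of pairing polynomial basis functions with Beta transition densities.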

Hybrid Markov Decision Processes

Many real-world stochastic planning problems have continuous and discrete variables, naturally formulated as hybrid MDPs (HMDPs)

There are few methods for solving Hybrid MDPs

Hybrid MDPs are Complex to Solve

Traditional solution techniques are affected by the curse of dimensionality

Discrete-state MDPs: State and action spaces grow exponentially with the number of variables

Continuous-state MDPs: State and action spaces are infinitely large. Often, no closed-form representation for the value function exists. Naïve discretization often leads to exponential complexity

Irrigation Network Example

Experimental Results

Large irrigation network:

ε-HALP                          Alternative solutions
ε       Mean   Std    Time      Method      Mean   Std
1       42.8   3.0       2      Random      35.9   2.7
1/2     60.3   3.0      21      Local       55.4   2.5
1/4     61.9   2.9     184      Global 1    60.4   3.0
1/8     72.2   3.5    1068      Global 4    66.0   3.6
1/16    73.8   3.0   13219      Global 16   68.2   3.2

Results for the two scalable topologies (n-ring and n-ring-of-rings), one table per topology:

        n = 6         n = 9         n = 12        n = 15        n = 18
ε       Mean  Time    Mean  Time    Mean  Time    Mean  Time    Mean  Time
1       28.4     1    37.5     1    46.9     1    55.6     2    64.5     3
1/2     33.5     3    43.0     5    52.6     9    62.9    17    72.1    28
1/4     35.1    11    45.2    21    54.2    43    64.2    63    74.5    85
1/8     40.1    46    51.4    85    62.2   118    73.2   168    84.9   193
1/16    40.4   331    51.8   519    63.7   709    75.5   963    86.8  1285

        n = 6         n = 9         n = 12         n = 15         n = 18
ε       Mean  Time    Mean   Time   Mean   Time    Mean   Time    Mean    Time
1       14.8     1    16.2      2   17.5      4    18.5      5    19.7       6
1/2     38.6    12    50.5     25   44.0    103    75.8     69    87.6     107
1/4     40.1    82    53.6    184   66.7    345    79.0    590    93.1     861
1/8     48.0   581    62.4   1250   76.1   2367    90.5   3977   104.5    6377
1/16    47.1  4736    62.3  11369   77.6  22699    92.4  35281   107.8   53600

The quality of the ε-HALP solution beats alternative approx. opt. techniques on the large irrigation network example

Solution quality improves with higher grid resolution

Time complexity grows polynomially with higher grid resolution 1/ε

Time complexity grows polynomially with network topology size n
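One rough way to read the polynomial-growth claim off the reported numbers is to fit a straight line to log(time) against log(1/ε); the slope is then an empirical exponent. The sketch below does this for the ε-HALP running times reported for the large irrigation network. The least-squares fit is an illustration added here, not part of the original evaluation, and five data points give only a coarse indication.

```python
import math
from typing import Sequence

# Running times (seconds) of the epsilon-HALP on the large irrigation
# network, as reported in the results, for 1/eps = 1, 2, 4, 8, 16.
inverse_eps = [1, 2, 4, 8, 16]
seconds = [2, 21, 184, 1068, 13219]


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Empirical exponent p in time ~ (1/eps)^p for this data set.
exponent = loglog_slope(inverse_eps, seconds)
```

A roughly constant slope on the log-log scale is what polynomial growth looks like; exponential growth in 1/ε would instead show a slope that keeps increasing.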

Conclusions

HALP provides effective formulation for solving hybrid MDPs

Including bounds on the quality of the solution

Factored hybrid MDPs allow for closed-form representation of HALP constraints

Number of constraints remains infinite

Exploit factorization for efficient discretization, ε-HALP

Provide bounds on the effect of discretization

Lipschitz constant grows linearly in number of variables

Using factored LP decomposition to solve ε-HALP

For fixed tree-width, running time is polynomial in number of variables and in discretization level 1/ε

[Figure: irrigation network topologies (large irrigation network, n-ring topology, n-ring-of-rings topology) with inflow and outflow regulation devices. Each irrigation channel is represented by a continuous variable; each regulation device is represented by a discrete action node.]

Optimal Policy and Value Function

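Value function of an optimal policy satisfies the Bellman-Hamilton-Jacobi fixed point equation:

$$V^*(\mathbf{x}) = \sup_{\mathbf{a}} \left[ R(\mathbf{x}, \mathbf{a}) + \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})\, V^*(\mathbf{x}')\, d\mathbf{x}'_C \right]$$

Value function $V^*(\mathbf{x})$ is difficult to compute and represent

Closed-form solution of the value function may not exist due to the recursive integral definition

Approximate solutions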

Factored Hybrid MDPs

Multiagent factored hybrid MDP (HMDP) is a 4-tuple (X, A, P, R):

X is a vector of state variables (discrete or continuous)

A is a vector of action variables (discrete or continuous)

Continuous variables restricted to [0,1]

P is a transition model represented by a DBN

R is a reward function, a sum of local rewards

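Basis functions factor into discrete and continuous parts:

$$f_i(\mathbf{x}) = f_{i_D}(\mathbf{x}_{i_D})\, f_{i_C}(\mathbf{x}_{i_C})$$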

Carlos Guestrin, Intel Research, Berkeley

Milos Hauskrecht, Department of Computer Science

Branislav Kveton, Intelligent Systems Program

[Figure: dynamic Bayesian network with state variables X1, X2, X3, action variables A1, A2, A3, next-state variables X'1, X'2, X'3, and local rewards R1, R2.]

Representing Conditional Probabilities

Use parametric representation

Discrete child with discrete parents: use tabular, decision trees, noisy-or, etc.

Discrete child with continuous and discrete parents: use discriminant functions $d_j(\mathrm{Par}(X_i')) \ge 0$:

$$P(X_i' = j \mid \mathrm{Par}(X_i')) = \frac{d_j(\mathrm{Par}(X_i'))}{\sum_u d_u(\mathrm{Par}(X_i'))}$$

Continuous child with continuous and discrete parents: mixture of Beta distributions:

$$p(X_i' \mid \mathrm{Par}(X_i')) = \sum \mathrm{Beta}\left(X_i' \mid h_i^1(\mathrm{Par}(X_i')),\, h_i^2(\mathrm{Par}(X_i'))\right)$$

where $h_i^1(\mathrm{Par}(X_i')) > 0$ and $h_i^2(\mathrm{Par}(X_i')) > 0$ define moments
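The discriminant-function representation for a discrete child amounts to normalizing nonnegative scores. A minimal sketch follows; the two discriminants and the single continuous parent are hypothetical and only illustrate the mechanics.

```python
from typing import Callable, List, Sequence

Parents = Sequence[float]
Discriminant = Callable[[Parents], float]


def discrete_child_distribution(
    discriminants: Sequence[Discriminant], parents: Parents
) -> List[float]:
    """Return P(X' = j | parents) = d_j(parents) / sum_u d_u(parents)."""
    scores = [d(parents) for d in discriminants]
    if any(s < 0 for s in scores):
        raise ValueError("discriminant functions must be nonnegative")
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one discriminant must be positive")
    return [s / total for s in scores]


# Hypothetical binary child whose single parent is continuous in [0, 1].
discriminants = [
    lambda par: par[0],        # d_0 grows with the parent value
    lambda par: 1.0 - par[0],  # d_1 shrinks with the parent value
]
distribution = discrete_child_distribution(discriminants, (0.25,))
```

Because the discriminants only need to be nonnegative, any convenient parametric family can be plugged in without enforcing normalization by hand.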

Representational & Computational Challenges

Constraints require representation of backprojections, functions of continuous and discrete variables

HALP requires solution of (linear) convex problem with infinite number of constraints

Summary of Factored ε-HALP Algorithm

Factored ε-HALP formulation

HALP formulation contains infinite number of constraints, one for each state x and action a

Discretization of continuous state and action variables to (1/ε + 1) equally spaced values

Total number of points per factor is exponential only in the dimension of the factor

Number of constraints is finite, although exponential in the number of variables

Efficient Solution for Factored ε-HALP

1. Discretize continuous state and action variables
2. Identify subsets of variables $X_i$ and $A_i$ ($X_j$ and $A_j$) that the functions $F_i(\mathbf{x}, \mathbf{a})$ ($R_j(\mathbf{x}, \mathbf{a})$) depend on
3. Compute $F_i(\mathbf{x}_i, \mathbf{a}_i)$ (and $R_j(\mathbf{x}_j, \mathbf{a}_j)$) for all possible configurations of $X_i$ and $A_i$ ($X_j$ and $A_j$)
4. Calculate state relevance weights $\alpha_i$
5. Use ALP algorithm for factored discrete-valued variables to find the vector of optimal weights $\mathbf{w}$ (Guestrin et al. ’01)

Near Feasibility Implies Near Optimality

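Solution of ε-HALP likely violates constraints in the HALP

Proposition 2 Let $\tilde{\mathbf{w}}$ be an optimal solution of the HALP and $\hat{\mathbf{w}}$ be an optimal solution of the ε-HALP, such that solution $\hat{\mathbf{w}}$ is δ-infeasible:

$$\sum_i \hat{w}_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \ge -\delta \quad \forall\, \mathbf{x} \in \mathbf{X},\ \mathbf{a} \in \mathbf{A}$$

Then:

$$\left\| V^* - H\hat{\mathbf{w}} \right\|_{1,\psi} \le \left\| V^* - H\tilde{\mathbf{w}} \right\|_{1,\psi} + \frac{2\delta}{1 - \gamma}$$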

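Quality of ε-HALP Approximation

Theorem 1 Let $\hat{\mathbf{w}}$ be an optimal solution of the ε-HALP satisfying the δ-infeasibility condition. Then, for any Lyapunov function L(x) with contraction factor $\kappa$:

$$\left\| V^* - H\hat{\mathbf{w}} \right\|_{1,\psi} \le \frac{2\delta}{1 - \gamma} + \frac{2\, \psi^T L}{1 - \kappa} \min_{\mathbf{w}} \left\| V^* - H\mathbf{w} \right\|_{\infty, 1/L}$$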
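The bound in Theorem 1 can be read as Proposition 2 followed by Proposition 1. Writing $\tilde{\mathbf{w}}$ for an optimal HALP solution, $\hat{\mathbf{w}}$ for a δ-infeasible optimal ε-HALP solution, and $\kappa$ for the contraction factor of the Lyapunov function L, the chain below is a sketch of that reading rather than a reproduction of the authors' proof:

```latex
\begin{align*}
\left\| V^* - H\hat{\mathbf{w}} \right\|_{1,\psi}
  &\le \left\| V^* - H\tilde{\mathbf{w}} \right\|_{1,\psi} + \frac{2\delta}{1-\gamma}
  && \text{(Proposition 2)} \\
  &\le \frac{2\,\psi^T L}{1-\kappa} \min_{\mathbf{w}} \left\| V^* - H\mathbf{w} \right\|_{\infty,1/L}
       + \frac{2\delta}{1-\gamma}
  && \text{(Proposition 1)}
\end{align*}
```

The first term on the last line is the price of restricting the value function to the span of the basis functions; the second is the price of discretizing the constraint space.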

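Achieving δ-Infeasibility

Appropriate choice of ε-grid to achieve δ-infeasibility

Lipschitz modulus of the discretized functions:

$$\left| \left( \sum_i w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \right) - \left( \sum_i w_i F_i(\mathbf{x}_G, \mathbf{a}_G) - R(\mathbf{x}_G, \mathbf{a}_G) \right) \right| \le \delta$$

$(\mathbf{x}_G, \mathbf{a}_G)$ is the closest ε-grid point to the state-action pair $(\mathbf{x}, \mathbf{a})$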

Discretize continuous variables using a regular spaced-grid

Formulate a linear program with constraints restricted only to grid points

Solve the LP using an ALP algorithm for factored discrete MDPs
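The discretization part of the steps above can be sketched as follows, assuming each continuous variable lives on [0, 1] and is discretized to 1/ε + 1 equally spaced values. The helper names are hypothetical, and exact fractions are used only to keep the grid points free of rounding noise.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence, Tuple


def grid_values(inverse_eps: int) -> List[Fraction]:
    """The 1/eps + 1 equally spaced values 0, eps, 2*eps, ..., 1."""
    return [Fraction(k, inverse_eps) for k in range(inverse_eps + 1)]


def factor_grid(inverse_eps: int, dimension: int) -> List[Tuple[Fraction, ...]]:
    """All grid points of a factor over `dimension` continuous variables."""
    return list(product(grid_values(inverse_eps), repeat=dimension))


def nearest_grid_point(x: Sequence[float], inverse_eps: int) -> Tuple[Fraction, ...]:
    """Snap each coordinate of a point in [0, 1]^d to its closest grid value."""
    return tuple(Fraction(round(v * inverse_eps), inverse_eps) for v in x)


# With eps = 1/4, a two-variable factor has (4 + 1)^2 = 25 grid points.
points = factor_grid(4, 2)
snapped = nearest_grid_point((0.3, 0.9), 4)
```

The count per factor is (1/ε + 1) raised to the factor's dimension, so the cost stays manageable as long as each function depends on only a few variables.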

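The grid resolution is chosen so that

$$\epsilon \le \frac{\delta}{M K_{\max}}$$

where $M$ is the number of factors and $K_{\max}$ is the worst-case Lipschitz constant over the functions $w_i F_i(\mathbf{x}, \mathbf{a})$ and $R_j(\mathbf{x}, \mathbf{a})$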
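As a small worked example, and assuming the resolution bound reads ε ≤ δ / (M · K_max), the coarsest admissible regular grid can be computed directly. The function name and the numbers used are hypothetical.

```python
import math


def coarsest_inverse_eps(delta: float, num_factors: int, k_max: float) -> int:
    """Smallest integer 1/eps such that eps <= delta / (num_factors * k_max).

    Assumes the bound eps <= delta / (M * K_max), with M the number of
    factors and K_max the worst-case Lipschitz constant.
    """
    if delta <= 0 or num_factors <= 0 or k_max <= 0:
        raise ValueError("delta, num_factors and k_max must be positive")
    return max(1, math.ceil(num_factors * k_max / delta))


# Illustrative values: delta = 0.5, M = 4 factors, K_max = 2.0.
inverse_eps = coarsest_inverse_eps(0.5, 4, 2.0)
eps = 1.0 / inverse_eps
```

Halving δ doubles the required 1/ε, and the number of grid points per factor grows with 1/ε raised to the factor's dimension, so tighter infeasibility targets become expensive quickly.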