Chapter 13
DESCRIPTION
Chapter 13: Constraint Optimization, Counting, and Enumeration (275 class). Outline: Introduction; Optimization tasks for graphical models; Solving optimization problems with inference and search; Inference: bucket elimination, dynamic programming, mini-bucket elimination; Search.

TRANSCRIPT
![Page 1: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/1.jpg)
Chapter 13
Constraint Optimization, Counting, and Enumeration
275 class
![Page 2: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/2.jpg)
Outline
Introduction
Optimization tasks for graphical models
Solving optimization problems with inference and search
Inference: bucket elimination, dynamic programming; mini-bucket elimination
Search: branch and bound and best-first; lower-bounding heuristics; AND/OR search spaces
Hybrids of search and inference: cutset decomposition; super-bucket scheme
![Page 3: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/3.jpg)
Constraint Satisfaction

Example: map coloring
Variables - countries (A, B, C, D, E, F, G)
Values - colors (e.g., red, green, yellow)
Constraints: A ≠ B, A ≠ D, D ≠ E, etc.
Allowed (A,B) pairs: (red, green), (red, yellow), (green, red), (green, yellow), (yellow, green), (yellow, red)

Task: consistency? Find a solution, all solutions, counting
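The counting task on a map-coloring instance can be sketched by exhaustive search. Only the constraints A ≠ B, A ≠ D, and D ≠ E are stated on the slide, so the adjacency list below is a hypothetical completion chosen for illustration:

```python
from itertools import product

COLORS = ["red", "green", "yellow"]
# Only A-B, A-D, D-E are given on the slide; the remaining edges
# (B-C, C-D) are an assumption made for this sketch.
EDGES = [("A", "B"), ("A", "D"), ("D", "E"), ("B", "C"), ("C", "D")]
VARS = sorted({v for e in EDGES for v in e})

def count_solutions():
    """Count all assignments where adjacent countries differ in color."""
    count = 0
    for combo in product(COLORS, repeat=len(VARS)):
        assign = dict(zip(VARS, combo))
        if all(assign[u] != assign[v] for u, v in EDGES):
            count += 1
    return count
```

Exhaustive enumeration is exponential in the number of variables; the inference and search methods in this chapter exist precisely to avoid it.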
![Page 4: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/4.jpg)
φ = {(¬C), (A ∨ B ∨ C), (¬A ∨ B ∨ E), (¬B ∨ C ∨ D)}
Propositional Satisfiability
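Satisfiability of the four-clause theory above can be checked directly by enumerating truth assignments (a brute-force sketch, not a practical SAT procedure):

```python
from itertools import product

# The theory from the slide in CNF form: each clause is a list of
# (variable, polarity) literals; polarity True means a positive literal.
CLAUSES = [
    [("C", False)],
    [("A", True), ("B", True), ("C", True)],
    [("A", False), ("B", True), ("E", True)],
    [("B", False), ("C", True), ("D", True)],
]
VARS = sorted({v for clause in CLAUSES for v, _ in clause})

def satisfying_assignments():
    """Enumerate all models of the theory by exhaustive search."""
    models = []
    for bits in product([False, True], repeat=len(VARS)):
        assign = dict(zip(VARS, bits))
        if all(any(assign[v] == pol for v, pol in clause) for clause in CLAUSES):
            models.append(assign)
    return models
```

Every model must set C to false (the unit clause), and the theory is satisfiable.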
![Page 5: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/5.jpg)
Constraint Optimization Problems for Graphical Models

A finite COP is a triple R = <X, D, F>, where:
  variables      X = {X1, ..., Xn}
  domains        D = {D1, ..., Dn}
  cost functions F = {f1, ..., fm}

Global cost function: F(X) = Σ_{i=1..m} fi(Xi)

f(A,B,D) has scope {A,B,D}:

A B D | Cost
1 2 3 | 3
1 3 2 | 2
2 1 3 | 0
2 3 1 | 0
3 1 2 | 5
3 2 1 | 0
![Page 6: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/6.jpg)
Constraint Optimization Problems for Graphical Models (cont.)

Same COP R = <X, D, F>, now with cost functions f1(A,B,D), f2(D,F,G), f3(B,C,F).

Global cost function: F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f)

Primal graph: variables --> nodes; functions, constraints --> arcs (nodes A, B, C, D, F, G; each function's scope induces a clique).

f(A,B,D) has scope {A,B,D}:

A B D | Cost
1 2 3 | 3
1 3 2 | 2
2 1 3 | 0
2 3 1 | 0
3 1 2 | 5
3 2 1 | 0
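The global cost function is just the sum of each factor applied to its own scope. A minimal sketch: f1 uses the cost table from the slide, while f2 and f3 are hypothetical empty placeholders (unlisted tuples are treated as cost 0, which is an assumption, not something the slide states):

```python
# Each factor maps a tuple over its scope to a cost. f1 is the table
# from the slide; f2 and f3 are hypothetical placeholders.
FACTORS = {
    ("A", "B", "D"): {(1, 2, 3): 3, (1, 3, 2): 2, (2, 1, 3): 0,
                      (2, 3, 1): 0, (3, 1, 2): 5, (3, 2, 1): 0},
    ("D", "F", "G"): {},  # hypothetical f2(D,F,G)
    ("B", "C", "F"): {},  # hypothetical f3(B,C,F)
}

def global_cost(assignment, factors):
    """F(x) = sum over factors of the factor applied to its scope."""
    total = 0
    for scope, table in factors.items():
        key = tuple(assignment[v] for v in scope)
        total += table.get(key, 0)  # missing tuples cost 0 (an assumption)
    return total
```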
![Page 7: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/7.jpg)
Constrained Optimization

Example: power plant scheduling

Variables: {X1, ..., Xn}, each with domain {ON, OFF}
Constraints: min-up and min-down time; power demand: Σ_i Power(Xi) ≥ Demand
Objective: minimize TotalFuelCost(X1, ..., XN)
![Page 8: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/8.jpg)
Probabilistic Networks

Nodes: Smoking, Bronchitis, Cancer, X-Ray, Dyspnoea, with CPTs P(S), P(B|S), P(C|S), P(X|C,S), P(D|C,B).

P(S,C,B,X,D) = P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B)

P(D|C,B):
C B | D=0  D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
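The factorization above can be evaluated entry by entry. In this sketch only P(D|C,B) comes from the slide; the other CPTs are filled with made-up uniform numbers purely so the product is computable:

```python
# P_D is the table from the slide; the remaining CPTs use hypothetical
# uniform values for illustration only.
P_S = {0: 0.5, 1: 0.5}                                            # hypothetical P(S)
P_C = {(s, c): 0.5 for s in (0, 1) for c in (0, 1)}               # hypothetical P(C|S)
P_B = {(s, b): 0.5 for s in (0, 1) for b in (0, 1)}               # hypothetical P(B|S)
P_X = {(c, s, x): 0.5 for c in (0, 1) for s in (0, 1) for x in (0, 1)}  # hypothetical P(X|C,S)
P_D = {(0, 0, 0): 0.1, (0, 0, 1): 0.9, (0, 1, 0): 0.7, (0, 1, 1): 0.3,
       (1, 0, 0): 0.8, (1, 0, 1): 0.2, (1, 1, 0): 0.9, (1, 1, 1): 0.1}

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B)."""
    return P_S[s] * P_C[(s, c)] * P_B[(s, b)] * P_X[(c, s, x)] * P_D[(c, b, d)]
```

Because every CPT row sums to one, the joint sums to one over all 32 configurations, which is a quick sanity check on the factorization.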
![Page 9: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/9.jpg)
Outline
Introduction
Optimization tasks for graphical models
Solving by inference and search
Inference: bucket elimination, dynamic programming, tree-clustering; mini-bucket elimination, belief propagation
Search: branch and bound and best-first; lower-bounding heuristics; AND/OR search spaces
Hybrids of search and inference: cutset decomposition; super-bucket scheme
![Page 10: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/10.jpg)
Computing MPE

"Moral" graph over A, B, C, D, E.

Variable Elimination:

MPE = max_{a,e=0,d,c,b} P(a)·P(b|a)·P(c|a)·P(d|b,a)·P(e|b,c)
    = max_a P(a) · max_{e=0} max_d max_c P(c|a) · max_b P(b|a)·P(d|b,a)·P(e|b,c)

The innermost maximization defines the new function h^B(a,d,c,e) = max_b P(b|a)·P(d|b,a)·P(e|b,c).
![Page 11: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/11.jpg)
Finding MPE = max_x P(x), i.e. MPE = max_{a,e,d,c,b} P(a)·P(c|a)·P(b|a)·P(d|b,a)·P(e|b,c)

Buckets are processed in order B, C, D, E, A; max_b is the elimination operator:

bucket B:  P(b|a)  P(d|b,a)  P(e|b,c)
bucket C:  P(c|a)  h^B(a,d,c,e)
bucket D:  h^C(a,d,e)
bucket E:  e = 0  h^D(a,e)
bucket A:  P(a)  h^E(a)

Algorithm elim-mpe (Dechter 1996); Non-serial Dynamic Programming (Bertelè and Brioschi, 1973)
![Page 12: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/12.jpg)
Generating the MPE-tuple

B:  P(b|a)  P(d|b,a)  P(e|b,c)
C:  P(c|a)  h^B(a,d,c,e)
D:  h^C(a,d,e)
E:  e = 0  h^D(a,e)
A:  P(a)  h^E(a)

1. a' = argmax_a P(a)·h^E(a)
2. e' = 0
3. d' = argmax_d h^C(a',d,e')
4. c' = argmax_c P(c|a')·h^B(a',d',c,e')
5. b' = argmax_b P(b|a')·P(d'|b,a')·P(e'|b,c')

Return (a', b', c', d', e')
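The elimination pass above can be sketched generically over table factors. This is a minimal max-product bucket elimination; the network structure follows the slide, but all numeric CPT entries below are made up for illustration:

```python
from itertools import product

DOM = [0, 1]

def make_factor(scope, table):
    return {"scope": tuple(scope), "table": table}

def eliminate(factors, var):
    """Multiply all factors mentioning `var`, then maximize `var` out."""
    bucket = [f for f in factors if var in f["scope"]]
    rest = [f for f in factors if var not in f["scope"]]
    scope = tuple(sorted({v for f in bucket for v in f["scope"]} - {var}))
    table = {}
    for vals in product(DOM, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        best = 0.0
        for x in DOM:
            assign[var] = x
            p = 1.0
            for f in bucket:
                p *= f["table"][tuple(assign[v] for v in f["scope"])]
            best = max(best, p)
        table[vals] = best
    return rest + [make_factor(scope, table)]

def elim_mpe(factors, order):
    fs = list(factors)
    for var in order:
        fs = eliminate(fs, var)
    result = 1.0          # only zero-scope constants remain
    for f in fs:
        result *= f["table"][()]
    return result

def brute_force_mpe(factors, variables):
    """Reference check: maximize the product over all full assignments."""
    best = 0.0
    for vals in product(DOM, repeat=len(variables)):
        assign = dict(zip(variables, vals))
        p = 1.0
        for f in factors:
            p *= f["table"][tuple(assign[v] for v in f["scope"])]
        best = max(best, p)
    return best

# Hypothetical CPT numbers; only the scopes match the slide's network.
FACTORS = [
    make_factor(("A",), {(0,): 0.6, (1,): 0.4}),
    make_factor(("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),
    make_factor(("A", "C"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}),
    make_factor(("A", "B", "D"), {(0, 0, 0): 0.6, (0, 0, 1): 0.4, (0, 1, 0): 0.1,
                                  (0, 1, 1): 0.9, (1, 0, 0): 0.3, (1, 0, 1): 0.7,
                                  (1, 1, 0): 0.5, (1, 1, 1): 0.5}),
    make_factor(("B", "C", "E"), {(0, 0, 0): 0.2, (0, 0, 1): 0.8, (0, 1, 0): 0.6,
                                  (0, 1, 1): 0.4, (1, 0, 0): 0.9, (1, 0, 1): 0.1,
                                  (1, 1, 0): 0.5, (1, 1, 1): 0.5}),
]
```

Eliminating along B, C, D, E, A reproduces the bucket order of the slide; the result agrees with the brute-force maximum on this toy instance.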
![Page 13: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/13.jpg)
Complexity

The same bucket elimination, annotated with its cost: the largest recorded function, h^B(a,d,c,e), has 4 variables, so the computation is exp(w* = 4), where w* is the "induced width" (max clique size of the induced graph).

bucket B:  P(b|a)  P(d|b,a)  P(e|b,c)
bucket C:  P(c|a)  h^B(a,d,c,e)
bucket D:  h^C(a,d,e)
bucket E:  e = 0  h^D(a,e)
bucket A:  P(a)  h^E(a)

MPE = max_{a,e,d,c,b} P(a)·P(c|a)·P(b|a)·P(d|b,a)·P(e|b,c)

Algorithm elim-mpe (Dechter 1996); Non-serial Dynamic Programming (Bertelè and Brioschi, 1973)
![Page 14: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/14.jpg)
Complexity of bucket elimination

Bucket-elimination is time and space O(r · exp(w*(d))), where
  r = number of functions
  w*(d) = the induced width of the primal graph along ordering d

The effect of the ordering: on the example graph (A, B, C, D, E), one ordering gives w*(d1) = 4 while another gives w*(d2) = 2.

Finding the smallest induced width is hard.
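The induced width along an ordering can be computed by simulating elimination: process nodes from last to first, connect each node's earlier neighbors, and track the largest earlier-neighbor set. A sketch (the moral-graph edge list is reconstructed from the example's CPT scopes; note the slide quotes w* = 4 as the max *clique* size, which is width + 1):

```python
def induced_width(edges, ordering):
    """Induced width of the graph along `ordering` (width = parents count)."""
    adj = {v: set() for v in ordering}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    pos = {v: i for i, v in enumerate(ordering)}
    width = 0
    for v in reversed(ordering):             # eliminate last variable first
        earlier = {u for u in adj[v] if pos[u] < pos[v]}
        width = max(width, len(earlier))
        for u in earlier:                    # add fill-in edges among parents
            for w in earlier:
                if u != w:
                    adj[u].add(w)
    return width

# Moral graph of the running example (edges from the CPT scopes).
MORAL = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("B", "E"), ("C", "E")]
```

A chain has induced width 1 along its natural order, and a 4-cycle has width 2, which makes for quick sanity checks.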
![Page 15: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/15.jpg)
Directional i-consistency

Example graph over A, B, C, D, E with inequality constraints, processed along the ordering A, B, C, D, E (bucket E first):

E:  B≠E, C≠E, D≠E
D:  A≠D, C≠D
C:  B≠C
B:  A≠B
A:

Adaptive consistency records the constraint R_DCB; directional path-consistency (d-path) records R_DC, R_DB, R_CB; directional arc-consistency (d-arc) records the unary constraints R_D, R_C, R_B.
![Page 16: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/16.jpg)
Mini-bucket approximation: MPE task

Split a bucket into mini-buckets => bound complexity.

Exponential complexity decrease: O(e^n) becomes O(e^r) + O(e^(n-r)), by replacing the single function h^X with the pair g^X and h^X computed over the two mini-buckets.
![Page 17: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/17.jpg)
Mini-Bucket Elimination

Network over A, B, C, D, E with functions P(A), P(B|A), P(C|A), P(E|B,C), P(D|A,B).

Bucket B:  {P(E|B,C)}  {P(B|A), P(D|A,B)}   -- split into two mini-buckets, each maximized over B separately
Bucket C:  P(C|A)  h^B(C,E)
Bucket D:  h^B(A,D)
Bucket E:  E = 0  h^C(A,E)
Bucket A:  P(A)  h^E(A)  h^D(A)

MPE* is an upper bound on MPE -- U.
Generating a solution yields a lower bound -- L.
![Page 18: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/18.jpg)
MBE-MPE(i): Algorithm approx-mpe (Dechter & Rish 1997)

Input: i - max number of variables allowed in a mini-bucket
Output: [lower bound (cost of a sub-optimal solution), upper bound]

Example: approx-mpe(3) versus elim-mpe: the approximation works with effective width w* = 2, versus w* = 4 for exact elimination.
![Page 19: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/19.jpg)
Properties of MBE(i)
Complexity: O(r exp(i)) time and O(exp(i)) space. Yields an upper-bound and a lower-bound.
Accuracy: determined by upper/lower (U/L) bound.
As i increases, both accuracy and complexity increase.
Possible use of mini-bucket approximations: As anytime algorithms As heuristics in search
Other tasks: similar mini-bucket approximations for: belief updating, MAP and MEU (Dechter and Rish, 1997)
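The soundness of the mini-bucket upper bound rests on one inequality: maximizing each mini-bucket separately can only overestimate, since max_b f(b)·g(b) ≤ (max_b f(b)) · (max_b g(b)) for nonnegative tables. A numeric sketch over random tables (the tables themselves carry no meaning, they only exercise the inequality):

```python
import random

random.seed(0)

def check_split_bound(trials=100, dom=4):
    """Verify max_b f(b)*g(b) <= max_b f(b) * max_b g(b) on random tables."""
    for _ in range(trials):
        f = [random.random() for _ in range(dom)]
        g = [random.random() for _ in range(dom)]
        exact = max(f[b] * g[b] for b in range(dom))
        upper = max(f) * max(g)
        if upper < exact - 1e-12:
            return False
    return True
```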
![Page 20: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/20.jpg)
Outline
Introduction
Optimization tasks for graphical models
Solving by inference and search
Inference: bucket elimination, dynamic programming; mini-bucket elimination
Search: branch and bound and best-first; lower-bounding heuristics; AND/OR search spaces
Hybrids of search and inference: cutset decomposition; super-bucket scheme
![Page 21: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/21.jpg)
The Search Space

Variables A, B, C, D, E, F with domains {0,1}, searched along the ordering A, B, C, D, E, F (a full binary OR tree).

Objective function: minimize F(X) = Σ_{i=1..9} fi(Xi)

A B | f1      A C | f2      A E | f3
0 0 | 2       0 0 | 3       0 0 | 0
0 1 | 0       0 1 | 0       0 1 | 3
1 0 | 1       1 0 | 0       1 0 | 2
1 1 | 4       1 1 | 1       1 1 | 0

A F | f4      B C | f5      B D | f6
0 0 | 2       0 0 | 0       0 0 | 4
0 1 | 0       0 1 | 1       0 1 | 2
1 0 | 0       1 0 | 2       1 0 | 1
1 1 | 2       1 1 | 4       1 1 | 0

B E | f7      C D | f8      E F | f9
0 0 | 3       0 0 | 1       0 0 | 1
0 1 | 2       0 1 | 4       0 1 | 0
1 0 | 1       1 0 | 0       1 0 | 0
1 1 | 0       1 1 | 0       1 1 | 2
![Page 22: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/22.jpg)
The Search Space (cont.)

Same tree and cost tables as above. Arc-cost is calculated based on cost components: the cost of an arc assigning X = x is the sum of all functions that become fully instantiated at that arc.
![Page 23: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/23.jpg)
The Value Function

Same tree and cost tables as above. Value of node = minimal cost solution below it; values are computed bottom-up, each internal node taking the minimum over its children of (arc cost + child value).
![Page 24: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/24.jpg)
An Optimal Solution

Value of node = minimal cost solution below it. Tracing the minimizing arcs from the root downward recovers an optimal assignment.
![Page 25: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/25.jpg)
Basic Heuristic Search Schemes

Heuristic function f(x) computes a lower bound on the best extension of x and can be used to guide a heuristic search algorithm. We focus on:

1. Branch and Bound: use heuristic function f(xp) to prune the depth-first search tree. Linear space.
2. Best-First Search: always expand the node with the best heuristic value f(xp). Needs lots of memory.
![Page 26: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/26.jpg)
Classic Branch-and-Bound

In an OR search tree, a node n has:
  g(n) - cost of the path from the root to n
  h(n) - a heuristic estimate of the best-cost completion below n
  LB(n) = g(n) + h(n), a lower bound on any solution through n

Prune n if LB(n) ≥ UB, where the upper bound UB is the cost of the best solution found so far.
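The pruning rule can be sketched on the nine pairwise cost tables from the search-space example. The heuristic here is a deliberately simple one (sum of each uninstantiated function's smallest entry, which is admissible); it is not the mini-bucket heuristic the slides develop, and on these tables it degenerates to 0 because every table contains a 0:

```python
F = {
    ("A", "B"): {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 4},
    ("A", "C"): {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    ("A", "E"): {(0, 0): 0, (0, 1): 3, (1, 0): 2, (1, 1): 0},
    ("A", "F"): {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 2},
    ("B", "C"): {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 4},
    ("B", "D"): {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 0},
    ("B", "E"): {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 1): 0},
    ("C", "D"): {(0, 0): 1, (0, 1): 4, (1, 0): 0, (1, 1): 0},
    ("E", "F"): {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2},
}
VARS = ("A", "B", "C", "D", "E", "F")

def branch_and_bound():
    state = {"ub": float("inf"), "best": None}

    def h(assign):
        # Sum of each uninstantiated function's smallest entry: admissible,
        # but weak here (all mins are 0); a mini-bucket heuristic is stronger.
        return sum(min(t.values()) for (u, v), t in F.items()
                   if u not in assign or v not in assign)

    def dfs(i, assign, g):
        if i == len(VARS):
            if g < state["ub"]:
                state["ub"], state["best"] = g, dict(assign)
            return
        var = VARS[i]
        for val in (0, 1):
            assign[var] = val
            # cost of functions fully instantiated by assigning `var`
            step = sum(t[(assign[u], assign[v])] for (u, v), t in F.items()
                       if var in (u, v) and u in assign and v in assign)
            if g + step + h(assign) < state["ub"]:   # prune when LB >= UB
                dfs(i + 1, assign, g + step)
            del assign[var]

    dfs(0, {}, 0)
    return state["ub"], state["best"]
```

Because the bound never overestimates, no optimal branch is ever pruned, and the search returns the same optimum (value 5) as exhaustive enumeration.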
![Page 27: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/27.jpg)
How to Generate Heuristics
The principle of relaxed models
Linear optimization for integer programs
Mini-bucket elimination
Bounded directional consistency ideas
![Page 28: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/28.jpg)
Generating Heuristics for graphical models (Kask and Dechter, 1999)

Given a cost function

C(a,b,c,d,e) = f(a) · f(b,a) · f(c,a) · f(e,b,c) · f(d,b,a)

define an evaluation function over a partial assignment as the cost of its best extension:

f*(a,e,d) = min_{b,c} C(a,b,c,d,e)
          = f(a) · min_{b,c} f(b,a) · f(c,a) · f(e,b,c) · f(d,b,a)
          = g(a,e,d) · H*(a,e,d)
![Page 29: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/29.jpg)
Generating Heuristics (cont.)

H*(a,e,d) = min_{b,c} f(b,a) · f(c,a) · f(e,b,c) · f(d,a,b)
          = min_c [f(c,a) · min_b [f(e,b,c) · f(b,a) · f(d,a,b)]]
          ≥ min_c [f(c,a) · min_b f(e,b,c) · min_b [f(b,a) · f(d,a,b)]]
          = min_b [f(b,a) · f(d,a,b)] · min_c [f(c,a) · min_b f(e,b,c)]
          = h^B(d,a) · h^C(e,a)
          = H(a,e,d)

f(a,e,d) = g(a,e,d) · H(a,e,d) ≤ f*(a,e,d)

The heuristic function H is what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
![Page 30: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/30.jpg)
![Page 31: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/31.jpg)
Static MBE Heuristics

Given a partial assignment x^p, estimate the cost of the best extension to a full solution. The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme.

Belief network over A, B, C, D, E, with buckets:

B:  P(E|B,C)  P(D|A,B)  P(B|A)
C:  P(C|A)  h^B(E,C)
D:  h^B(D,A)
E:  h^C(E,A)
A:  P(A)  h^E(A)  h^D(A)

f(a,e,D) = g(a,e) · H(a,e,D) = P(a) · h^B(D,a) · h^C(e,a)

g · h is admissible.
![Page 32: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/32.jpg)
Heuristics Properties

MB heuristic is monotone and admissible.
Retrieved in linear time.
IMPORTANT: heuristic strength can vary with the i-bound of MB(i). A higher i-bound means more pre-processing, a stronger heuristic, and less search.
Allows a controlled trade-off between preprocessing and search.
![Page 33: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/33.jpg)
Experimental Methodology
Algorithms: BBMB(i) - Branch and Bound with MB(i); BFMB(i) - Best-First with MB(i); MBE(i)
Test networks: Random Coding (Bayesian), CPCS (Bayesian), Random (CSP)
Measures of performance: compare accuracy given a fixed amount of time - how close is the cost found to the optimal solution; compare trade-off performance as a function of time
![Page 34: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/34.jpg)
Empirical Evaluation of mini-bucket heuristics, Bayesian networks, coding

[Two plots: % solved exactly (0.0-1.0) versus time (0-30 sec). Left: Random Coding, K=100, noise=0.28, with curves BBMB and BFMB for i = 2, 6, 10, 14. Right: Random Coding, K=100, noise=0.32, with curves BBMB and BFMB for i = 6, 10, 14.]
![Page 35: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/35.jpg)
Max-CSP experiments (Kask and Dechter, 2000)
![Page 36: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/36.jpg)
Dynamic MB Heuristics
Rather than pre-compiling, the mini-bucket heuristics can be generated during search
Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node in the search space
(a partial assignment, along the given variable ordering)
![Page 37: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/37.jpg)
Dynamic MB and MBTE Heuristics (Kask, Marinescu and Dechter, 2003)

Rather than precompiling, compute the heuristics during search.

Dynamic MB: dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node during search.

Dynamic MBTE: we can compute heuristics simultaneously for all un-instantiated variables using mini-bucket-tree elimination.

MBTE is an approximation scheme defined over cluster-trees. It outputs multiple bounds for each variable and value extension at once.
![Page 38: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/38.jpg)
Cluster Tree Elimination - example

Bayesian network over A, B, C, D, E, F, G with factors p(a), p(b|a), p(c|a,b), p(d|b), p(f|c,d), p(e|b,f), p(g|e,f).

Cluster tree: 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG, with separators BC (1-2), BF (2-3), EF (3-4).

Messages:
h_(1,2)(b,c) = Σ_a p(a)·p(b|a)·p(c|a,b)
h_(2,1)(b,c) = Σ_{d,f} p(d|b)·p(f|c,d)·h_(3,2)(b,f)
h_(2,3)(b,f) = Σ_{c,d} p(d|b)·p(f|c,d)·h_(1,2)(b,c)
h_(3,2)(b,f) = Σ_e p(e|b,f)·h_(4,3)(e,f)
h_(3,4)(e,f) = Σ_b p(e|b,f)·h_(2,3)(b,f)
h_(4,3)(e,f) = p(G|e,f)
![Page 39: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/39.jpg)
Mini-Clustering

Motivation: the time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem. When the induced width w* is big, the CTE algorithm becomes infeasible.

The basic idea: try to reduce the size of the cluster (the exponent); partition each cluster into mini-clusters with fewer variables.

Accuracy parameter i = maximum number of variables in a mini-cluster.

The idea was explored for variable elimination (Mini-Bucket).
![Page 40: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/40.jpg)
Idea of Mini-Clustering

Split a cluster into mini-clusters => bound complexity.

A cluster with functions {h1, ..., hr, h_{r+1}, ..., hn} would compute h = elim( Π_{i=1..n} hi ). Instead, partition it into {h1, ..., hr} and {h_{r+1}, ..., hn} and compute g = elim( Π_{i=1..r} hi ) · elim( Π_{i=r+1..n} hi ), so that h ≤ g.

Exponential complexity decrease: O(e^n) becomes O(e^r) + O(e^(n-r)).
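For sum-elimination (belief updating), the standard mini-clustering bound eliminates the first mini-cluster by summation and the others by maximization, since Σ_b f(b)·g(b) ≤ (Σ_b f(b)) · (max_b g(b)) for nonnegative tables. A numeric sketch over random tables, used only to exercise the inequality:

```python
import random

random.seed(1)

def check_mc_bound(trials=200, dom=5):
    """Verify sum_b f(b)*g(b) <= sum_b f(b) * max_b g(b) on random tables."""
    for _ in range(trials):
        f = [random.random() for _ in range(dom)]
        g = [random.random() for _ in range(dom)]
        exact = sum(f[b] * g[b] for b in range(dom))
        upper = sum(f) * max(g)
        if upper < exact - 1e-12:
            return False
    return True
```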
![Page 41: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/41.jpg)
Mini-Clustering - example

Same cluster tree as above (1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF), but cluster 2's functions are partitioned into mini-clusters, so some messages become sets of smaller functions:

H_(1,2) = { h1_(1,2)(b,c) := Σ_a p(a)·p(b|a)·p(c|a,b) }
H_(2,1) = { h1_(2,1)(b) := Σ_{d,f} p(d|b)·h1_(3,2)(b,f),  h2_(2,1)(c) := Σ_{d,f} p(f|c,d) }
H_(2,3) = { h1_(2,3)(b) := Σ_{c,d} p(d|b)·h1_(1,2)(b,c),  h2_(2,3)(f) := Σ_{c,d} p(f|c,d) }
H_(3,2) = { h1_(3,2)(b,f) := Σ_e p(e|b,f)·h1_(4,3)(e,f) }
H_(3,4) = { h1_(3,4)(e,f) := Σ_b p(e|b,f)·h1_(2,3)(b)·h2_(2,3)(f) }
H_(4,3) = { h1_(4,3)(e,f) := p(G|e,f) }
![Page 42: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/42.jpg)
Mini Bucket Tree Elimination

The same message passing on the cluster tree (1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF), with each message H_(u,v) now a set of mini-cluster functions:

H_(1,2) = { h1_(1,2)(b,c) }
H_(2,1) = { h1_(2,1)(b), h2_(2,1)(c) }
H_(2,3) = { h1_(2,3)(b), h2_(2,3)(f) }
H_(3,2) = { h1_(3,2)(b,f) }
H_(3,4) = { h1_(3,4)(e,f) }
H_(4,3) = { h1_(4,3)(e,f) }
![Page 43: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/43.jpg)
Mini-Clustering
Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) for each variable and each of its values.
MBTE: when the clusters are buckets in BTE.
![Page 44: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/44.jpg)
Branch and Bound w/ Mini-Buckets
BB with static Mini-Bucket Heuristics (s-BBMB) Heuristic information is pre-compiled before search.
Static variable ordering, prunes current variable
BB with dynamic Mini-Bucket Heuristics (d-BBMB)
Heuristic information is assembled during search. Static variable ordering, prunes current variable
BB with dynamic Mini-Bucket-Tree Heuristics (BBBT)
Heuristic information is assembled during search. Dynamic variable ordering, prunes all future variables
![Page 45: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/45.jpg)
Empirical Evaluation

Algorithms:
  Complete: BBBT, BBMB
  Incomplete: DLM, GLS, SLS, IJGP, IBP (coding)
Measures: time, accuracy (% exact), #backtracks, bit error rate (coding)
Benchmarks: coding networks, Bayesian Network Repository, grid networks (N-by-N), random noisy-OR networks, random networks
![Page 46: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/46.jpg)
Real World Benchmarks
Average Accuracy and Time. 30 samples, 10 observations, 30 seconds
![Page 47: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/47.jpg)
Empirical Results: Max-CSP Random Binary Problems: <N, K, C, T>
N: number of variables K: domain size C: number of constraints T: Tightness
Task: Max-CSP
![Page 48: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/48.jpg)
BBBT(i) vs. BBMB(i)

[Plot: BBBT(i) vs BBMB(i), N=100, for i = 2 through 7.]
![Page 49: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/49.jpg)
Searching the Graph; caching goods

Contexts along the ordering A, B, C, D, E, F:
  context(A) = [A]
  context(B) = [AB]
  context(C) = [ABC]
  context(D) = [ABD]
  context(E) = [AE]
  context(F) = [F]

Same nine pairwise cost tables as in the search-space example. Nodes with identical context values root identical subproblems, so their values can be cached and reused, collapsing the search tree into a graph.
![Page 50: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/50.jpg)
Searching the Graph; caching goods (cont.)

Same contexts and cost tables as above. With caching, each distinct context value is solved once; the merged search graph carries the node values, computed bottom-up.
![Page 51: Chapter 13](https://reader036.vdocuments.mx/reader036/viewer/2022070416/568151ee550346895dc027e1/html5/thumbnails/51.jpg)
Outline
Introduction
Optimization tasks for graphical models
Solving by inference and search
Inference: bucket elimination, dynamic programming; mini-bucket elimination, belief propagation
Search: branch and bound and best-first; lower-bounding heuristics; AND/OR search spaces
Hybrids of search and inference: cutset decomposition; super-bucket scheme