
Page 1: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

OPTIMIZATION WITH PARITY CONSTRAINTS: FROM BINARY CODES TO DISCRETE INTEGRATION

Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman*

*Cornell University, +IBM Watson Research Center

UAI - 2013


Page 2: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

High-dimensional integration

• High-dimensional integrals arise in statistics, ML, and physics:
  • Expectations / model averaging
  • Marginalization
  • Partition function / ranking models / parameter learning
• Curse of dimensionality:
  • Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)

[Figure: an n-dimensional hypercube with L grid points per dimension; the number of quadrature cells grows as L, L^2, L^3, …, L^n.]

Page 3: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Discrete Integration

• We are given:
  • A set of 2^n items
  • Non-negative weights w
• Goal: compute the total weight
• Compactly specified weight function:
  • factored form (Bayes net, factor graph, CNF, …)
• Example 1: n = 2 variables, sum over 4 items with weights 5, 0, 2, 1; goal: compute 5 + 0 + 2 + 1 = 8
• Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)

[Figure: the 2^n items drawn so that size visually represents weight, with the weight function given in factored form.]

Page 4: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Hardness

• 0/1 weights case:
  • Is there at least one "1"? SAT
  • How many "1"s are there? #SAT
  • NP-complete vs. #P-complete: much harder
• General weights:
  • Find the heaviest item (combinatorial optimization, MAP)
  • Sum the weights (discrete integration)
• [ICML-13] WISH: Approximate Discrete Integration via Optimization, e.g., partition function via MAP inference
• MAP inference is often fast in practice:
  • Relaxations / bounds
  • Pruning

[Figure: complexity hierarchy from easy to hard: P ⊆ NP ⊆ PH ⊆ P^#P ⊆ PSPACE ⊆ EXP.]

Page 5: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

WISH: Integration by Hashing and Optimization

The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor.

• Outer loop over the n variables
• Repeat log(n) times: MAP inference on the model augmented with random parity constraints
• Aggregate the MAP inference solutions

[Figure: the original graphical model over σ ∈ {0,1}^n (n binary variables) is augmented with parity check nodes enforcing A σ = b (mod 2).]

(A code sketch of the loop follows.)
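A minimal sketch of the WISH loop in Python, assuming a user-supplied map_solver(A, b) that returns the optimal weight subject to the parity constraints A x = b (mod 2); the solver interface is a placeholder, not the authors' implementation:

    import math
    import random
    import statistics

    def random_parity_constraints(i, n):
        """Sample A in {0,1}^{i x n} and b in {0,1}^i uniformly at random."""
        A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
        b = [random.randint(0, 1) for _ in range(i)]
        return A, b

    def wish(map_solver, n):
        """Estimate Z = sum over x in {0,1}^n of w(x) via MAP queries."""
        t = max(1, math.ceil(math.log2(n)))      # repetitions per level
        M = [0.0] * (n + 1)
        M[0] = map_solver([], [])                # mode: unconstrained MAP
        for i in range(1, n + 1):                # outer loop over n levels
            # repeat log(n) times and take the median
            M[i] = statistics.median(
                map_solver(*random_parity_constraints(i, n)) for _ in range(t)
            )
        # aggregate: level i stands in for about 2^(i-1) configurations
        return M[0] + sum(M[i] * 2 ** (i - 1) for i in range(1, n + 1))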

Page 6: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Visual working of the algorithm

• How it works:
  • No parity constraints: compute the mode M0 of the function to be integrated
  • 1 random parity constraint: solve MAP; repeat log(n) times and take the median M1
  • 2 random parity constraints: median M2
  • 3 random parity constraints: median M3
  • …, up to n constraints in the outer loop
• Aggregate: M0 + M1×1 + M2×2 + M3×4 + … (written out below)
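Written out as a formula (a restatement of the ×1, ×2, ×4, … weighting above, with Ẑ the WISH output and M_i the median MAP value under i random parity constraints):

    \hat{Z} = M_0 + \sum_{i=1}^{n} M_i \, 2^{i-1}

Intuitively, each added parity constraint cuts the number of surviving configurations roughly in half, so the median at level i is counted with weight 2^{i-1}.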

Page 7: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Accuracy Guarantees

• Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
• Theorem [ICML-13]: The approximation factor can be improved to (1 + ε) by adding extra variables and factors.
  • Example: a factor-2 approximation with 4n variables
• Remark: faster than enumeration only when the combinatorial optimization is efficient

Page 8: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Summary of contributions

• Introduction and previous work:
  • WISH: Approximate Discrete Integration via Optimization
  • Partition function / marginalization via MAP inference
  • Accuracy guarantees
• MAP inference subject to parity constraints:
  • Tractable cases and approximations
  • Integer Linear Programming formulation
  • New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (eventually reaching a constant-factor approximation)
• Sparsity of the parity constraints:
  • Techniques to improve solution time and bound quality
  • Experimental improvements over variational techniques

Page 9: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

MAP INFERENCE WITH PARITY CONSTRAINTS: Hardness, approximations, and bounds

Page 10: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Making WISH more scalable

• Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
• Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral):
  • Lower bounds on MAP (local search) → lower bounds on Z
  • Upper bounds on MAP (LP, SDP relaxations) → upper bounds on Z
  • Constant-factor approximations of MAP → constant-factor approximations of Z
• Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?

Page 11: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Error correcting codes

• Communication over a noisy channel: Alice sends x over the noisy channel; Bob receives y.
• Example: Alice sends x = 0100|1, where the redundant parity check bit is 0 XOR 1 XOR 0 XOR 0 = 1. Bob receives y = 0110|1, but the parity of the received data bits is 0 XOR 1 XOR 1 XOR 0 = 0 ≠ 1, so the check fails.
• Bob: There has been a transmission error! What was the message actually sent by Alice?
  • It must be a valid codeword
  • As close as possible to the received message y
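The parity check above is a two-line computation; a tiny illustration (hypothetical helper names, not from the paper):

    def parity(bits):
        """XOR of a sequence of 0/1 bits."""
        p = 0
        for bit in bits:
            p ^= bit
        return p

    sent = [0, 1, 0, 0]                 # Alice's data bits
    check = parity(sent)                # redundant check bit: 1
    received = [0, 1, 1, 0]             # one bit flipped by the channel
    print(parity(received) == check)    # False: error detected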

Page 12: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Decoding a binary code

• Max-likelihood decoding: under a noisy channel model, find the most likely transmitted string; it must be a codeword, i.e., satisfy the parity checks. This is MAP inference on a graphical model with parity check nodes.
• LDPC codes are routinely decoded this way in practice: 10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, …
• Our more general case: a more complex probabilistic model, again augmented with parity check nodes:
  • max w(x) subject to A x = b (mod 2)
  • Equivalent to MAP inference on the augmented model
• MAP inference is NP-hard to approximate within any constant factor [Stern, Arora, …]

[Figure: the ML-decoding graphical model vs. our more general augmented model, both with parity check nodes; noisy channel x → y, 0100|1 → 0110|1.]

Page 13: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Decoding via Integer Programming

• MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP):
  • Standard MAP encoding
  • Compact (polynomial-size) encoding of the parity constraints via Yannakakis's formulation of the parity polytope
• LP relaxation: relax the integrality constraints
  • Polynomial-time upper bounds
• ILP solving strategy: cuts + branching + LP relaxations (a simple parity encoding is sketched below)
  • Solve a sequence of LP relaxations
  • Upper and lower bounds that improve over time
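One standard way to linearize a parity row (not necessarily the Yannakakis encoding used in the paper) introduces an auxiliary integer variable per constraint. A minimal sketch with the PuLP modeling library and a placeholder linear objective standing in for the weight function:

    from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, LpInteger

    def map_with_parity(weights, A, b):
        """max sum_j weights[j] * x[j]  subject to  A x = b (mod 2), x binary.

        Each row sum_j A[i][j] * x[j] = b[i] (mod 2) is linearized as
        sum_j A[i][j] * x[j] - 2 * q[i] = b[i], with q[i] integer >= 0.
        """
        n = len(weights)
        prob = LpProblem("map_with_parity", LpMaximize)
        x = [LpVariable(f"x{j}", cat=LpBinary) for j in range(n)]
        prob += lpSum(weights[j] * x[j] for j in range(n))
        for i, (row, rhs) in enumerate(zip(A, b)):
            q = LpVariable(f"q{i}", lowBound=0, cat=LpInteger)
            prob += lpSum(row[j] * x[j] for j in range(n)) - 2 * q == rhs
        prob.solve()
        return [int(v.value()) for v in x]

This naive encoding has a weak LP relaxation (fractional x can satisfy any parity row); the compact parity-polytope formulation gives the tighter polynomial-time bounds the slide refers to.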

Page 14: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Iterative bound tightening

• Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
• Recall: bounds on the optimization (MAP) → (probabilistic) bounds on the partition function Z. This yields a new family of bounds.
• WISH: when MAP is solved to optimality (LowerBound = UpperBound), a guaranteed constant-factor approximation of Z

Page 15: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

SPARSITY OF THE PARITY CONSTRAINTS: Improving solution time and bound quality

Page 16: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Inducing sparsity

• Observations:
  • Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
  • The quality of the LP relaxation depends on A and b, not just on the solution space: elementary row operations (e.g., summing two equations) do not change the solution space but do affect the LP relaxation
• Procedure (step 1 is sketched in code below):
  1) Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field)
  2) Greedily apply elementary row operations to sparsify further

[Figure: parity check nodes before and after sparsification; with A in row-echelon form, the parity checks are equivalent but sparser.]
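A minimal sketch of step 1: Gaussian elimination over GF(2), where adding one row to another is a bitwise XOR and leaves the solution space of A x = b (mod 2) unchanged:

    def gf2_row_echelon(A, b):
        """Reduce the parity system A x = b (mod 2) to row-echelon form in place.

        A is a list of rows (lists of 0/1 ints), b a list of 0/1 ints.
        """
        rows, cols = len(A), len(A[0])
        r = 0
        for c in range(cols):
            # find a pivot row with a 1 in column c
            pivot = next((i for i in range(r, rows) if A[i][c] == 1), None)
            if pivot is None:
                continue
            A[r], A[pivot] = A[pivot], A[r]
            b[r], b[pivot] = b[pivot], b[r]
            # eliminate column c from the rows below the pivot (row XOR)
            for i in range(r + 1, rows):
                if A[i][c] == 1:
                    A[i] = [aij ^ arj for aij, arj in zip(A[i], A[r])]
                    b[i] ^= b[r]
            r += 1
            if r == rows:
                break
        return A, b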

Page 17: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Improvements from sparsity

• The quality of the LP relaxations improves significantly (better upper bounds)
• Integer solutions are found faster (better lower bounds)

[Figure: improvements from sparsification using the IBM CPLEX ILP solver on a 10x10 Ising grid; the upper bound improves, and without sparsification the solver fails to find integer solutions (no lower bound).]

Page 18: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Generating sparse constraints

• WISH is based on universal hashing:
  • Randomly generate A ∈ {0,1}^{i×n} and b ∈ {0,1}^i; we optimize over the solutions of A x = b (mod 2) (parity constraints)
  • Then A x + b (mod 2) is:
    • Uniform over {0,1}^i
    • Pairwise independent: given distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent
• Suppose we instead generate a sparse matrix A (see the sketch below):
  • At most k variables per parity constraint (up to k ones per row of A)
  • A x + b (mod 2) is still uniform, but no longer pairwise independent
  • E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables: lots of correlation (knowing A x = b tells us a lot about whether A y = b)
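A small sketch contrasting the two generators; the exact sparse distribution used in the paper may differ, so treat this as an illustration, with k bounding the number of ones per row:

    import random

    def dense_parity_constraints(i, n):
        """Universal hashing: every entry of A and b is an independent fair coin,
        so A x + b (mod 2) is uniform over {0,1}^i and pairwise independent."""
        A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
        b = [random.randint(0, 1) for _ in range(i)]
        return A, b

    def sparse_parity_constraints(i, n, k):
        """Exactly k ones per row (so at most k variables per constraint):
        A x + b (mod 2) stays uniform, but pairwise independence is lost
        (k = 1 just fixes i variables)."""
        A = []
        for _ in range(i):
            row = [0] * n
            for j in random.sample(range(n), k):
                row[j] = 1
            A.append(row)
        b = [random.randint(0, 1) for _ in range(i)]
        return A, b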

Page 19: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Using sparse parity constraints

• Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
• PRO: "easier" MAP inference queries
  • For example, random parity constraints of length 1 (on a single variable) are equivalent to MAP with some variables fixed
• CON: we lose the upper bound; the output can underestimate the partition function
• CON: no constant-factor approximation anymore

Page 20: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

MAP with sparse parity constraints

• Evaluation of MAP inference with sparse constraints:
  • ILP and Branch & Bound outperform message passing (BP, MP, and MPLP)

[Figure: results on a 10x10 attractive Ising grid and a 10x10 mixed Ising grid.]

Page 21: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Experimental results

• ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

Page 22: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Experimental results (2)

• ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

Page 23: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Conclusions

• [ICML-13] WISH: discrete integration reduced to a small number of optimization instances (MAP)
  • Strong (probabilistic) accuracy guarantees
  • MAP inference is still NP-hard
• Scalability: approximations and bounds
  • Connection with max-likelihood decoding
  • ILP formulation + sparsity (Gaussian-elimination sparsification & uniform hashing)
  • New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function, which can be iteratively tightened (eventually reaching a constant-factor approximation)
• Future work:
  • Extension to continuous integrals and variables
  • Sampling from high-dimensional probability distributions

Page 24: Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Extra slides
