dynamic programming: theory and practicepeople.csail.mit.edu/srush/slides.pdf · but i don’t have...

66
Dynamic Programming: Theory and Practice Alexander Rush and Nathaniel Filardo MASC 2013 October 11, 2013

Upload: others

Post on 09-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Dynamic Programming: Theory and Practice

Alexander Rush and Nathaniel FilardoMASC 2013

October 11, 2013

Page 2: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

So I’ve been working with a student.

And I ask her to build a dependency parser.

“What should I use to write that?”

Page 3: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

So I’ve been working with a student.

And I ask her to build a dependency parser.

“What should I use to write that?”

Page 4: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

So I’ve been working with a student.

And I ask her to build a dependency parser.

“What should I use to write that?”

Page 5: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

when I am in my advisor’s class

Page 6: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

when I read the paper

Page 7: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

when I look at the code

Page 8: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Question

We no longer manually implement

I Simplex for LPs

I SVM training for classification

I FST algorithms

I etc.

But I don’t have a good answer for dynamic programming.

Page 9: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Question

We no longer manually implement

I Simplex for LPs

I SVM training for classification

I FST algorithms

I etc.

But I don’t have a good answer for dynamic programming.

Page 10: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Outline

Four views of Dynamic Programming

1. Imperative k

2. Declarative

3. Graphical

4. Optimization

Page 11: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Dynamic Programming for NLP

Page 12: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Decoding for NLP

I f ; a scoring function.

I y ; a possible output structure.

optimization problem:maxy

f (y)

Page 13: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Example: Dependency Parsing

Page 14: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Dynamic Programming for Decoding

“Dynamic programming is a method for solving complex problemsby breaking them down into simpler subproblem”

Traditionally defined as an optimization problem defined using theBellman equations.

However these equations don’t give use much insight into what todo in practice.

Page 15: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

View 1: Imperative

Page 16: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

In my advisor’s class, we teach an imperative implementation ofdynamic programming using charts.

Some students enjoy this, it correlates well with not having bugs.

Page 17: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

In my advisor’s class, we teach an imperative implementation ofdynamic programming using charts.

Some students enjoy this, it correlates well with not having bugs.

Page 18: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 19: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Benefits

I Often fastest method in practice.

I Doesn’t require any additional code or library.

I Great way to prove how hardcore you are.

Page 20: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 1

I Debugging is very subtle.

Page 21: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 22: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 2

I Optimized to the specific problem domain.

Cannot use MSTParser for POS tagging.

Cannot really use MSTParser third-order parsing.

Page 23: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 2

I Optimized to the specific problem domain.

Cannot use MSTParser for POS tagging.

Cannot really use MSTParser third-order parsing.

Page 24: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 3

Cannot easily utilize dynamic programming extensions.

I Variant algorithmsI K-BestI Loss-Augmented inferenceI Posterior Decoding

I Pruning algorithmsI Max-MarginalsI Coarse-to-Fine

I Constrained algorithmsI Beam SearchI Dual DecompositionI Lagrangian Relaxation

....

Page 25: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 3

Cannot easily utilize dynamic programming extensions.

I Variant algorithmsI K-BestI Loss-Augmented inferenceI Posterior Decoding

I Pruning algorithmsI Max-MarginalsI Coarse-to-Fine

I Constrained algorithmsI Beam SearchI Dual DecompositionI Lagrangian Relaxation

....

Page 26: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Issue 3

Cannot easily utilize dynamic programming extensions.

I Variant algorithmsI K-BestI Loss-Augmented inferenceI Posterior Decoding

I Pruning algorithmsI Max-MarginalsI Coarse-to-Fine

I Constrained algorithmsI Beam SearchI Dual DecompositionI Lagrangian Relaxation

....

Page 27: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Opinion Slide

Do not write imperative dynamic programming code.

Page 28: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

View 2: Declarative

Page 29: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Dependency Parsing Setup

Page 30: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 31: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 32: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 33: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 34: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 35: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 36: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild* As McGwire neared , fans went wild

Page 37: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

Page 38: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

First-Order Parsing

* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild* As McGwire neared , fans went wild

* As McGwire neared , fans went wild

Page 39: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Declarative code is clean, often less buggy.

:- item(item, double, 0).

:- item([lhalf,rhalf,end], bool, false).

constit(A,H,H,H) += rewrite(A:D,D) if word(D,H).

left(A/C,I,J,H) += constit(B,I,H2,J) * rewrite(A:D,B:D2,C:D)

if lhalf(C,=J+1,H) & word(D,H) & word(D2,H2).

right(A\C,H,I,J) += constit(B,I,H2,J) * rewrite(A:D,C:D,B:D2)

if rhalf(C,H,=J-1) & word(D,H) & word(D2,H2).

lhalf(A,I,H) |= constit(A,I,H,J)!=0.

rhalf(A,H,J) |= constit(A,I,H,J)!=0.

constit(A,I,H,K) += left(A/C,I,J,H) * constit(C,=J+1,H,K).

constit(A,I,H,K) += constit(C,I,H,=J-1) * right(A\C,H,J,K).

goal += constit(s,1,_,N) if end(N).

I am not smart enough to make this efficient.

Page 40: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Declarative code is clean, often less buggy.

:- item(item, double, 0).

:- item([lhalf,rhalf,end], bool, false).

constit(A,H,H,H) += rewrite(A:D,D) if word(D,H).

left(A/C,I,J,H) += constit(B,I,H2,J) * rewrite(A:D,B:D2,C:D)

if lhalf(C,=J+1,H) & word(D,H) & word(D2,H2).

right(A\C,H,I,J) += constit(B,I,H2,J) * rewrite(A:D,C:D,B:D2)

if rhalf(C,H,=J-1) & word(D,H) & word(D2,H2).

lhalf(A,I,H) |= constit(A,I,H,J)!=0.

rhalf(A,H,J) |= constit(A,I,H,J)!=0.

constit(A,I,H,K) += left(A/C,I,J,H) * constit(C,=J+1,H,K).

constit(A,I,H,K) += constit(C,I,H,=J-1) * right(A\C,H,J,K).

goal += constit(s,1,_,N) if end(N).

I am not smart enough to make this efficient.

Page 41: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

View 3: Hypergraph

Page 42: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Hyperedge

Page 43: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Hypergraph

V set of vertices 1 root nodeE set of hyperedges h(e) head vertex of edge

t(e) tail vertex of edge

I X - set of valid “hyperpaths” yI y ∈ {0, 1}|V|+|E| has y(v) = 1 iff node v is in path

Any dynamic program can be represented as a hypergraph.

Page 44: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 45: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 46: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 47: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 48: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Weights

I θ ∈ R |E| - the weights associated with each hyperedge.

Hypergraph problem:maxy∈X

θ>y

Page 49: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Dynamic Programming on Hypergraphs

procedure BestPath(V, E , θ)π(v)← 0 for all terminalsfor v ∈ V in bottom-up order do

π(v)← maxe∈E:h(e)=v

θ(e) + π(t(e))

return π(1)

Page 50: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 51: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

procedure Outside(V, E , θ)ρ[1]← 0for e ∈ E in top-down order do〈〈v2, . . . , vk〉, v1〉 ← e

t ← θ(e) +k∑

i=2

π[vi ]

for i = 2 . . . k dos ← ρ[v1] + t − π[vi ]if s > ρ[vi ] then ρ[vi ]← s

return ρ

Page 52: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Benefits

I Helps debugging, visualizing, pruning, and explaining.

I Can optimize the hell out of the inner loop code.

I Useful for other downstream tasks.

Page 53: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Caveats

I Need to create edges in memory.

I Requires construction, often still imperative.

Page 54: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

View 4: Optimization

Page 55: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Linear Program

I Any dynamic program can be expressed as a linear program.

I Can be derived from the Bellman equations.

I Also directly from the hypergraph representation.

Page 56: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Linear Constraints Conversion

X = {y : y(1) = 1,

y(v) =∑

e:h(e)=v

y(e) ∀ v ∈ V except root,

y(v) =∑

e:t(e)=v

y(e) ∀ v ∈ V except terminals}

I O(|V|) constraints

Page 57: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Linear Constraints Conversion

X = {y : y(1) = 1,

y(v) =∑

e:h(e)=v

y(e) ∀ v ∈ V except root,

y(v) =∑

e:t(e)=v

y(e) ∀ v ∈ V except terminals}

I O(|V|) constraints

Page 58: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Linear Program

maxyθ>y

y(1) = 1

y(v) =∑

e:h(e)=v

y(e) ∀ v ∈ V except root

y(v) =∑

e:t(e)=v

y(e) ∀ v ∈ V except terminals

0 ≤ y(e) ≤ 1 ∀e ∈ E0 ≤ y(v) ≤ 1 ∀v ∈ V

Page 59: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

\* Hypergraph Problem *\

Minimize

OBJ: 0.865309927772 edge_0 + 0.805027827013 edge_1 + 0.719704686404 edge_2

+ 0.824844977148 edge_3 + 0.668153201232 edge_4 + 0.932833824227 edge_5

+ 0.551267246091 edge_6

Subject To

_C1: node_13_tri_right_0_3 = 1

_C10: - edge_0 + node_1_tri_right_1_1 = 0

_C11: - edge_1 + node_2_tri_left_1_1 = 0

_C12: - edge_2 + node_3_tri_right_2_2 = 0

_C13: - edge_0 + node_4_tri_left_2_2 = 0

_C14: - edge_3 + node_5_tri_right_3_3 = 0

_C15: - edge_2 + node_6_tri_left_3_3 = 0

_C16: - edge_1 + node_7_trap_left_1_2 = 0

_C17: - edge_4 + node_8_tri_left_1_2 = 0

_C18: - edge_3 + node_9_trap_right_2_3 = 0

. . .

Page 60: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Benefits

I Can solve with general-purpose LP solver. (Gurobi)

I There are tricks for outside, K-Best, etc.

I Real benefit: Adding additional constraints.

Page 61: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Constrained Paths

problem: constrain maximization to valid paths

I A ∈|b|×|E|; a matrix of linear constraints

I b ∈|b|; a constraint vector

Constrained paths

X ′ = {y ∈ X : Ay = b}

Constrained best pathmaxy∈X ′

θ>y

Page 62: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Individual Edge (e)

42

(diese kritik, this criticism) criticism must

θ(e) = score(diese kritik, this criticism) +

score(must, this) + score(this, criticism)

Page 63: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Constraints

4ROOT

6

1

0

3

2

5

take

criticism

must

seriously

</s>

must

seriously

criticism

take

seriously

take <s>

take

criticism

must

seriously take

criticism

seriously

must

seriously

must

take

criticism

Page 64: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Constraints

seriously

take

seriously

must

take

criticism

must

seriously

criticism

take take

criticism

seriously

must

<s>

take

criticism

must

seriously take

criticism

must

seriously

</s>

Page 65: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement
Page 66: Dynamic Programming: Theory and Practicepeople.csail.mit.edu/srush/slides.pdf · But I don’t have a good answer for dynamic programming. Question We no longer manually implement

Pitch: PyDecode

A pragmatic library for building dynamic programming.

I Easy interface in PythonI Hypergraph code in C++

I Inside/Outside ViterbiI Pruning with max-marginalsI Constraint specificationI Dual Decomposition/Lagrangian RelaxationI Export to LP

I Visualization tools supporting IPython Notebook