Chapter 16
Dynamic Programming
What is Dynamic Programming?
Solves problems by combining solutions to subproblems
Similar to divide-and-conquer
Applicable when subproblems are not independent
Subproblems share sub-subproblems
Common sub-subproblems are computed once, then answers are reused as needed
Typically used to solve optimization problems
What Are Optimization Problems?
Problems for which there may be many solutions
Each solution has a value
We want the solution that has an “optimal” value
Often the minimum or maximum value
There may be more than one “optimal” solution
Development of DP Algorithms
Four steps:
Characterize the structure of an optimal solution
Recursively define the value of an optimal solution
Compute the value of an optimal solution in a bottom-up fashion
Construct an optimal solution from computed information
Matrix-Chain Multiplication
Given a sequence of matrices A1..An, we wish to compute their product
This can be computed by repeated multiplication of matrix pairs using the standard algorithm (next slide)
Ambiguities must be resolved in advance by parenthesizing the matrices
Matrix multiplication is associative, so all parenthesizations yield the same product
While the order of parenthesization isn’t mathematically important, it can make a dramatic difference to the cost of evaluation
Matrix Multiplication Algorithm
Matrix MatrixMultiply(const Matrix &A, const Matrix &B)
{
    if ( A.columns() != B.rows() )
        throw XIncompatibleDimensions();
    Matrix C(A.rows(), B.columns());
    for ( int i = 0 ; i < A.rows() ; ++i )
    {
        for ( int j = 0 ; j < B.columns() ; ++j )
        {
            C(i,j) = 0;
            for ( int k = 0 ; k < A.columns() ; ++k )
                C(i,j) += A(i,k)*B(k,j);
        }
    }
    return C;
}
Matrix Multiplication Algorithm
To multiply two matrices A (p x q) and B (q x r), the number of columns in A must equal the number of rows in B
The resulting matrix is p x r
Running time of the algorithm is O(pqr)
Why Parenthesization Order Matters
Consider three matrices: 10x100, 100x5, and 5x50
If we multiply (A1A2)A3, we perform 10x100x5 + 10x5x50 = 7,500 multiplications
If we multiply A1(A2A3), we perform 100x5x50 + 10x100x50 = 75,000 multiplications
The Matrix-Chain Multiplication Problem
Given a chain <A1,A2,…,An> of n matrices, where for i = 1, 2, …, n, the matrix Ai has dimension p(i-1) x p(i), fully parenthesize the product A1A2…An in a way that minimizes the number of scalar multiplications
Structure of an Optimal Parenthesization
To compute the product of all matrices (denoted as A1..n), we must first:
Compute the products A1..k and Ak+1..n
Multiply them together
The cost of computing A1..n is then just the cost of computing A1..k, plus the cost of computing Ak+1..n, plus the cost of multiplying them together
Structure of an Optimal Parenthesization
The parenthesizations of the “prefix” subchain A1,…,Ak and the “suffix” subchain Ak+1,…,An within the optimal parenthesization of A1,…,An must themselves be optimal
If they weren’t, then there would be some other parenthesization order with lower cost, contradicting the optimality of the parenthesization of A1,…,An
The optimal solution to the whole problem therefore contains optimal subproblem solutions
This is a hallmark of dynamic programming
A Recursive Solution
Our subproblem is the problem of determining the minimum cost of a parenthesization of Ai,…,Aj, where 1 <= i <= j <= n
Let m[i,j] be the minimum number of multiplications needed to compute Ai..j
Lowest cost to compute A1..n is m[1,n]
A Recursive Solution
How do we recursively define m[i,j]?
If i = j, no multiplications are necessary, since the chain consists of a single matrix
Thus, m[i,i] = 0 for all i
If i < j, we compute as follows:
Assume that some optimal parenthesization splits the product between Ak and Ak+1
Then m[i,j] = m[i,k] + m[k+1,j] + p(i-1)p(k)p(j), per our previous discussion
To do this, we must find k
There are j-i possibilities, which can be checked directly
A Recursive Solution
Our recursive solution is now:
m[i,j] gives the costs of optimal solutions to subproblems
Define a second table s[i,j] to help keep track of how to construct an optimal solution
Each entry contains the value k at which to split the product
m[i,j] = 0                                                        if i = j
m[i,j] = min over i <= k < j of { m[i,k] + m[k+1,j] + p(i-1)p(k)p(j) }   if i < j
Computing the Optimal Costs
An algorithm to compute the minimum cost for a matrix-chain multiplication should now be simple
It will take exponential time, which is no better than our brute-force approach
So how can we improve the running time?
Computing the Optimal Costs
There are only O(n^2) total subproblems
One subproblem for each choice of i and j satisfying 1 <= i <= j <= n
A recursive algorithm may encounter each subproblem many times
Recomputation of known values is costly
This is where dynamic programming techniques are superior
Computing the Optimal Costs
Instead of using recursion, we will use the bottom-up dynamic programming approach to compute the cost
The following algorithm assumes that the matrix Ai has dimension p(i-1) x p(i)
The input sequence is an array <p0, p1, …, pn> with length n+1
The output is the matrices m and s
void MatrixChainOrder(const double p[], int size, Matrix &m, Matrix &s)
{
    int n = size-1;
    for ( int i = 1 ; i <= n ; ++i )
        m(i,i) = 0;
    for ( int L = 2 ; L <= n ; ++L ) {      // L = length of the subchain
        for ( int i = 1 ; i <= n-L+1 ; ++i ) {
            int j = i+L-1;                  // j is fixed by i and L
            m(i,j) = DBL_MAX;               // DBL_MAX from <cfloat>
            for ( int k = i ; k <= j-1 ; ++k ) {
                double q = m(i,k)+m(k+1,j)+p[i-1]*p[k]*p[j];
                if ( q < m(i,j) ) {
                    m(i,j) = q;
                    s(i,j) = k;             // record the best split found so far
                }
            } // for ( k )
        } // for ( i )
    } // for ( L )
}
Example: m and s
Input dimensions: 30x35, 35x15, 15x5, 5x10, 10x20, 20x25 (p = <30, 35, 15, 5, 10, 20, 25>)

Matrix m (entry m[i,j], row i, column j):
      j=1  j=2     j=3     j=4     j=5     j=6
i=1   0    15,750  7,875   9,375   11,875  15,125
i=2        0       2,625   4,375   7,125   10,500
i=3                0       750     2,500   5,375
i=4                        0       1,000   3,500
i=5                                0       5,000
i=6                                        0

Matrix s (entry s[i,j] = best split k):
      j=2  j=3  j=4  j=5  j=6
i=1   1    1    3    3    3
i=2        2    3    3    3
i=3             3    3    3
i=4                  4    5
i=5                       5
Computing the Optimal Costs
The matrix m contains the costs of multiplications, and s records which value of k was used to achieve the optimal cost
What is the running time?
Three nested loops = O(n^3)
This is better than the exponential running time the recurrence would give us
Constructing an Optimal Solution
So far, we only know the optimal number of scalar multiplications, not the order in which to multiply the matrices
This information is encoded in the table s
Each entry s[i,j] records the value k such that the optimal parenthesization of Ai…Aj occurs between matrix k and k+1
To compute the product A1..n, we parenthesize at s[1,n]
Previous matrix multiplications can be computed recursively
E.g., s[1,s[1,n]] contains the optimal split for the left half of the multiplication
Constructing an Optimal Solution
Matrix MatrixChainMultiply(const Matrix A[], const Matrix &s, int i, int j)
{
    if ( j > i )
    {
        Matrix X = MatrixChainMultiply(A, s, i, s(i,j));
        Matrix Y = MatrixChainMultiply(A, s, s(i,j)+1, j);
        return MatrixMultiply(X,Y);
    }
    else
        return A[i];
}
Elements of Dynamic Programming
There are two key ingredients that an optimization problem must have for dynamic programming to be applicable:
Optimal substructure
Overlapping subproblems
Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems
E.g., matrix-chain multiplication exhibited this property
An optimal parenthesization of a matrix chain requires that each subchain also be optimally parenthesized
This property is typically shown by assuming that a better solution to a subproblem exists, then showing how this contradicts the optimality of the solution to the original problem
Overlapping Subproblems
The “space” of subproblems must be relatively small
i.e., a recursive algorithm for the solution would end up solving the same subproblems over and over
This is called overlapping subproblems
Dynamic programming algorithms typically compute each overlapping subproblem once and store the solution in a table for later (constant-time) lookup
Overlapping Subproblems
From the matrix-chain algorithm, we see earlier computations being reused to perform later computations:

double q = m(i,k)+m(k+1,j)+p[i-1]*p[k]*p[j];
if ( q < m(i,j) ) {
    m(i,j) = q;
    s(i,j) = k;
}

What if we replaced this with a recursive algorithm?
Figure 16.2 on page 311 shows the added computations
Exercise
A common recursive algorithm for computing the Fibonacci sequence is:
Fib(x) = 1                       if x = 1
Fib(x) = 1                       if x = 2
Fib(x) = Fib(x-1) + Fib(x-2)     if x > 2
Is dynamic programming applicable? Why?
Write a dynamic programming algorithm for solving this problem
Longest Common Subsequence
A subsequence is obtained from a sequence by deleting zero or more elements; the remaining elements keep their relative order, but need not be consecutive
Formally: If X = <x1,…,xm> and Z = <z1,…,zk>, Z is a subsequence of X if there exists a strictly increasing sequence of indices <i1,…,ik> such that x(ij) = zj for j = 1, 2, …, k
Longest Common Subsequence
Given two sequences X and Y, Z is a common subsequence of X and Y if Z is a subsequence of both X and Y
The longest common subsequence problem is thus the problem of finding the longest common subsequence of two given sequences
Brute-Force Approach
Enumerate all subsequences of X to see if they are also subsequences of Y, and keep track of the longest one found
If X is of size m, that’s 2^m possibilities!
Characterizing an LCS
Does the LCS problem exhibit an optimal substructure property?
Yes, corresponding to pairs of “prefixes”
A prefix of a sequence is simply the beginning portion of the sequence, up to some specified length
e.g., if X = <A,B,C,B,D,A,B>, the fourth prefix of X (X4) is <A,B,C,B>
Characterizing an LCS
From Theorem 16.1: Let X = <x1,…,xm>, Y = <y1,…,yn>, and Z = <z1,…,zk> be any LCS of X and Y
If xm = yn, then zk = xm = yn and Z(k-1) is an LCS of X(m-1) and Y(n-1)
If xm != yn, then zk != xm implies that Z is an LCS of X(m-1) and Y
If xm != yn, then zk != yn implies that Z is an LCS of X and Y(n-1)
Characterizing an LCS
What does this mean?
An LCS of two sequences contains within it an LCS of prefixes of the two sequences
This is just the optimal substructure property
A Recursive Solution To Subproblems
Theorem 16.1 gives us two cases to check:
If xm = yn, we need to find an LCS of X(m-1) & Y(n-1), to which we append xm = yn
If xm != yn, then we must solve two subproblems: finding an LCS of X & Y(n-1), and an LCS of X(m-1) & Y
Whichever is longer is the LCS of X & Y
Overlapping subproblems are evident
To find an LCS of X & Y, we must find an LCS of X(m-1) and/or Y(n-1) prefixes, which have still smaller overlapping subproblems
A Recursive Solution To Subproblems
What is the cost of an optimal solution?
Let c[i,j] be the length of an LCS of Xi & Yj
If i or j = 0, then the LCS for that pair of prefixes has length 0
Otherwise, the cost follows directly from Theorem 16.1:
c[i,j] = 0                            if i = 0 or j = 0
c[i,j] = c[i-1,j-1] + 1               if i, j > 0 and xi = yj
c[i,j] = max(c[i,j-1], c[i-1,j])      if i, j > 0 and xi != yj
Computing the Length of an LCS
A recursive algorithm for computing the length of an LCS of two sequences can be written directly from the recurrence formula for the cost of an optimal solution
This recursive algorithm will lead to an exponential-time solution
Dynamic programming techniques can be used to compute the solution bottom-up and reduce the running time
Computing the Length of an LCS
The following algorithm fills in the cost table c based on the input sequences X and Y
It also maintains a table b that helps simplify constructing an optimal solution
Entry b[i,j] points to the table entry corresponding to the optimal subproblem solution chosen when computing c[i,j]
Computing the Length of an LCS
void LCSLength(const sequence &X, const sequence &Y, matrix &b, matrix &c)
{
    int m = X.length, n = Y.length;
    // Initialize tables: row 0 and column 0 are all zero
    for ( int i = 0 ; i <= m ; ++i )
        c(i,0) = 0;
    for ( int j = 0 ; j <= n ; ++j )
        c(0,j) = 0;
Computing the Length of an LCS
    // Fill in tables
    for ( int i = 1 ; i <= m ; ++i )
        for ( int j = 1 ; j <= n ; ++j ) {
            if ( X[i] == Y[j] ) {
                c(i,j) = c(i-1,j-1)+1;
                b(i,j) = 1; // Subproblem type 1 “↖”
            }
Computing the Length of an LCS
            else if ( c(i-1,j) >= c(i,j-1) ) {
                c(i,j) = c(i-1,j);
                b(i,j) = 2; // Subproblem type 2 “↑”
            }
            else {
                c(i,j) = c(i,j-1);
                b(i,j) = 3; // Subproblem type 3 “←”
            }
        }
}
Constructing an LCS
Table b can now be used to construct the LCS of two sequences
Begin in the bottom-right corner of b, and follow the “arrows”
This will build the LCS in reverse order
Constructing an LCS
void LCSPrint(const matrix &b, const sequence &X, int i, int j)
{
    if ( i == 0 || j == 0 )
        return;
    switch ( b(i,j) ) {
    case 1: LCSPrint(b, X, i-1, j-1); cout << X[i]; break; // print xi on a diagonal step
    case 2: LCSPrint(b, X, i-1, j); break;
    case 3: LCSPrint(b, X, i, j-1); break;
    }
}
What is the Running Time to Find an LCS?
Total running time is the cost to build the tables plus the cost of printing the LCS
Table building = O(mn)
Printing = O(m+n)
So, total cost is O(mn)