Chapter 16
Dynamic Programming
What is Dynamic Programming?
Solves problems by combining solutions to subproblems
Similar to divide-and-conquer
Applicable when subproblems are not independent
Subproblems share sub-subproblems
Common sub-subproblems are computed once, then answers are reused as needed
Typically used to solve optimization problems
What Are Optimization Problems?
Problems for which there may be many solutions
Each solution has a value
We want the solution that has an “optimal” value
Often the minimum or maximum value
There may be more than one “optimal” solution
Development of DP Algorithms
Four steps:
Characterize the structure of an optimal solution
Recursively define the value of an optimal solution
Compute the value of an optimal solution in a bottom-up fashion
Construct an optimal solution from computed information
Matrix-Chain Multiplication
Given a sequence of matrices A1..An, we wish to compute their product
This can be computed by repeated multiplication of matrix pairs using the standard algorithm (next slide)
Ambiguities must be resolved in advance by parenthesizing the matrices
Matrix multiplication is associative, so all parenthesizations yield the same product
While the order of parenthesization isn’t mathematically important, it can make a dramatic difference to the cost of evaluation
Matrix Multiplication Algorithm
Matrix MatrixMultiply(const Matrix &A, const Matrix &B)
{
    if ( A.columns() != B.rows() )
        throw XIncompatibleDimensions();
    Matrix C(A.rows(), B.columns());
    for ( int i = 0 ; i < A.rows() ; ++i )
    {
        for ( int j = 0 ; j < B.columns() ; ++j )
        {
            C(i,j) = 0;
            for ( int k = 0 ; k < A.columns() ; ++k )
                C(i,j) += A(i,k)*B(k,j);
        }
    }
    return C;
}
Matrix Multiplication Algorithm
To multiply two matrices A (p x q) and B (q x r), the number of columns in A must equal the number of rows in B
The resulting matrix is p x r
Running time of the algorithm is O(pqr)
Why Parenthesization Order Matters
Consider three matrices: 10x100, 100x5, and 5x50
If we multiply (A1A2)A3, we perform 10x100x5 + 10x5x50 = 7,500 multiplications
If we multiply A1(A2A3), we perform 100x5x50 + 10x100x50 = 75,000 multiplications
The Matrix-Chain Multiplication Problem
Given a chain <A1,A2,…,An> of n matrices, where for i = 1, 2, …, n, the matrix Ai has dimension p(i-1) x p(i), fully parenthesize the product A1A2…An in a way that minimizes the number of scalar multiplications
Structure of an Optimal Parenthesization
To compute the product of all matrices (denoted as A1..n), we must first:
Compute the products A1..k and Ak+1..n
Multiply them together
The cost of computing A1..n is then just the cost of computing A1..k, plus the cost of computing Ak+1..n, plus the cost of multiplying them together
Structure of an Optimal Parenthesization
The parenthesizations of the “prefix” subchain A1,…,Ak and the “suffix” subchain Ak+1,…,An within the optimal parenthesization of A1,…,An must themselves be optimal
If they weren’t, then there would be some other parenthesization order with lower cost, contradicting the optimality of the parenthesization of A1,…,An
The optimal solution to the whole problem therefore contains optimal subproblem solutions
This is a hallmark of dynamic programming
A Recursive Solution
Our subproblem is the problem of determining the minimum cost of a parenthesization of Ai,…,Aj, where 1 <= i <= j <= n
Let m[i,j] be the minimum number of multiplications needed to compute Ai..j
Lowest cost to compute A1..n is m[1,n]
A Recursive Solution
How do we recursively define m[i,j]?
If i = j, no multiplications are necessary, since the chain consists of a single matrix
Thus, m[i,i] = 0 for all i
If i < j, we compute as follows:
Assume that some optimal parenthesization splits the product between Ak and Ak+1
Then m[i,j] = m[i,k] + m[k+1,j] + p(i-1)p(k)p(j), per our previous discussion
To do this, we must find k
There are j-i possibilities, which can be checked directly
A Recursive Solution
Our recursive solution is now:
m[i,j] gives the costs of optimal solutions to subproblems
Define a second table s[i,j] to help keep track of how to construct an optimal solution
Each entry contains the value k at which to split the product
m[i,j] = 0                                                        if i = j
m[i,j] = min over i <= k < j of { m[i,k] + m[k+1,j] + p(i-1)p(k)p(j) }   if i < j
Computing the Optimal Costs
An algorithm to compute the minimum cost for a matrix-chain multiplication should now be simple
It will take exponential time, which is no better than our brute-force approach
So how can we improve the running time?
Computing the Optimal Costs
There are only O(n^2) total subproblems
One subproblem for each choice of i and j satisfying 1 <= i <= j <= n
A recursive algorithm may encounter each subproblem many times
Recomputation of known values is costly
This is where dynamic programming techniques are superior
Computing the Optimal Costs
Instead of using recursion, we will use the bottom-up dynamic programming approach to compute the cost
The following algorithm assumes that the matrix Ai has dimension p(i-1) x p(i)
The input sequence is an array <p0, p1, …, pn> with length n+1
The output is the matrices m and s
void MatrixChainOrder(const double p[], int size, Matrix &m, Matrix &s)
{
    int n = size-1;
    for ( int i = 1 ; i <= n ; ++i )
        m(i,i) = 0;
    for ( int L = 2 ; L <= n ; ++L ) {      // L = length of the subchain
        for ( int i = 1 ; i <= n-L+1 ; ++i ) {
            int j = i+L-1;                  // j is fixed by i and L
            m(i,j) = DBL_MAX;               // DBL_MAX from <cfloat>
            for ( int k = i ; k <= j-1 ; ++k ) {
                double q = m(i,k)+m(k+1,j)+p[i-1]*p[k]*p[j];
                if ( q < m(i,j) ) {
                    m(i,j) = q;
                    s(i,j) = k;             // record the best split found so far
                }
            } // for ( k )
        } // for ( i )
    } // for ( L )
}
Example: m and s
Input dimensions: 30x35, 35x15, 15x5, 5x10, 10x20, 20x25 (p = <30, 35, 15, 5, 10, 20, 25>)

Matrix m (entry m[i,j], row i, column j):
      j=1  j=2     j=3     j=4     j=5     j=6
i=1   0    15,750  7,875   9,375   11,875  15,125
i=2        0       2,625   4,375   7,125   10,500
i=3                0       750     2,500   5,375
i=4                        0       1,000   3,500
i=5                                0       5,000
i=6                                        0

Matrix s (entry s[i,j] = best split k):
      j=2  j=3  j=4  j=5  j=6
i=1   1    1    3    3    3
i=2        2    3    3    3
i=3             3    3    3
i=4                  4    5
i=5                       5
Computing the Optimal Costs
The matrix m contains the costs of multiplications, and s records which value of k was used to achieve the optimal cost
What is the running time?
Three nested loops = O(n^3)
This is better than the exponential running time the recurrence would give us
Constructing an Optimal Solution
So far, we only know the optimal number of scalar multiplications, not the order in which to multiply the matrices
This information is encoded in the table s
Each entry s[i,j] records the value k such that the optimal parenthesization of Ai…Aj occurs between matrix k and k+1
To compute the product A1..n, we parenthesize at s[1,n]
Previous matrix multiplications can be computed recursively
E.g., s[1,s[1,n]] contains the optimal split for the left half of the multiplication
Constructing an Optimal Solution
Matrix MatrixChainMultiply(const Matrix A[], const Matrix &s, int i, int j)
{
    if ( j > i )
    {
        Matrix X = MatrixChainMultiply(A, s, i, s(i,j));
        Matrix Y = MatrixChainMultiply(A, s, s(i,j)+1, j);
        return MatrixMultiply(X,Y);
    }
    else
        return A[i];
}
Elements of Dynamic Programming
There are two key ingredients that an optimization problem must have for dynamic programming to be applicable:
Optimal substructure
Overlapping subproblems
Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems
E.g., matrix-chain multiplication exhibited this property
An optimal parenthesization of a matrix chain requires that each subchain also be optimally parenthesized
This property is typically shown by assuming that a better solution to a subproblem exists, then showing how this contradicts the optimality of the solution to the original problem
Overlapping Subproblems
The “space” of subproblems must be relatively small
i.e., a recursive algorithm for the solution would end up solving the same subproblems over and over
This is called overlapping subproblems
Dynamic programming algorithms typically compute each overlapping subproblem once and store the solution in a table for later (constant-time) lookup
Overlapping Subproblems
From the matrix-chain algorithm, we see earlier computations being reused to perform later computations:

double q = m(i,k)+m(k+1,j)+p[i-1]*p[k]*p[j];
if ( q < m(i,j) ) {
    m(i,j) = q;
    s(i,j) = k;
}

What if we replaced this with a recursive algorithm?
Figure 16.2 on page 311 shows the added computations
Exercise
A common recursive algorithm for computing the Fibonacci sequence is:
Fib(x) = 1                       if x = 1
Fib(x) = 1                       if x = 2
Fib(x) = Fib(x-1) + Fib(x-2)     if x > 2
Is dynamic programming applicable? Why?
Write a dynamic programming algorithm for solving this problem
Longest Common Subsequence
A subsequence is obtained from a sequence by deleting zero or more elements; the remaining elements keep their relative order, but need not be consecutive
Formally: If X = <x1,…,xm> and Z = <z1,…,zk>, Z is a subsequence of X if there exists a strictly increasing sequence of indices <i1,…,ik> such that x(ij) = zj for j = 1, 2, …, k
Longest Common Subsequence
Given two sequences X and Y, Z is a common subsequence of X and Y if Z is a subsequence of both X and Y
The longest common subsequence problem is thus the problem of finding the longest common subsequence of two given sequences
Brute-Force Approach
Enumerate all subsequences of X to see if they are also subsequences of Y, and keep track of the longest one found
If X is of size m, that’s 2^m possibilities!
Characterizing an LCS
Does the LCS problem exhibit an optimal substructure property?
Yes, corresponding to pairs of “prefixes”
A prefix of a sequence is simply the beginning portion of the sequence, up to some specified length
e.g., if X = <A,B,C,B,D,A,B>, the fourth prefix of X (X4) is <A,B,C,B>
Characterizing an LCS
From Theorem 16.1: Let X = <x1,…,xm>, Y = <y1,…,yn>, and Z = <z1,…,zk> be any LCS of X and Y
If xm = yn, then zk = xm = yn and Z(k-1) is an LCS of X(m-1) and Y(n-1)
If xm != yn, then zk != xm implies that Z is an LCS of X(m-1) and Y
If xm != yn, then zk != yn implies that Z is an LCS of X and Y(n-1)
Characterizing an LCS
What does this mean?
An LCS of two sequences contains within it an LCS of prefixes of the two sequences
This is just the optimal substructure property
A Recursive Solution To Subproblems
Theorem 16.1 gives us two cases to check:
If xm = yn, we need to find an LCS of X(m-1) & Y(n-1), to which we append xm = yn
If xm != yn, then we must solve two subproblems: finding an LCS of X & Y(n-1), and an LCS of X(m-1) & Y
Whichever is longer is the LCS of X & Y
Overlapping subproblems are evident
To find an LCS of X & Y, we must find an LCS of X(m-1) and/or Y(n-1) prefixes, which have still smaller overlapping subproblems
A Recursive Solution To Subproblems
What is the cost of an optimal solution?
Let c[i,j] be the length of an LCS of Xi & Yj
If i or j = 0, then the LCS for that pair of prefixes has length 0
Otherwise, the cost follows directly from Theorem 16.1:
c[i,j] = 0                            if i = 0 or j = 0
c[i,j] = c[i-1,j-1] + 1               if i, j > 0 and xi = yj
c[i,j] = max(c[i,j-1], c[i-1,j])      if i, j > 0 and xi != yj
Computing the Length of an LCS
A recursive algorithm for computing the length of an LCS of two sequences can be written directly from the recurrence formula for the cost of an optimal solution
This recursive algorithm will lead to an exponential-time solution
Dynamic programming techniques can be used to compute the solution bottom-up and reduce the running time
Computing the Length of an LCS
The following algorithm fills in the cost table c based on the input sequences X and Y
It also maintains a table b that helps simplify constructing an optimal solution
Entry b[i,j] points to the table entry corresponding to the optimal subproblem solution chosen when computing c[i,j]
Computing the Length of an LCS
void LCSLength(const sequence &X, const sequence &Y, matrix &b, matrix &c)
{
    int m = X.length, n = Y.length;
    // Initialize tables: row 0 and column 0 are all zero
    for ( int i = 0 ; i <= m ; ++i )
        c(i,0) = 0;
    for ( int j = 0 ; j <= n ; ++j )
        c(0,j) = 0;
Computing the Length of an LCS
    // Fill in tables
    for ( int i = 1 ; i <= m ; ++i )
        for ( int j = 1 ; j <= n ; ++j ) {
            if ( X[i] == Y[j] ) {
                c(i,j) = c(i-1,j-1)+1;
                b(i,j) = 1; // Subproblem type 1 “↖”
            }
Computing the Length of an LCS
            else if ( c(i-1,j) >= c(i,j-1) ) {
                c(i,j) = c(i-1,j);
                b(i,j) = 2; // Subproblem type 2 “↑”
            }
            else {
                c(i,j) = c(i,j-1);
                b(i,j) = 3; // Subproblem type 3 “←”
            }
        }
}
Constructing an LCS
Table b can now be used to construct the LCS of two sequences
Begin in the bottom-right corner of b, and follow the “arrows”
This will build the LCS in reverse order
Constructing an LCS
void LCSPrint(const matrix &b, const sequence &X, int i, int j)
{
    if ( i == 0 || j == 0 )
        return;
    switch ( b(i,j) ) {
    case 1: LCSPrint(b, X, i-1, j-1); cout << X[i]; break; // print xi on a diagonal step
    case 2: LCSPrint(b, X, i-1, j); break;
    case 3: LCSPrint(b, X, i, j-1); break;
    }
}
What is the Running Time to Find an LCS?
Total running time is the cost to build the tables plus the cost of printing the LCS
Table building = O(mn)
Printing = O(m+n)
So, total cost is O(mn)