o'connor -- matrix chain multiplication
DESCRIPTION
This is a Matlab exercise to show that the time required to multiply a chain of matrices A1*A2*...*Ak depends on the order in which the pairwise multiplications are performed.
TRANSCRIPT
University College, Dublin
Mathematical Physics Department
Master of Computational Science Degree 2004-2005
NUMERICAL ALGORITHMS
Dr Derek O’Connor
Laboratory Exercise No. 6: Complexity of Matrix Operations.
The standard mathematical definition of matrix multiplication, C = AB, where A is m × n, B is n × p, and C is m × p, is as follows:
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p. \tag{1}$$
Do the following:
1. Using this definition, derive an expression, Nops(m, n, p), for the number of floating-point additions
and multiplications needed to form the matrix product C = AB.
2. Derive a similar expression Nops(m, n, p, q) for D = ABC, where A is m × n, B is n × p, C is
p × q, and D is m × q.
Although matrix multiplication is associative, i.e., A(BC) = (AB)C, show that
Nops(A(BC)) ≠ Nops((AB)C), in general. (2)
That is, Nops(m, n, p, q) depends on the order in which the product ABC is formed or parenthesized.
3. Write a Matlab function C = MatMult(A,B) that implements the definition above.
Test and compare this function with Matlab’s C = A*B for random square matrices of size
n = 250, 500, 1000.
4. Find 3 sets of values m, n, p, q that demonstrate clearly that the inequality (2) above is true
in general. Use both your function MatMult and Matlab’s C = A*B in this demonstration and
compare the results.
5. Calculate the Mflops rate (millions of floating-point operations per second) for each of the tests above.
Remember to give the machine parameters with these rates.
Note: When timing the operations above, use Matlab's cputime. Here is the help for this function:
CPUTIME returns the CPU time in seconds that has been used
by the MATLAB process since MATLAB started.
For example:
t=cputime; your_operation; cputime-t
returns the cpu time used to run your_operation.
The return value may overflow the internal representation
and wrap around.
See also ETIME, TIC, TOC, CLOCK
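As an illustration of the timing pattern in the help text, here is a minimal sketch that times one square product and converts the time to an Mflops rate. The size n = 200 and the variable names are assumptions chosen for a quick run, and the operation count 2n^3 − n^2 is the count for the standard definition with m = n = p.

```matlab
% Sketch: time one n-by-n product with cputime and convert to Mflops.
n = 200;                      % assumed test size (the exercise uses 250, 500, 1000)
A = rand(n); B = rand(n);
nops = 2*n^3 - n^2;           % n mults and n-1 adds for each of the n^2 entries
t0 = cputime;
C = A*B;
t = cputime - t0;             % may be 0 if the product is faster than the timer resolution
if t > 0
    fprintf('n = %d: %.3f s, %.1f Mflops\n', n, t, nops/t/1e6);
else
    fprintf('n = %d: too fast to time with cputime; increase n or repeat the product\n', n);
end
```

For small n the measured time can be zero, which is why the sketch guards the division; repeating the product several times inside the timed region is one way around this.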
derek o’connor – October 10, 2008.
Lab. Exer. 6. Complexity of Matrix Operations 6.2
Solution Notes for Lab. Exercise No. 6.
Analysis
Standard matrix multiplication, C = AB, where A is m × n, B is n × p, and C is m × p, is as follows:
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.$$
There are m × p elements $c_{ij}$, and each requires the summation $a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$, which requires n mults and n − 1 adds. Hence we get a total of mp(2n − 1) = 2mnp − mp = O(mnp) operations.
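The count can be checked by instrumenting the triple loop. The sketch below uses small, arbitrarily chosen sizes (an assumption) and starts each sum from its first product, so that exactly n − 1 additions are counted per element.

```matlab
% Sketch: count the mults and adds performed by the standard definition.
m = 3; n = 4; p = 5;          % small illustrative sizes (assumed)
A = rand(m,n); B = rand(n,p); C = zeros(m,p);
nmults = 0; nadds = 0;
for i = 1:m
    for j = 1:p
        s = A(i,1)*B(1,j);    % first term: one mult, no add
        nmults = nmults + 1;
        for k = 2:n
            s = s + A(i,k)*B(k,j);
            nmults = nmults + 1;
            nadds  = nadds + 1;
        end
        C(i,j) = s;
    end
end
fprintf('mults = %d, adds = %d, total = %d, formula 2mnp - mp = %d\n', ...
        nmults, nadds, nmults + nadds, 2*m*n*p - m*p);
```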
The matrix triple multiplication operation D = ABC, where A is m × n, B is n × p, C is p × q, and D is m × q, is defined in terms of the matrix pair multiplication above. This gives two possible orders of multiplication:
D1 = (AB)C or D2 = A(BC).
Mathematically, D1 and D2 are identical, but computationally they are not. Using O(mnp) for matrix pair multiplication we have
Nops(A(BC)) = Nops(R = BC) + Nops(D = AR) = O(npq) + O(mnq)
Nops((AB)C) = Nops(R = AB) + Nops(D = RC) = O(mnp) + O(mpq)
We will drop the O(·) formalism and simply say that
Nops(A(BC)) = N1 = npq + mnq and Nops((AB)C) = N2 = mnp + mpq.
It is not obvious at a glance when these two functions have different or equal values. A crude way of getting some idea is to run this program:
%=============== [kl,ke,kg, Equal] = MNPQ(low, high)===============%
% When do R1 = A*(B*C) and R2 = (A*B)*C have different op counts?
% N1 = npq+mnq N2 = mnp+mpq
% The matrices A,B,C are m*n, n*p, p*q.
% Returns the counts kl (N1 < N2), ke (N1 == N2), kg (N1 > N2)
% and the ke-by-4 list Equal of tuples [m n p q] with N1 == N2.
%===================================================================%
function [kl,ke,kg, Equal] = MNPQ(low, high)
%===================================================================%
ntuples = (high-low+1)^4;
Less = zeros(ntuples,4); Equal = zeros(ntuples,4);
Great = zeros(ntuples,4); kl = 0; ke = 0; kg = 0;
for m = low:high
    for n = low:high
        for p = low:high
            for q = low:high
                N1 = n*p*q + m*n*q;
                N2 = m*n*p + m*p*q;
                if N1 < N2
                    kl = kl + 1;
                    Less(kl,:) = [m n p q];
                elseif N1 == N2
                    ke = ke + 1;
                    Equal(ke,:) = [m n p q];
                else
                    kg = kg + 1;
                    Great(kg,:) = [m n p q];
                end;
            end;
        end;
    end;
end;
Equal = Equal(1:ke,:);   % drop the unused preallocated rows
%-------------------------- End of MNPQ(low, high) ------------------%
Running this program for low = 1 and high = 5 gives a total of 5^4 = 625 4-tuples (m, n, p, q), of which 290 give N1 < N2, 45 give N1 = N2, and 290 give N1 > N2. Here are the 45 for which N1 = N2:
1 1 1 1 | 2 1 1 2 | 3 1 1 3 | 4 1 1 4 | 5 1 1 5
1 1 2 2 | 2 2 1 1 | 3 2 2 3 | 4 2 2 4 | 5 2 2 5
1 1 3 3 | 2 2 2 2 | 3 3 1 1 | 4 3 3 4 | 5 3 3 5
1 1 4 4 | 2 2 3 3 | 3 3 2 2 | 4 4 1 1 | 5 4 4 5
1 1 5 5 | 2 2 4 4 | 3 3 3 3 | 4 4 2 2 | 5 5 1 1
1 2 2 1 | 2 2 5 5 | 3 3 4 4 | 4 4 3 3 | 5 5 2 2
1 3 3 1 | 2 3 3 2 | 3 3 5 5 | 4 4 4 4 | 5 5 3 3
1 4 4 1 | 2 4 4 2 | 3 4 4 3 | 4 4 5 5 | 5 5 4 4
1 5 5 1 | 2 5 5 2 | 3 5 5 3 | 4 5 5 4 | 5 5 5 5
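The pattern in these tuples can be read off from the two counts. The short derivation below is an addition to the notes, not part of the original solution; it assumes m, n, p, q > 0 and divides both counts by mnpq.

```latex
\begin{align*}
\frac{N_1}{mnpq} &= \frac{npq + mnq}{mnpq} = \frac{1}{m} + \frac{1}{p}, \\
\frac{N_2}{mnpq} &= \frac{mnp + mpq}{mnpq} = \frac{1}{q} + \frac{1}{n}, \\
N_1 = N_2 \;&\Longleftrightarrow\; \frac{1}{m} + \frac{1}{p} = \frac{1}{n} + \frac{1}{q}.
\end{align*}
```

For example, (1, 2, 2, 1) gives 1 + 1/2 on both sides. In the range 1 to 5 every tuple listed has the pair {m, p} equal to the pair {n, q}, and there are 25 + 25 − 5 = 45 such tuples, which matches the count above. By the same argument, A(BC) is the cheaper order exactly when 1/m + 1/p < 1/n + 1/q.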
The main point here is that for most 4-tuples (m, n, p, q) we have N1 ≠ N2, which prompts the question:
How do we decide on A(BC) or (AB)C? The obvious answer is to calculate N1 and N2 before we do the computations.
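A minimal sketch of that idea follows. The sizes and the variable name order are assumptions made for illustration; the two counts are the N1 and N2 defined above.

```matlab
% Sketch: pick the cheaper order for D = A*B*C from the op counts.
m = 50; n = 2; p = 100; q = 60;       % illustrative sizes (assumed)
A = rand(m,n); B = rand(n,p); C = rand(p,q);
N1 = n*p*q + m*n*q;                   % ops for A*(B*C)
N2 = m*n*p + m*p*q;                   % ops for (A*B)*C
if N1 <= N2
    D = A*(B*C);  order = 'A*(B*C)';
else
    D = (A*B)*C;  order = '(A*B)*C';
end
fprintf('N1 = %d, N2 = %d, using %s\n', N1, N2, order);
```

Computing N1 and N2 costs a handful of scalar operations, which is negligible next to either matrix product.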
Calculating A^k. The naive calculation R = A × (A × (··· × (A × A)) ···) requires (k − 1)n^3 ops for an n × n matrix A. If k = 2^p there is a better way:
$$A^2 = A \times A, \quad A^4 = A^2 \times A^2, \quad \ldots, \quad A^{2^p} = A^{2^{p-1}} \times A^{2^{p-1}},$$
which requires p multiplications, or $p\,n^3 = n^3 \log_2 k$ ops. The Matlab program is trivial:
R = A; for i = 1 : log2(k), R = R * R; end;
The naive program is
R = A; for i = 2 : k, R = A * R; end;
Exercise 6.0.2: Re-write the first program to handle the general case, i.e., when k is not a power of 2.
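One possible approach to this exercise, offered as a sketch rather than as the intended solution, is to scan the binary digits of k and square as in the power-of-2 case. The test matrix, the value k = 5, and the names R, S and e are assumptions made for illustration.

```matlab
% Sketch: A^k by repeated squaring for any integer k >= 1.
A = [1 1; 0 1];  k = 5;     % illustrative inputs (assumed)
R = eye(size(A,1));         % accumulates the answer, starts as the identity
S = A;                      % S holds A^(2^j) at step j
e = k;                      % remaining exponent
while e > 0
    if mod(e,2) == 1        % this binary digit of k is 1: fold S into R
        R = R*S;
    end
    e = floor(e/2);
    if e > 0                % square only if more digits remain
        S = S*S;
    end
end
disp(R);
```

This uses floor(log2(k)) squarings plus one product per 1-digit in the binary form of k, so roughly 2*log2(k) matrix products at most, compared with k − 1 for the naive loop.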
Matrix-Chain Multiplication.
Calculating A1A2···Ak, where Ai is an $m_{i-1} \times m_i$ matrix, is not an easy extension of the 3-matrix case.
Consider A1A2A3A4. This can be parenthesized in 5 different ways:
((A1A2)(A3A4)), (A1((A2A3)A4)), (A1(A2(A3A4))), (((A1A2)A3)A4), ((A1(A2A3))A4).
Let C(k) be the number of ways to parenthesize the matrix chain A1A2···Ak, with C(1) = 1. Let us put the outermost split between Ai and Ai+1: (A1A2···Ai)(Ai+1···Ak). There are C(i) ways to parenthesize the left part and C(k − i) ways to parenthesize the right part. Any parenthesization of the left part may be combined with any parenthesization of the right part, so there are C(i)C(k − i) ways of doing this. Now i can have any value between 1 and k − 1. Hence we must sum C(i)C(k − i) over all i to get
$$C(k) = \sum_{i=1}^{k-1} C(i)\, C(k-i).$$
The number of parenthesizations of the matrix chain A1A2···Ak is the Catalan number
$$C(k) = \frac{1}{k} \binom{2k-2}{k-1} = \Omega(4^k / k^2).$$
Catalan Numbers
k     1  2  3  4   5   6    7    8     9    10       15
C(k)  1  1  2  5  14  42  132  429  1430  4862  2674440
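The table can be reproduced from the recurrence and cross-checked against the closed form. The sketch below is an illustration with an assumed upper limit K = 15; it uses Matlab's nchoosek for the binomial coefficient.

```matlab
% Sketch: C(k) from the recurrence, compared with (1/k)*nchoosek(2k-2, k-1).
K = 15;                          % largest chain length tabulated (assumed)
C = zeros(1,K);
C(1) = 1;                        % a single matrix has one parenthesization
for k = 2:K
    for i = 1:k-1                % outermost split after A_i
        C(k) = C(k) + C(i)*C(k-i);
    end
end
closed = zeros(1,K);
for k = 1:K
    closed(k) = round(nchoosek(2*k-2, k-1)/k);
end
disp([(1:K)' C' closed']);       % columns: k, recurrence, closed form
```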
Consider the following example, taken from Cormen, Leiserson and Rivest, page 307.
Matrix-Chain Multiplication
A1: 30 × 35   A2: 35 × 15   A3: 15 × 5   A4: 5 × 10   A5: 10 × 20   A6: 20 × 25
There are 42 different ways to parenthesize A1A2···A6. The optimum can be found by dynamic programming in O(k^3) time. The optimum parenthesization is
((A1 * (A2 * A3)) * ((A4 * A5) * A6)), which gives
(30 × 35 × 5 + 35 × 15 × 5) + 30 × 5 × 25 + (5 × 10 × 20 + 5 × 20 × 25) = 15,125 ops.
Most compilers/interpreters do not optimally parenthesize and would calculate
(((((A1 * A2) * A3) * A4) * A5) * A6), which requires 40,500 ops.
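The dynamic program mentioned above can be sketched as follows. This is the standard textbook recurrence written as an illustration, not code from the original notes; the names dims, cost and split are assumptions, and ops are counted as scalar multiplications, as in the two totals above.

```matlab
% Sketch: O(k^3) dynamic program for the optimal matrix-chain order.
dims = [30 35 15 5 10 20 25];    % A_i is dims(i) x dims(i+1)
k = numel(dims) - 1;             % number of matrices in the chain
cost  = zeros(k,k);              % cost(i,j): min ops to form A_i*...*A_j
split = zeros(k,k);              % split(i,j): best outermost split position
for len = 2:k                    % length of the sub-chain
    for i = 1:k-len+1
        j = i + len - 1;
        cost(i,j) = Inf;
        for s = i:j-1            % try (A_i..A_s)(A_s+1..A_j)
            c = cost(i,s) + cost(s+1,j) + dims(i)*dims(s+1)*dims(j+1);
            if c < cost(i,j)
                cost(i,j) = c;
                split(i,j) = s;
            end
        end
    end
end
ltr = 0;                         % cost of plain left-to-right evaluation
for i = 2:k
    ltr = ltr + dims(1)*dims(i)*dims(i+1);
end
fprintf('optimal = %d ops, left-to-right = %d ops, outer split after A%d\n', ...
        cost(1,k), ltr, split(1,k));
```

The split table records where each sub-chain is divided, so the full parenthesization can be recovered by following split(1,k) recursively.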
Tests, Part 3
The test below gave the following results:
Matlab 6.5 vs MatMult (P III Xeon 800MHz), CPU time in seconds
              n = 250   n = 500   n = 1000
Matlab          0.046     0.5       3.812
MatMult         1.843    16.235   130.125
ratio mm/ml    40.0      32.5      34.0
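The Mflops rates asked for in part 5 follow from these times and the operation count 2n^3 − n^2 for a square product. The sketch below simply does that arithmetic on the tabulated times; the variable names are assumptions.

```matlab
% Sketch: convert the measured CPU times in the table to Mflops rates.
n        = [250 500 1000];
tmatlab  = [0.046 0.5 3.812];        % seconds, from the table above
tmatmult = [1.843 16.235 130.125];   % seconds, from the table above
nops     = 2*n.^3 - n.^2;            % adds + mults for an n-by-n product
mflops_matlab  = nops ./ tmatlab  / 1e6;
mflops_matmult = nops ./ tmatmult / 1e6;
for i = 1:numel(n)
    fprintf('n = %4d: Matlab %7.1f Mflops, MatMult %5.1f Mflops\n', ...
            n(i), mflops_matlab(i), mflops_matmult(i));
end
```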
%====================== TestMult(sizes) ===========%
% Tests and compares Matlab's matrix multiplication
% against the loop implementation of the standard
% definition of matrix multiplication.
% sizes is a row vector of matrix orders, e.g. [250 500 1000].
%==================================================%
function [tmatlab, tmatmult] = TestMult(sizes)
%==================================================%
nsizes = numel(sizes);       % avoids shadowing the built-in ndims
tmatlab = zeros(1,nsizes);
tmatmult = zeros(1,nsizes);
for n = 1:nsizes
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = A*B;
    tmatlab(n) = cputime - tstart;
end;
for n = 1:nsizes
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = MatMult(A,B);
    tmatmult(n) = cputime - tstart;
end;
%---------------- End of TestMult ----------------%
Three versions of MatMult were used in these tests to see the effect of vectorization. These are shown in one function below with the variations as comments.
%============= C = MatMult 0,1,2(A,B) ============%
%
% Variations of the original are shown in comments
%
%==================================================%
function C = MatMult(A,B)
%==================================================%
[m,n] = size(A); [p,q] = size(B);
if n ~= p
    error('Matrix sizes incompatible');
end;
p = q; % makes parameters same as notes: A is m x n, B is n x p.
C = zeros(m,p);
%------------------ Version 0 --------------------%
for i = 1:m
    for j = 1:p
        s = 0.0;             % running sum (avoids shadowing the built-in sum)
        for k = 1:n
            s = s + A(i,k)*B(k,j);
        end;
        C(i,j) = s;
    end;
end;
%------------------ Version 1 --------------------%
% for i = 1:m
%     for j = 1:p
%         C(i,j) = C(i,j) + A(i,:)*B(:,j);
%     end;
% end;
%------------------ Version 2 --------------------%
% for i = 1:m
%     C(i,:) = C(i,:) + A(i,:)*B;
% end;
%---------------- End of MatMult -----------------%
The tables below give the results:
Matlab 6.5 vs MatMult.
              n = 250   n = 500   n = 1000
Matlab          0.062     0.500     3.828
MatMult 0       1.875    16.250   130.270
MatMult 1       2.156    11.343    66.157
MatMult 2       0.313     2.375    18.625
ratio m0/ml    30.242    33.574    34.030
ratio m1/ml    45.872    23.388    17.282
ratio m2/ml     5.048     4.907     4.865
Time in secs. on a Pentium III Xeon 800MHz, 640MB RAM, Dell Precision Workstation 620, Windows 2000.
These test results show two things: (i) vectorization is very important in Matlab, as can be seen in the reduction of the MatMult/Matlab ratio from 34 to 4.865; (ii) you cannot beat Matlab when programming in Matlab, so we should not be too surprised by these ratios.
Matlab 6.5 now uses the latest linear algebra kernels (Atlas, I believe), which are highly-tuned assembly language primitives for matrix-vector and matrix-matrix multiplication. Inside Matlab we do not have access to these kernels, except indirectly through vectorization.
Can Matlab be beaten? Yes. O-Matrix at €50 beats Matlab by a good margin (I bet they use the same LA kernels). Any properly-written Fortran program compiled with DVF (CVF, HPVF) and linked with Intel's latest Math Kernel Library should come close to or better than Matlab.
Benchmarks on 1000 × 1000 Matrix A
Operations on A     Matlab   O-Matrix
(1000 × 1000)         6.5       5.5
Multiply A * A        3.9       3.1
Invert A^-1           5.0       3.4
LU Decomp             1.8       1.3
SVD(A)              106.2      70.4
QR(A)                11.9      13.5
Eigen(A)            113.8      60.0
Det(A)                1.7       1.3
Cond(A)              16.3      15.2
Rank(A)              16.3      15.2
Time in secs. Pentium III Xeon 800MHz, 640MB RAM. Dell Precision Workstation 620. Windows 2000.
Tests, Part 4
%================ Test3MatChain(sizes) ============%
% Tests and compares two different orders of
% multiplying 3 matrices.
% R = ((A*B)*C) and R = (A*(B*C))
% It uses Matlab's built-in and MatMult2's matrix mults
% (MatMult2 denotes Version 2 of MatMult).
% sizes = [m n p q]: A is m x n, B is n x p, C is p x q.
%==================================================%
function [tmatlab, tmatmult] = Test3MatChain(sizes)
%==================================================%
tmatlab = zeros(1,3);
tmatmult = zeros(1,3);
A = rand(sizes(1),sizes(2));
B = rand(sizes(2),sizes(3));
C = rand(sizes(3),sizes(4));
R = zeros(sizes(1),sizes(4));
%---------- Do Matlab Test ------------------------%
tstart = cputime;
R = (A*B)*C;
tmatlab(1) = cputime - tstart;
tstart = cputime;
R = A*(B*C);
tmatlab(2) = cputime - tstart;
tmatlab(3) = tmatlab(1)/tmatlab(2);
%---------- Do MatMult2 Test ----------------------%
tstart = cputime;
R = MatMult2(MatMult2(A,B),C);
tmatmult(1) = cputime - tstart;
tstart = cputime;
R = MatMult2(A, MatMult2(B,C));
tmatmult(2) = cputime - tstart;
tmatmult(3) = tmatmult(1)/tmatmult(2);
%---------------- End of Test3MatChain ------------%
Matrix Chain Multiplication
                 (A*B)*C   A*(B*C)    ratio
sizes = [50 250 1000 600]
Matlab             0.187     0.610     0.307
MatMult2           0.781     2.766     0.282
sizes = [50 2 1000 600]
Matlab             0.141     0.031     4.548
MatMult2           0.531     0.032    16.594
sizes = [500 2 1000 600]
Matlab             1.188     0.046    25.826
MatMult2           5.375     0.063    85.317
sizes = [5 2000 1000 60]
Matlab             0.109     0.531     0.205
MatMult2           0.188     2.656     0.071
Time in secs. on a Pentium III Xeon 800MHz, 640MB RAM, Dell Precision Workstation 620, Windows 2000.
The table above demonstrates the importance of order in a chain of matrix multiplications. Incidentally, a test showed that in Matlab, R = A*B*C is evaluated as R = (A*B)*C. This left-to-right evaluation is standard in most languages.
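For comparison, the operation counts N1 = npq + mnq and N2 = mnp + mpq from the analysis predict which order should win in each of the four cases. The sketch below computes the predicted ratio N2/N1, oriented like the table's ratio column, next to the measured ratios; the variable names are assumptions.

```matlab
% Sketch: predicted op-count ratio vs. measured time ratio for the four cases.
sizes = [ 50  250 1000 600;
          50    2 1000 600;
         500    2 1000 600;
           5 2000 1000  60];
m = sizes(:,1); n = sizes(:,2); p = sizes(:,3); q = sizes(:,4);
N1 = n.*p.*q + m.*n.*q;                          % ops for A*(B*C)
N2 = m.*n.*p + m.*p.*q;                          % ops for (A*B)*C
predicted = N2 ./ N1;                            % (A*B)*C relative to A*(B*C)
measured_matlab   = [0.307; 4.548; 25.826; 0.205];   % from the table above
measured_matmult2 = [0.282; 16.594; 85.317; 0.071];  % from the table above
disp([predicted measured_matlab measured_matmult2]);
```

The op counts pick the same winner as the timings in all four cases, although the measured ratios differ in size from the predicted ones.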