

University College, Dublin

Mathematical Physics Department

Master of Computational Science Degree 2004-2005

NUMERICAL ALGORITHMS

Dr Derek O’Connor

Laboratory Exercise No. 6: Complexity of Matrix Operations.

The standard mathematical definition of matrix multiplication, C = AB, where A is m × n, B is n × p, and C is m × p, is as follows:

c_{ij} = \sum_{k=1}^{n} a_{ik} \times b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.    (1)

Do the following:

1. Using this definition, derive an expression, Nops(m, n, p), for the number of floating-point additions

and multiplications needed to form the matrix product C = AB,

2. Derive a similar expression Nops(m, n, p, q) for D = ABC, where A is m × n, B is n × p, C is p × q, and D is m × q.

Although matrix multiplication is associative, i.e., A(BC) = (AB)C, show that

Nops(A(BC)) ≠ Nops((AB)C), in general.    (2)

That is, Nops(m, n, p, q) depends on the order in which the product ABC is formed or parenthesized.

3. Write a Matlab function C = MatMult(A,B) that implements the definition above. Test and compare this function with Matlab's C = A*B for random square matrices of size n = 250, 500, 1000.

4. Find 3 sets of values {m, n, p, q} that demonstrate clearly that the inequality (2) above is true in general. Use both your function MatMult and Matlab's C = A*B in this demonstration and compare the results.

5. Calculate the mflops/sec for each of the tests above. Remember to give the machine parameters

with these rates.

Note: when timing the operations above, use Matlab's cputime. Here is the help for this function:

CPUTIME returns the CPU time in seconds that has been used

by the MATLAB process since MATLAB started.

For example:

t=cputime; your_operation; cputime-t

returns the cpu time used to run your_operation.

The return value may overflow the internal representation

and wrap around.

See also ETIME, TIC, TOC, CLOCK


Solution Notes for Lab. Exercise No. 6.

Analysis

Standard matrix multiplication, C = AB, where A is m × n, B is n × p, and C is m × p, is as follows:

c_{ij} = \sum_{k=1}^{n} a_{ik} \times b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.

There are m × p elements c_{ij}, and each requires the summation a_{i1} × b_{1j} + a_{i2} × b_{2j} + · · · + a_{in} × b_{nj}, which requires n mults and n − 1 adds. Hence we get a total of mp(2n − 1) = 2mnp − mp = O(mnp) operations.
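For example (a check added here, not in the original notes), square matrices with m = n = p = 1000 require

2 × 1000^3 − 1000^2 = 2,000,000,000 − 1,000,000 ≈ 2 × 10^9 ops,

which is the operation count needed later when converting the n = 1000 timings into mflops/sec rates.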

The matrix triple product D = ABC, where A is m × n, B is n × p, C is p × q, and D is m × q, is defined in terms of the matrix pair multiplication above. This gives two possible orders of multiplication:

D1 = (AB)C   or   D2 = A(BC).

Mathematically, D1 and D2 are identical, but computationally they are not. Using O(mnp) for matrix pair multiplication we have

Nops(A(BC)) = Nops(R = BC) + Nops(D = AR) = O(npq) + O(mnq),

Nops((AB)C) = Nops(R = AB) + Nops(D = RC) = O(mnp) + O(mpq).

We will drop the O(·) formalism and simply say that

Nops(A(BC)) = N1 = npq + mnq   and   Nops((AB)C) = N2 = mnp + mpq.
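For instance (an illustration added here, not in the original notes), take m = 10, n = 100, p = 10, q = 100. Then

N1 = 100 × 10 × 100 + 10 × 100 × 100 = 200,000   and   N2 = 10 × 100 × 10 + 10 × 10 × 100 = 20,000,

so for this shape the order (AB)C costs ten times fewer operations than A(BC).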

It is very difficult to say in general when these two functions have different or equal values. A crude way of getting some idea is to run this program:

%=============== [kl,ke,kg, Equal] = MNPQ(low, high) ===============%
% When do R1 = A*(B*C) and R2 = (A*B)*C have different op counts?
%     N1 = npq + mnq,   N2 = mnp + mpq
% The matrices A, B, C are m x n, n x p, p x q.
%===================================================================%
function [kl,ke,kg, Equal] = MNPQ(low, high)
%===================================================================%
Less  = zeros((high-low+1)^4,4); Equal = zeros((high-low+1)^4,4);
Great = zeros((high-low+1)^4,4); kl = 0; ke = 0; kg = 0;

for m = low:high
    for n = low:high
        for p = low:high
            for q = low:high
                N1 = n*p*q + m*n*q;
                N2 = m*n*p + m*p*q;
                if N1 < N2
                    kl = kl + 1;
                    Less(kl,:)  = [m n p q];
                elseif N1 == N2
                    ke = ke + 1;
                    Equal(ke,:) = [m n p q];
                else
                    kg = kg + 1;
                    Great(kg,:) = [m n p q];
                end;
            end;
        end;
    end;
end;
%-------------------------- End of MNPQ(low, high) ------------------%
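(The counts quoted below correspond to a call of the form [kl, ke, kg, Equal] = MNPQ(1, 5); the trailing semicolon suppresses the display of the large Equal array.)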


Running this program for low = 1 and high = 5 gives a total of 5^4 = 625 4-tuples (m, n, p, q), of which 290 give N1 < N2, 45 give N1 = N2, and 290 give N1 > N2. Here are the 45 for which N1 = N2:

(1,1,1,1)  (2,1,1,2)  (3,1,1,3)  (4,1,1,4)  (5,1,1,5)
(1,1,2,2)  (2,2,1,1)  (3,2,2,3)  (4,2,2,4)  (5,2,2,5)
(1,1,3,3)  (2,2,2,2)  (3,3,1,1)  (4,3,3,4)  (5,3,3,5)
(1,1,4,4)  (2,2,3,3)  (3,3,2,2)  (4,4,1,1)  (5,4,4,5)
(1,1,5,5)  (2,2,4,4)  (3,3,3,3)  (4,4,2,2)  (5,5,1,1)
(1,2,2,1)  (2,2,5,5)  (3,3,4,4)  (4,4,3,3)  (5,5,2,2)
(1,3,3,1)  (2,3,3,2)  (3,3,5,5)  (4,4,4,4)  (5,5,3,3)
(1,4,4,1)  (2,4,4,2)  (3,4,4,3)  (4,4,5,5)  (5,5,4,4)
(1,5,5,1)  (2,5,5,2)  (3,5,5,3)  (4,5,5,4)  (5,5,5,5)

The main point here is that for most 4-tuples (m, n, p, q) we have N1 ≠ N2, which prompts the question: how do we decide on A(BC) or (AB)C? The obvious answer is to calculate N1 and N2 before we do the computations.

Calculating A^k. The naive calculation R = A × (A × (· · · × A × (A × A)) · · ·) requires (k − 1)n^3 ops. If k = 2^p there is a better way:

A^2 = A × A,   A^4 = A^2 × A^2,   . . . ,   A^{2^p} = A^{2^{p−1}} × A^{2^{p−1}},

which requires p multiplications, or pn^3 = (log_2 k) n^3 ops. The Matlab program is trivial:

    R = A; for i = 1:log2(k), R = R*R; end;

The naive program is

    R = A; for i = 2:k, R = A*R; end;

Exercise 6.0.2: Re-write the first program to handle the general case, i.e., when k is not a power of 2. □
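One possible approach (a sketch added to these notes, not part of the original; the name MatPow is invented for this illustration) walks through the binary expansion of k, squaring at each step and multiplying the current power into the result whenever the corresponding bit of k is set:

%==================== R = MatPow(A,k) =====================%
% Sketch for Exercise 6.0.2: A^k for any integer k >= 1,
% using the binary expansion of k (repeated squaring).
%==========================================================%
function R = MatPow(A, k)
%==========================================================%
R = eye(size(A,1));        % accumulates the answer
S = A;                     % holds A^(2^i) at step i
while k > 0
    if mod(k,2) == 1       % this bit of k is set, so
        R = R*S;           % fold the current power into R
    end;
    S = S*S;               % square: A^(2^i) -> A^(2^(i+1))
    k = floor(k/2);        % move to the next bit
end;
%-------------------- End of MatPow -----------------------%

This uses at most about 2 log2(k) matrix multiplications, i.e. O(n^3 log k) ops, and agrees with the pn^3 estimate, up to a constant, when k = 2^p.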

Matrix-Chain Multiplication.

Calculating A1A2 · · · Ak, where Ai is an m_{i−1} × m_i matrix, is not an easy extension of the 3-matrix case.

Consider A1A2A3A4. This can be parenthesized in 5 different ways:

((A1A2)(A3A4)),   (A1((A2A3)A4)),   (A1(A2(A3A4))),   (((A1A2)A3)A4),   ((A1(A2A3))A4).

Let C(k) be the number of ways to parenthesize the matrix chain A1A2 · · · Ak, and suppose the outermost split is placed between Ai and Ai+1: (A1A2 . . . Ai)(Ai+1 . . . Ak). There are C(i) ways to parenthesize the left part and C(k − i) ways to parenthesize the right part. Any parenthesization of the left part may be combined with any parenthesization of the right part, so there are C(i)C(k − i) ways of doing this. Since i can have any value between 1 and k − 1, we must sum C(i)C(k − i) over all i to get

C(k) = \sum_{i=1}^{k-1} C(i)\, C(k - i).

The number of parenthesizations of the matrix chain A1A2 · · · Ak is the Catalan number

C(k) = \frac{1}{k} \binom{2k-2}{k-1} = \Omega(4^k / k^2).

Catalan Numbers

  k      1   2   3   4   5    6    7     8     9      10     15
  C(k)   1   1   2   5   14   42   132   429   1430   4862   2674440
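A quick check of this table (added here, not in the original notes), computing C(k) both from the recurrence and from the closed form:

    C = zeros(1,15); C(1) = 1;
    for k = 2:15
        C(k) = sum(C(1:k-1) .* C(k-1:-1:1));    % C(k) = sum_i C(i)*C(k-i)
    end;
    closed = @(k) nchoosek(2*k-2, k-1)/k;       % C(k) = (1/k)*binom(2k-2, k-1)
    [C(4) C(6) C(15); closed(4) closed(6) closed(15)]   % both rows give 5, 42, 2674440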


Consider the following example, taken from Cormen, Leiserson and Rivest, page 307:

Matrix-Chain Multiplication

  A1: 30 × 35   A2: 35 × 15   A3: 15 × 5   A4: 5 × 10   A5: 10 × 20   A6: 20 × 25

There are 42 different ways to parenthesize A1A2 · · · A6. The optimum can be found by dynamic programming in O(k^3) time. The optimum parenthesization is

((A1 ∗ (A2 ∗ A3)) ∗ ((A4 ∗ A5) ∗ A6)),

which gives

(35 × 15 × 5) + (30 × 35 × 5) + (5 × 10 × 20) + (5 × 20 × 25) + (30 × 5 × 25) = 2,625 + 5,250 + 1,000 + 2,500 + 3,750 = 15,125 ops.

Most compilers/interpreters do not optimally parenthesize and would calculate

(((((A1 ∗ A2) ∗ A3) ∗ A4) ∗ A5) ∗ A6),

which requires 40,500 ops.
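As a concrete illustration of the O(k^3) dynamic programming method mentioned above, here is a minimal sketch (added to these notes; the name MatChainOrder and its argument dims are invented for this illustration). The vector dims = [m0 m1 . . . mk] holds the chain dimensions, so Ai is dims(i) × dims(i+1):

%============ [mincost, s] = MatChainOrder(dims) ==========%
% Sketch of the standard O(k^3) dynamic program for the
% matrix-chain ordering problem. mincost is the minimum
% number of scalar multiplications; s(i,j) records the best
% split point for the sub-chain Ai...Aj.
%==========================================================%
function [mincost, s] = MatChainOrder(dims)
%==========================================================%
k    = length(dims) - 1;          % number of matrices in the chain
cost = zeros(k,k);                % cost(i,j) = min ops for Ai...Aj
s    = zeros(k,k);
for len = 2:k                     % length of the sub-chain
    for i = 1:k-len+1
        j = i + len - 1;
        cost(i,j) = Inf;
        for m = i:j-1             % split as (Ai...Am)(Am+1...Aj)
            c = cost(i,m) + cost(m+1,j) + dims(i)*dims(m+1)*dims(j+1);
            if c < cost(i,j)
                cost(i,j) = c;
                s(i,j) = m;
            end;
        end;
    end;
end;
mincost = cost(1,k);
%--------------- End of MatChainOrder ---------------------%

For the example above, [mincost, s] = MatChainOrder([30 35 15 5 10 20 25]) returns mincost = 15125, agreeing with the optimal parenthesization given by Cormen, Leiserson and Rivest.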

Tests, Part 3

The test below gave the following results:

Matlab 6.5 vs MatMult (P III Xeon 800 MHz), time in seconds

                 n = 250   n = 500   n = 1000
  Matlab           0.046     0.500      3.812
  MatMult          1.843    16.235    130.125
  ratio mm/ml     40.0      32.5       34.0

%================= TestMult(sizes) ================%
% Tests and compares Matlab's matrix multiplication
% against the loop implementation of the standard
% definition of matrix multiplication.
%==================================================%
function [tmatlab, tmatmult] = TestMult(sizes)
%==================================================%
[m, ndims] = size(sizes);        % sizes is a row vector of matrix orders
tmatlab  = zeros(1,ndims);
tmatmult = zeros(1,ndims);

for n = 1:ndims                  % time Matlab's built-in A*B
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = A*B;
    tmatlab(n) = cputime - tstart;
end;

for n = 1:ndims                  % time the hand-coded MatMult
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = MatMult(A,B);
    tmatmult(n) = cputime - tstart;
end;
%---------------- End of TestMult ----------------%
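Question 5 asks for mflops/sec rates; here is a minimal sketch of how they can be derived from these timings (added here, not in the original notes), using the 2n^3 − n^2 operation count from the analysis:

    sizes = [250 500 1000];
    [tmatlab, tmatmult] = TestMult(sizes);
    Nops = 2*sizes.^3 - sizes.^2;               % ops for C = A*B with m = n = p
    mflops_matlab  = Nops ./ (tmatlab  * 1e6)   % mflops/sec for Matlab's A*B
    mflops_matmult = Nops ./ (tmatmult * 1e6)   % mflops/sec for MatMult
    % Remember to report the machine parameters alongside these rates.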

Three versions of MatMult were used in these tests to see the effect of vectorization. These are shown in one function below, with the variations as comments.

%============ C = MatMult{0,1,2}(A,B) =============%
%
% Variations of the original are shown in comments.
%
%==================================================%
function C = MatMult(A,B)
%==================================================%
[m,n] = size(A); [p,q] = size(B);
if n ~= p
    error('Matrix sizes incompatible');
end;
p = q;                 % makes parameters same as notes.
C = zeros(m,p);
for i = 1:m
    for j = 1:p
        sum = 0.0;     % note: shadows Matlab's built-in sum inside this function
        for k = 1:n
            sum = sum + A(i,k)*B(k,j);
        end;
        C(i,j) = sum;
    end;
end;
%------------------ Version 1 --------------------%
% for i = 1:m
%     for j = 1:p
%         C(i,j) = C(i,j) + A(i,:)*B(:,j);
%     end;
% end;
%------------------ Version 2 --------------------%
% for i = 1:m
%     C(i,:) = C(i,:) + A(i,:)*B;
% end;
%---------------- End of MatMult -----------------%


The tables below give the results:

Matlab 6.5 vs MatMult, time in seconds

                 n = 250   n = 500   n = 1000
  Matlab           0.062     0.500      3.828
  MatMult 0        1.875    16.250    130.270
  MatMult 1        2.156    11.343     66.157
  MatMult 2        0.313     2.375     18.625
  ratio m0/ml     30.242    33.574     34.030
  ratio m1/ml     45.872    23.388     17.282
  ratio m2/ml      5.048     4.907      4.865

Time in secs. on a Pentium III Xeon 800 MHz, 640 MB RAM, Dell Precision Workstation 620, Windows 2000.

These test results show two things: (i) vectorization is very important in Matlab, as can be seen in the reduction of the MatMult/Matlab ratio from 34 to 4.865; (ii) you cannot beat Matlab when programming in Matlab, so we should not be too surprised by these ratios.

Matlab 6.5 now uses the latest linear algebra kernels (Atlas, I believe), which are highly-tuned assembly language primitives for matrix-vector and matrix-matrix multiplication. Inside Matlab we do not have access to these kernels, except indirectly through vectorization.

Can Matlab be beaten? Yes. O-Matrix, at €50, beats Matlab by a good margin (I bet they use the same LA kernels). Any properly-written Fortran program compiled with DVF (CVF, H-PVF) and linked with Intel's latest Math Kernel Library should come close to, or do better than, Matlab.

Benchmarks on a 1000 × 1000 Matrix A

  Operation on A    Matlab 6.5   O-Matrix 5.5
  Multiply A ∗ A        3.9          3.1
  Invert A^{−1}         5.0          3.4
  LU Decomp             1.8          1.3
  SVD(A)              106.2         70.4
  QR(A)                11.9         13.5
  Eigen(A)            113.8         60.0
  Det(A)                1.7          1.3
  Cond(A)              16.3         15.2
  Rank(A)              16.3         15.2

Time in secs. Pentium III Xeon 800 MHz, 640 MB RAM, Dell Precision Workstation 620, Windows 2000.

Tests, Part 4

%============== Test3MatChain(sizes) ==============%
% Tests and compares two different orders of
% multiplying 3 matrices:
%     R = ((A*B)*C)  and  R = (A*(B*C)),
% using Matlab's and MatMult2's matrix mults.
%==================================================%
function [tmatlab, tmatmult] = Test3MatChain(sizes)
%==================================================%
tmatlab  = zeros(1,3);
tmatmult = zeros(1,3);
A = rand(sizes(1),sizes(2));
B = rand(sizes(2),sizes(3));
C = rand(sizes(3),sizes(4));
R = zeros(sizes(1),sizes(4));
%---------- Do Matlab Test ------------------------%
tstart = cputime;
R = (A*B)*C;
tmatlab(1) = cputime - tstart;

tstart = cputime;
R = A*(B*C);
tmatlab(2) = cputime - tstart;
tmatlab(3) = tmatlab(1)/tmatlab(2);
%---------- Do MatMult2 Test ----------------------%
tstart = cputime;
R = MatMult2(MatMult2(A,B),C);
tmatmult(1) = cputime - tstart;

tstart = cputime;
R = MatMult2(A, MatMult2(B,C));
tmatmult(2) = cputime - tstart;
tmatmult(3) = tmatmult(1)/tmatmult(2);
%---------------- End of Test3MatChain ------------%
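For reference (a usage note added here, not in the original), the rows of the table below correspond to calls of the form

    [tmatlab, tmatmult] = Test3MatChain([50 250 1000 600]);
    [tmatlab, tmatmult] = Test3MatChain([50 2 1000 600]);

and so on, where the third element of each returned vector is the (A*B)*C to A*(B*C) time ratio.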

Matrix Chain Multiplication, time in seconds

  (m, n, p, q)           (A*B)*C   A*(B*C)   ratio
  50, 250, 1000, 600
      Matlab               0.187     0.610     0.307
      MatMult2             0.781     2.766     0.282
  50, 2, 1000, 600
      Matlab               0.141     0.031     4.548
      MatMult2             0.531     0.032    16.594
  500, 2, 1000, 600
      Matlab               1.188     0.046    25.826
      MatMult2             5.375     0.063    85.317
  5, 2000, 1000, 60
      Matlab               0.109     0.531     0.205
      MatMult2             0.188     2.656     0.071

Time in secs. on a Pentium III Xeon 800 MHz, 640 MB RAM, Dell Precision Workstation 620, Windows 2000.

The table above demonstrates the importance of order in a chain of matrix multiplications. Incidentally, a test showed that in Matlab, R = A*B*C is evaluated as R = (A*B)*C. This left-to-right evaluation is standard in most languages.
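A sketch of such a test (added here; the dimensions come from the third case in the table above, and timings will of course vary by machine):

    A = rand(500,2); B = rand(2,1000); C = rand(1000,600);
    t = cputime; R1 = A*B*C;   t1 = cputime - t;   % Matlab's default evaluation
    t = cputime; R2 = (A*B)*C; t2 = cputime - t;   % explicit left-to-right
    t = cputime; R3 = A*(B*C); t3 = cputime - t;   % explicit right-to-left
    [t1 t2 t3]   % t1 tracking t2 (not t3) indicates left-to-right evaluation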

derek o’connor – October 10, 2008.