
Interfacing MATLAB with a parallel virtual processor for matrix algorithms

Joseph P. Hoffbeck *, Mansoor Sarwar, Eric J. Rix

Department of Electrical Engineering and Computer Science, School of Engineering, University of Portland, 5000 N. Willamette Blvd.,

Portland, OR 97203-5798, USA

Received 21 October 1999; received in revised form 10 January 2000; accepted 28 February 2000

Abstract

This paper describes the results of a project to interface MATLAB with a parallel virtual processor (PVP) that allows execution of matrix operations in MATLAB on a set of computers connected by a network. The software, a connection-oriented BSD socket-based client–server model, automatically partitions a MATLAB problem and delegates work to server processes running on separate remote machines. Experimental data on the matrix multiply operation shows that the speed improvement of the parallel implementation over the single-processor MATLAB algorithm depends on the size of the matrices, the number of processes, the speed of the processors, and the speed of the network connection. In particular, the advantage of doing matrix multiply in parallel increases as the size of the matrices increases. A speedup of 2.95 times was achieved in multiplying 2048 by 2048 square matrices using 15 workstations. The algorithm was also implemented on a network of four PCs, which was up to 2.5 times as fast as four workstations. The study also showed that there is an optimal number of processes for a particular problem, and so using more processes is not always faster. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Parallel algorithms; Matrix algorithms; MATLAB

1. Background

Most companies and universities, and lately many homes, have more than one computer connected by a network. Often these computers are not used at all during off-hours and are under-utilized during working hours. Furthermore, most common tasks, like reading email, surfing the web, or word processing, use only a fraction of the available processing power even while the computer is being used. If this available processing power could be utilized conveniently, it would give an organization a source of processing power that is essentially free. Furthermore, this arrangement uses commodity off-the-shelf computers that are cheap and can be incrementally upgraded, and the hardware and software are available from a variety of vendors.

The problem, of course, is that many tasks do not easily break down into parts that can be done in parallel, and even for problems that can be executed in parallel, it is often necessary for the user to modify his/her code to take advantage of a parallel implementation.

However, the MATLAB programming language, which is used in many universities and companies to perform engineering and scientific calculations, is highly vectorized in that it handles data in vectors or matrices as a whole. To multiply two matrices A and B and put the result in a variable called C, for example, the following statement is used: C = A * B. Programs written in MATLAB usually take advantage of the vectorization to reduce the number of loops, which increases the speed of execution. The vectorized nature of MATLAB lends itself to a parallel implementation very naturally since the fundamental operations are vector and matrix operations. Thus MATLAB, as a popular computer language, can provide a natural and intuitive interface to a distributed processing system where the user does not need to be concerned with the details of the distributed system, and this interface can be used to speed up execution of MATLAB programs in a manner that is transparent to the user.

The goal of this study, then, is to produce a parallel virtual processor (PVP) and an interface between MATLAB and the PVP that allows MATLAB

The Journal of Systems and Software 56 (2001) 77–80, www.elsevier.com/locate/jss

* Corresponding author. Tel.: +1-503-943-7428; fax: +1-503-943-7316.

E-mail address: [email protected] (J.P. Hoffbeck).

0164-1212/01/$ - see front matter © 2001 Elsevier Science Inc. All rights reserved.

PII: S0164-1212(00)00087-X


programs to be executed in parallel on many computers connected to a network in a manner that is transparent to the end user. This interface should allow users to conveniently take advantage of existing computers on a network that are under-utilized to perform calculations and to speed up the execution of their MATLAB programs.

2. Software architecture and interface

The architecture of the software, shown in Fig. 1, is a connection-oriented BSD socket-based client–server model (Comer and Stevens, 1994). It consists of MATLAB; MATLAB's application programming interface (API), which allows MATLAB to interface with programs written in other computer languages (MathWorks Inc., 1998); an interface program written in the C language, which automatically partitions the problem and collects the results; BSD sockets (Stevens, 1999), which facilitate the communication between the interface software (the client) and the servers; and server software that performs the actual computations on a number of separate computers.

While running MATLAB on the client computer, the sequence of events begins when the user (or a program) invokes a MATLAB command that has a parallel implementation, such as matrix multiplication in our case. Instead of executing the built-in MATLAB command, which would execute only on the client computer, MATLAB's application programming interface is employed to call the interface program, which also runs on the client machine. The interface program then probes the network to identify machines that can function as servers and divides the matrix multiplication problem as evenly as possible among the available servers. For example, if there are P servers available to multiply two N by N matrices, then, in our current implementation, each server would be sent N/P rows of the first matrix and the entire second matrix. The data transfer is implemented as a connection-oriented client–server model using the BSD socket interface. After the interface program completes the data transfers, it waits for the results from the servers, beginning with the server that was first to receive data.

Meanwhile, as each server receives its N/P by N matrix and the N by N matrix, it performs a matrix multiply, resulting in an N/P by N matrix. All P servers can perform their matrix multiplications simultaneously and independently. When each server completes its matrix multiplication, it returns the result to the client. The interface program running on the client machine then collects the N/P by N matrix from each server, which represents N/P rows of the final result, assembles them in the proper order in memory, and returns the result to MATLAB.
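The row-wise partition-multiply-reassemble scheme described above can be sketched as follows. This is a simplified sequential model in Python for illustration only; the actual interface program is written in C and ships the data over BSD sockets, and the function names here are ours, not the paper's:

```python
def partition_rows(a, p):
    # Split matrix A (a list of N rows) into p chunks of roughly N/p rows,
    # mirroring how the interface program divides work among p servers.
    n = len(a)
    bounds = [round(i * n / p) for i in range(p + 1)]
    return [a[bounds[i]:bounds[i + 1]] for i in range(p)]

def matmul(a, b):
    # Plain triple-loop multiply standing in for the work each server does.
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def parallel_matmul(a, b, p):
    # Each "server" multiplies its N/p rows of A by all of B; the client
    # then concatenates the partial results in order to rebuild C = A * B.
    partial = [matmul(chunk, b) for chunk in partition_rows(a, p)]
    return [row for block in partial for row in block]
```

Because each chunk of rows contributes an independent band of the product, the concatenation step needs no arithmetic, only ordering, which is why the client can simply collect results as servers finish.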

3. Experimental data and analysis

In order to evaluate the feasibility of this approach to speeding up MATLAB programs, an experimental study was performed to study the effect of the size of the matrices and the number of processes on the computation time for matrix multiplication (Weiss, 1997) compared to the time required for the built-in MATLAB command.

Two configurations were tested, one using Sun workstations and one using PCs. The testing was performed during off-hours, when the load on the network and computers was low and no other user processes were running. For the configuration using Sun workstations (Sparc10 Model 30, 80 MHz, 64 MB RAM with a 10 Mbps Ethernet TCP/IP network), the run-times for

Fig. 1. Architecture of the interface and PVP software.



multiplying two N by N matrices using the built-in MATLAB command and using the PVP with 1–16 processes running on separate Sun workstations were measured for various matrix sizes. The speedup, which is the ratio of the run-time for MATLAB to the run-time for the PVP, is plotted in Fig. 2 for P = 4, 8, 12, and 15 processes. A speedup of one is marked on the graph for convenience and indicates the run-times for MATLAB and the PVP were equal. A speedup of greater than one indicates the PVP was faster, and a speedup of less than one indicates MATLAB was faster.

Fig. 2 shows that the advantage of doing matrix multiplication in parallel increases as the size of the matrix increases. This result is consistent with the fact that the number of operations needed to multiply two N by N matrices increases as N^3, whereas the amount of data that needs to be transmitted only increases as N^2. For small values of N, it is actually faster to do all the calculations on the client computer than to send the data to multiple machines and wait for the results. However, as N increases, the amount of computation required increases much faster than the time required to transmit the data, and eventually the time saved by doing the calculations in parallel more than compensates for the time required to transmit the data to multiple machines. With four processes, for example, it was faster to perform the multiplication in parallel only for matrices of size 1365 by 1365 or larger.
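The N^3-compute versus N^2-transfer argument above can be made concrete with a toy cost model. The constants and formulas below are illustrative assumptions of ours, not measurements from the paper; the point is only that any such model predicts a crossover size above which the parallel version wins:

```python
def serial_time(n, c=1e-9):
    # Single-machine multiply: roughly c * N^3 arithmetic operations.
    return c * n ** 3

def parallel_time(n, p, c=1e-9, d=1e-7):
    # Each of p servers multiplies N/p rows, so compute shrinks by p,
    # but each server must first receive N^2/p entries of A plus all
    # N^2 entries of B at a per-entry transfer cost d (assumed values).
    compute = c * n ** 3 / p
    transfer = d * (n * n / p + n * n) * p
    return compute + transfer

def crossover(p, sizes):
    # Smallest tested N at which the parallel model beats the serial one.
    for n in sizes:
        if parallel_time(n, p) < serial_time(n):
            return n
    return None
```

Under these (arbitrary) constants the model reproduces the qualitative behavior reported in Fig. 2: small matrices run faster serially because transfer dominates, while large matrices favor the PVP because the N^3 compute term outgrows the N^2 transfer term.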

In Fig. 3, the speedups for multiplying two N by N matrices as a function of the number of processes are plotted for N = 64, 512, 1365, and 2048. For large matrices, as the number of processes increases, the speedup of the PVP initially increases rapidly but eventually levels off to a point where increasing the number of processes provides little or no speed improvement. For small matrices, increasing the number of processes actually makes the run-time increase and the speedup decrease because of the additional time required to transmit the data to the other processes. The greatest speedup for 64 by 64 matrices, for example, was achieved using a single process. As the number of processes increases, the time required to transmit the data to additional processes overwhelms the processing time saved by using the additional processes. Also note that the time required to transmit the data to a server caused the run-time for a single process to be much greater than that for the built-in MATLAB command, and only when three or more processes are used is our parallel implementation faster than the built-in command for the largest matrix size. Using 15 workstations in parallel to multiply two 2048 by 2048 matrices achieved the best speedup and was 2.95 times faster than the built-in MATLAB command.

The data in Fig. 3 indicate that it is not always faster to use more processes, and for a given problem there is an optimal number of processes. The number of processes that resulted in the greatest speedups in our empirical study is graphed in Fig. 4 as a function of the matrix size N. Generally, the optimal number of processes increased as the size of the matrices increased.

The PVP was also implemented on a network of four PCs (Pentium MMX at 166, 200, 200, and 266 MHz, 64 MB RAM, PCI bus, running Linux with a 10 Mbps Ethernet TCP/IP network), and the run-time for the four PCs was compared to the run-time for four Sun workstations (see Fig. 5). Compared to the four Suns, the four PCs were about 2.5 times as fast for large matrices, when the calculation time was large compared to the time required to transmit the data. This is an expected result because the clock speed of the PCs was faster than that of the workstations. For small matrices, where the calculation time is insignificant compared to the time

Fig. 2. Speedups (run-time for MATLAB divided by run-time for the PVP) as a function of the size of the matrices.

Fig. 3. Speedups as a function of the number of server processes.



required to transmit the data, the run-times for the PCs were similar to those for the workstations.

4. Future improvements

The amount of time spent transmitting the data to the servers is critical in making the parallel implementation practical for moderately sized matrices. Significant improvement could be achieved if we could broadcast the second matrix to all the servers simultaneously, a feature which is available in the new Internet Protocol, IPv6.

A reduction in the required transmission time could also be achieved by partitioning the problem differently. With four processes, for example, each of the four servers in the current implementation computes one fourth of the rows of the solution, requiring it to be sent one fourth of the first matrix and all of the second matrix, for a total of five eighths (62.5%) of the original data. If each server computed a corner of the solution instead of a band of rows, it would only need half of the rows of the first matrix and half of the columns of the second, for a total of half (50%) of the original data. This improvement would increase as the number of processes increases.
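The data-fraction arithmetic above can be checked with exact rational arithmetic. The sketch below generalizes the two partitioning schemes to P processes; the block scheme, as written here, assumes P is a perfect square, which is our simplifying assumption rather than a detail stated in the paper:

```python
import math
from fractions import Fraction

def row_partition_fraction(p):
    # Row scheme: each of p servers gets N/p rows of A (N^2/p entries)
    # plus all of B (N^2 entries), out of 2*N^2 entries total.
    return (Fraction(1, p) + 1) / 2

def block_partition_fraction(p):
    # Block scheme: each server computes an (N/q) x (N/q) block of C,
    # where q = sqrt(p), so it needs N/q rows of A (N^2/q entries) and
    # N/q columns of B (N^2/q entries), out of 2*N^2 entries total.
    q = math.isqrt(p)
    assert q * q == p, "this sketch assumes p is a perfect square"
    return (Fraction(1, q) + Fraction(1, q)) / 2
```

For p = 4 these give 5/8 and 1/2, matching the 62.5% and 50% figures in the text, and the gap between the two schemes widens as p grows.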

5. Conclusion

In this study, we developed and tested an interface that allows MATLAB commands to be executed in parallel on several computers connected to a network, to take advantage of existing computers that are under-utilized. The performance of this interface was investigated for matrix multiplication, and the experimental results show up to a 2.95 times improvement in execution speed for 2048 by 2048 matrices and 15 processes on the Sun workstations. The network of four PCs was about 2.5 times as fast as four Sun workstations, which is consistent with the higher clock speeds of the PCs. The time required to transmit the data to the processes is an important factor in the performance, and so the performance is expected to improve with faster networks and with refined data partitioning techniques that require less data to be transferred to the servers. Also, operations that require large amounts of computation time compared to the time required to transmit the data, such as matrix multiplication for very large matrices, are most likely to benefit from a parallel implementation.

References

Comer, D.E., Stevens, D.L., 1994. Internetworking with TCP/IP, Vol. III: Client–Server Programming and Applications. Prentice-Hall, Englewood Cliffs, NJ.

MathWorks Inc., 1998. MATLAB, The Language of Technical Computing: MATLAB Application Program Interface Guide. The MathWorks, Inc.

Stevens, R., 1999. Unix Network Programming, Vol. 2: Interprocess Communications, second ed. Prentice-Hall, Englewood Cliffs, NJ.

Weiss, M.A., 1997. Data Structures and Algorithm Analysis in C, second ed. Addison-Wesley, Reading, MA.

Fig. 5. Comparison of run-times for the PVP implemented on a network of four PCs and a network of four Sun workstations, where the PCs have higher clock speeds.

Fig. 4. Number of server processes that resulted in the greatest speedups as a function of the size of the matrices.
