A PVM-based distributed parallel symbolic system


ELSEVIER

Advances in Engineering Software 28 (1997) 303-312
© 1997 Elsevier Science Limited
All rights reserved. Printed in Great Britain
PII: S0965-9978(97)00017-3 0965-9978/97/$17.00

Claudia Di Napoli,* Maurizio Giordano & Mario Mango Furnari

Istituto di Cibernetica, C.N.R., Via Toiano 6, I-80072 Arco Felice, Napoli, Italy
* e-mail address: (claudia, maug, mf)@zeus.na.cnr.it

(Received 17 October 1995; accepted 31 December 1996)

The purpose of this work is to provide a parallel programming environment for symbolic applications suited to run on Distributed Memory Parallel Systems (DMPS). This paper describes the implementation of such an environment, based on Sabot's Paralation model (in particular its LISP implementation, Paralation LISP). In our opinion, the clear distinction between the communication and computation aspects in the Paralation model allows a clearer and easier exploitation of the opportunities for parallelism in symbolic applications. Furthermore, the simple semantics of its parallel constructs seem well suited to run on a community of loosely coupled LISP interpreters. We chose to implement the Paralation model constructs on top of the Parallel Virtual Machine (PVM) package, because this tool provides a cheap virtual distributed memory parallel machine built simply from a number of LAN-interconnected workstations. © 1997 Elsevier Science Limited.

1 INTRODUCTION

It is a well-known fact that a large class of problems can be processed efficiently in a multiprocessing setting. In fact, significant advances in the design of parallel algorithms and architectures have proved the potential of applying concurrent computation techniques to a wide variety of problems.

Today a parallel approach is widely accepted, essentially for the high-performance numerical processing typical of scientific applications.

However, an increasing number of symbolic applications suited to parallelization are emerging every day. In fact, symbolic applications in many AI domains make extensive use of state-space search procedures, whose complexity depends on the potentially large solution spaces to be searched. The solution to a particular problem is usually obtained by composing or comparing different partial solutions. Natural language recognizers, expert systems, vision systems and theorem provers fall into this general class of problem-solving applications. [1,2] To deal with the large amount of computation typical of these applications, one viable approach may be distributed computing.

In the last few years symbolic problems have also been approached in a parallel framework. Parallel symbolic computation seems to be a difficult area, since symbolic applications generally offer many opportunities for small-grain parallelism that are extremely irregular and therefore difficult to manage "easily" with parallel systems.

Distributed Memory Parallel Systems (DMPS) are currently experiencing renewed interest because they are scalable, cost-effective and capable of high bandwidths, thanks to the increasing performance of contemporary computer networks. The advent of new architectures with sustained high-speed microprocessors makes the use of DMPS attractive for symbolic applications as well. Therefore, our purpose is to build a unified and user-friendly parallel programming environment, suited to distributed memory parallel architectures, that handles coarse-grain parallelism.

The system we implemented is based on Sabot's Paralation model (in particular its LISP implementation, Paralation LISP).

The notion of a global namespace in LISP is responsible for the intensive work done in implementing parallel extensions of this language on shared memory parallel architectures. [3-5] Few extensions have been made for distributed memory parallel architectures, and they seem to keep using an approach based on a "virtually" shared memory. Furthermore, they usually rely on new computational mechanisms to manage the synchronization of accesses to shared data structures. [6,7] We think the features of Paralation LISP are suitable for implementation on distributed memory architectures without affecting the semantics of the basic LISP primitives.

In this paper we describe our current research efforts in developing parallel implementation strategies for Paralation LISP on DMPS.



We first give an outline of the Paralation model, pointing out its main characteristics and describing its parallel constructs. Next, we describe the architectural model we refer to and the implementation strategy of Paralation LISP on this architecture. Finally, some experimental results and conclusions are given.

2 THE PARALATION MODEL

The Paralation model is a parallel computational model developed by Sabot. [8,9] It is independent of the hardware architecture, both in the system interconnections and in the processor types. The central idea is a clear distinction between computation and communication.

The basic entity is the new data structure, named paralation, which represents a set of virtual processors, or sites (see Fig. 1). A paralation instance is called a field, a collection of values, one for each site. In other words, the paralation may be viewed as a collection of fields, all with the same size. All the fields belonging to the same paralation are aligned in the sense that their ith elements are stored in the same site. This property is called the intrasite locality. A paralation has at least one associated field, named the index field, that enumerates the paralation elements.

Paralation sites with nearby indices are not necessarily near each other. "Nearness" is based upon the concept of paralation shape (intersite locality), which captures communication costs. Paralations of any shape may be constructed: the sites of a particular paralation might be arranged in the form of a three-dimensional grid, a ring, a butterfly, a pyramid or even a double ellipse. By default, a paralation has no inherent shape; the shape must be explicitly described by the user.

A third kind of locality is the interparalation locality: sites and values in different, unrelated paralations are not necessarily near each other regardless of their indices.

The paralation itself is never directly accessible; only fields can be handled. Two special kinds of operators, associated with this new structure, operate upon paralation fields: the elwise construct carries out parallel computations, while the match and <- ("move") constructs handle communications. The syntax of the parallel elwise construct is:

(elwise (<var-list>) <body>)

where <var-list> is a list of symbols, which will be bound to the fields given as arguments, all belonging to the same paralation, while <body> is the code that will be executed in parallel at each paralation site. The returned value is a new field in the same paralation. An example of elwise computation is given in Fig. 1. The Paralation model assumes a simple synchronization condition:

the elwise evaluation terminates and returns its result only after all the paralation sites have finished executing the program.

There are no restrictions on the <body> code. This means that the program executed in parallel at each paralation site may also produce side-effects on the site's values.

(setq p (make-paralation 5))   ;; creates a paralation and
=> #F(0 1 2 3 4)               ;; returns its index field

(elwise (p) (* p 2))           ;; computes multiplication in parallel
=> #F(0 2 4 6 8)               ;; and returns the elwise result field

(elwise (p)                                   ;; assigns list elements
  (elt '(orange apple pear grape tomato) p))  ;; in parallel to the
=> #F(ORANGE APPLE PEAR GRAPE TOMATO)         ;; elwise result field

[Diagram: a paralation of five sites with its index field and further aligned fields.]

Fig. 1. Paralation and elwise examples.


Nevertheless, there exists a restriction on the type of side-effects allowed:

since the elwise does not make any guarantee about synchronization between paralation sites as they execute the elwise body, having conflicting side-effects between sites is defined as an erroneous behaviour.

Side-effects may conflict if two or more sites simultaneously read and write the same memory location.

The match and <- operators allow communication between the sites of paralations. The match operator accepts two key fields (destination and source) and an optional equality predicate as its arguments; it computes and returns a new object, called a mapping, that encapsulates a communication pattern between two paralations by connecting the source paralation sites to the destination paralation sites whose key values are the same. The <- operator moves field values between paralations according to a mapping given as an argument, and returns a new field containing the transferred data. If more than one field value has to be moved into the same site, a combining function is used to avoid conflicts by reducing the multiple values into a single one. Examples of the match and <- operators are shown in Fig. 2.

3 THE ARCHITECTURAL MODEL

The architectural model we refer to is essentially composed of a set of independent processing elements (PEs), interconnected through a high-speed communication network, and a host whose role is to communicate with the user. Each PE has a local memory, i.e. a memory private to the PE and invisible to the others, and it communicates with the other PEs via a message-passing paradigm (see Fig. 3).

(setq map (match f2 f1))              ;; computes and returns
=> #S(MAPPING                         ;; a mapping between
      :TO-KEY-FIELD #F(0 1 2 NIL 3 2) ;; two fields
      :FROM-KEY-FIELD #F(NIL 0 2 3 NIL 0 1))

[Diagram: fields f1 and f2, and the mapping connecting their sites.]

(setq def-field (elwise (f2) 'no-value))
=> #F(NO-VALUE ... NO-VALUE)          ;; computes default field
                                      ;; for the move operation

(<- (index f1) :by map :with '+ :default def-field)
=> #F(6 6 2 NO-VALUE 3 2)             ;; moves data from the source to the
                                      ;; destination field according to the
                                      ;; previous mapping and a plus
                                      ;; combining function

[Diagram: the index field of f1 and the move result field.]

Fig. 2. match and <- examples.


[Diagram: a host and several PEs connected by a communication network.]

Fig. 3. The architectural model.

We chose to use the Parallel Virtual Machine (PVM) package to have this kind of architecture available using only a set of LAN interconnected workstations.

The PVM package was developed at the Oak Ridge National Laboratory. [10] It is a general and flexible parallel computing resource that supports shared memory, message-passing and hybrid models of computation. The main feature of PVM is that it permits a collection of heterogeneous computing elements (including single-CPU systems, vector machines and multiprocessors) to be viewed as a general-purpose concurrent computational resource, without worrying about the lower layers of interprocess protocols. These computing elements may be interconnected by one or more networks, which may themselves be different. In fact, PVM provides a suite of user-interface primitives and supporting software that enable concurrent computing on loosely coupled networks of PEs. The computing elements are accessed by applications via a standard interface that supports common concurrent processing paradigms in the form of well-defined primitives embedded in procedural host languages.

PVM consists of two parts: a daemon process that any user can install on a machine, and a user library that contains routines for initiating processes on other machines, for communicating between processes, and for synchronizing processes.

Several design features differentiate PVM from other similar systems. Among them are the combination of heterogeneity, scalability, multilanguage support, provisions for fault tolerance, support for both multiprocessors and scalar machines, an interactive graphical front end, and support for profiling, tracing and visual analysis. [11]

4 THE IMPLEMENTATION STRATEGY

The simple semantics of the Paralation parallel constructs seem well suited to an implementation on top of a community of loosely coupled LISP interpreters.

In fact, the implementation strategy we designed consists of a set of extensions to a loosely coupled community of XLISP interpreters, where XLISP is a portable dialect of LISP. [12]

Each interpreter communicates with the others via message-passing. A single interpreter plays the role of the master program; it interacts with the user, performs all the I/O operations, and activates the other interpreter programs, called the slave programs, when it evaluates the Paralation constructs. In this case the master program sends the slaves requests to evaluate LISP expressions remotely, and waits for their results.

Each slave program waits for a request from the master; then it evaluates the incoming request by applying the standard evaluation loop; at the end it sends the result back to the master program. In this scheme, the read-eval-print loop of a slave LISP interpreter becomes a wait-eval-send loop, while the loop on the master interpreter is unchanged, except when it evaluates parallel constructs.
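The slave loop can be pictured with a short sketch. This is only an illustration: pvm-receive and pvm-send are hypothetical names for the PVM-backed message wrappers added to the interpreter, not the actual extension primitives.

(defun wait-eval-send-loop (master-tid)
  ;; replaces the slave's read-eval-print loop: wait for an
  ;; expression from the master, evaluate it with the standard
  ;; evaluator, and send the result back
  (loop
    (let ((request (pvm-receive master-tid)))
      (pvm-send master-tid (eval request)))))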

To obtain the described behaviour, the XLISP interpreters have been interfaced with the PVM package, i.e. their source code has been extended with a small number of modules which handle all the actual communication among them by using the PVM library primitives. [13] At the programming level these extensions appear as new LISP primitives which allow remote and concurrent evaluations. The programmer uses these primitives without being aware of the underlying PVM code or of the parallel virtual machine configuration established in the initialisation phase. In fact, our choice was to make all the actual communication among the concurrently executed processes transparent to the user. In this respect our work differs from other distributed implementations of the Paralation model found in the literature. [14] Furthermore, the availability of the PVM package for a large number of architectures makes it possible to run our system on a virtual parallel machine composed of a set of heterogeneous machines, so the portability of the system is guaranteed.
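At the programming level the effect could look like the following one-liner; remote-eval is a hypothetical name for such a primitive, not necessarily the one our system exposes:

;; evaluated on the master: ship an expression to a slave and
;; block until its value comes back; all PVM packing, sending
;; and receiving is hidden inside the primitive
(remote-eval slave-tid '(+ 1 2))   ; => 3, computed on the slave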

4.1 The data structure implementation

The implementation strategy for the paralation data structure on DMPS is to allocate the paralation sites on different processors (nodes), where the number of sites located on the same processor depends on the number of available processors. It should be noted that the concepts of site and node are not necessarily synonymous, as more than one site of a paralation could be assigned to a single node; the paralation is only a virtual processor set.

The paralation data structure has been implemented in such a way that the master interpreter keeps a paralation descriptor as a global data structure, while the paralation data values are distributed among the processors’ memories. The paralation descriptor has three slots:

- length: the total number of sites in the paralation
- shape: the data access topology for the paralation elements, which may be mapped onto the DMPS processor network
- index: a reference to the paralation index field descriptor.

Descriptions of paralation fields are also located on the master interpreter. They consist of four slots:


- paralation: a reference to the paralation descriptor
- tids, ranges and keys: three vectors, each of a size equal to the number of available processors. At the position corresponding to each slave program, these vectors contain, respectively: the identifier of the slave interpreter where the paralation sites are stored; the first index and the number of sites stored on that interpreter; and the memory reference under which the field values are stored.
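To make the two descriptors concrete, they can be rendered as LISP structures. The slot set follows the description above, while the structure and slot names are our own assumptions, not the actual code:

(defstruct paralation-descriptor
  length  ; total number of sites in the paralation
  shape   ; data access topology, possibly mapped on the processor network
  index)  ; reference to the index field descriptor

(defstruct field-descriptor
  paralation  ; back-reference to the paralation descriptor
  tids        ; vector: slave interpreter identifier per partition
  ranges      ; vector: first index and number of sites per partition
  keys)       ; vector: remote memory reference of the values per partition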

When the master interpreter allocates a paralation data structure, it calls a partition procedure which computes a site partition among the slave interpreters.

The data allocation problem entails the data partitioning and data alignment problems. At the moment we do not deal with the site allocation strategy problem, which we plan to approach in the near future; we assume a uniform and static partitioning strategy, based only on the number of processors available in the initialization phase, as in the sketch at the end of this subsection.

In the future we plan to adopt other strategies, based on a dynamic slave configuration, which can take into account processor availability, speed, load and interconnection topology during the computation. To this end it will not be necessary to change the actual implementation code of the Paralation LISP constructs, but only to replace the partition procedure with a more complex one which manages more information about the machine architecture and the paralation shape.

According to the paralation partition settled by the partition function, the master program builds the tids, ranges and keys vectors of the index field descriptor. Then it sends messages to the slave programs instructing them to store the parts of the paralation index field under the corresponding keys settled before. After the slaves have signalled the end of their jobs, the master links the paralation slot of the field descriptor to the paralation to which the field belongs.

This allocation strategy is sketched in Fig. 4.
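A minimal sketch of the uniform, static partition described above follows; the function name and the (first-index . count) representation of a range are assumptions:

(defun partition (nsites nslaves)
  ;; split NSITES sites as evenly as possible among NSLAVES slaves;
  ;; returns one (first-index . count) pair per slave
  (let ((base (floor nsites nslaves))
        (extra (mod nsites nslaves))
        (start 0)
        (result '()))
    (dotimes (i nslaves (nreverse result))
      (let ((count (+ base (if (< i extra) 1 0))))
        (push (cons start count) result)
        (incf start count)))))

For example, (partition 10 4) yields ((0 . 3) (3 . 3) (6 . 2) (8 . 2)), from which the master could fill the ranges vector of the index field descriptor.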

4.2 The parallel constructs implementation

The elwise construct has been implemented according to the Single Program Multiple Data (SPMD) paradigm.

From the computational point of view, the constraint on side-effects during an elwise computation guarantees that there will be no data dependencies among site evaluations. This condition, together with a data partition along the index field, suggests distributing the elwise computations according to the data partitioning pattern.


Let us describe the elwise implementation strategy. What happens during the elwise evaluation is summarized in Fig. 5, which shows the paralation structure before and after the elwise execution on the master node, together with the messages exchanged between master and slave programs and the information needed for the computation.

[Diagram: the master holds the paralation and index field descriptors; the slaves tid-0 ... tid-N store the parts of the index field under the corresponding keys.]

Fig. 4. Paralation allocation strategy.


[Fig. 5 diagram: before the evaluation the master holds the descriptors of the fields x1 ... xN (keys, par); it sends each slave a local-elwise message; after the slaves reply, the master also holds a descriptor for the result field.]

(elwise (x1 ... xN) <body>)

(let* ((#old-x1 (local-read key-x1))
       ...
       (#old-xN (local-read key-xN))
       (#size (length #old-x1)))
  (let (x1 ... xN
        (#result-vector (make-array #size)))
    (dotimes (#index #size)
      (setq x1 (aref #old-x1 #index))
      ...
      (setq xN (aref #old-xN #index))
      (setf (aref #result-vector #index) (progn <body>))
      (setf (aref #old-x1 #index) x1)
      ...
      (setf (aref #old-xN #index) xN))
    (local-write key-res #result-vector)))

Fig. 5. Parallel implementation for elwise.


First of all, the master program extracts the tids information to identify where to send the messages. Then it assembles messages of the form:

(local-elwise (x1 ... xN) (key-x1 ... key-xN) key-res <body>)

where key-x1 ... key-xN are references, private to each slave, to memory locations, each containing the subset of values of the corresponding field given as an argument.

Similarly, key-res is the reference to the memory location where the subset of field values resulting from the slave <body> evaluation will be stored. All slave interpreters concurrently evaluate the function call contained in the message, i.e. they execute the code shown in Fig. 5.

Memory references to the field subsets on each remote node are crucial information. In fact, each node must know where to read (using the key-x1 ... key-xN references) and write (using the key-res reference) the subsets of field values in its local memory.


Each slave interpreter saves the argument field values into local variables (#old-x1 ... #old-xN), initializes a vector (#result-vector) representing the result field subset, and then executes a loop over the indices of this vector. At each iteration the elwise variables (x1 ... xN) are bound to the local argument field values, then the <body> is executed and its result is stored into the corresponding position of the result field vector; finally, the local argument field values are updated to account for side-effects. After the loop has finished, the reference key-res points to the result field subset.

After the local-elwise execution, each slave interpreter sends the master a message signalling the end of its job. The master builds a new field descriptor for the elwise result, assigns the appropriate values to its slots, and links this descriptor to the same paralation as the elwise argument fields. According to the elwise synchronization rule, the master terminates the elwise computation only when all the slaves have sent back their results. The bottom of Fig. 5 shows the paralation descriptor on the master node after the elwise execution: a new result field has been adjoined to the former representation, since it is necessary to store there the vector containing the references to the remote memory locations where the parallel computation results have been stored.

Regarding the match and <- communication constructs, we followed the same implementation strategy.

The elwise construct is defined as a macro function (as in Sabot's Paralation LISP), whereas the match and <- constructs are standard functions. In fact, the master interpreter, before sending the slaves the messages containing the expressions to be remotely evaluated, has to evaluate the arguments in its local environment.

When the master evaluates the match construct, it sends a message to each slave requesting the remote evaluation of a local version of the match function. This function takes as arguments the whole source field and the destination field subset which has previously been stored locally. It also takes the keys under which the partial mappings, to be concurrently computed by the slaves, have to be stored, and the keys under which the local destination field subsets have previously been stored. The messages are addressed to the appropriate slaves using the tids identifiers. The partial mapping structures are stored locally, like the field subsets. After all slaves have signalled the end of their jobs, the master builds a global data structure with references to the remotely allocated partial mappings.
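What the slave-side match could compute is sketched below: each locally stored destination key is connected to the matching positions of the whole source key field. The function name and the list-of-source-indices representation of a partial mapping are assumptions, not the actual code:

(defun local-match (dest-keys source-keys &optional (test #'eql))
  ;; dest-keys: the locally stored destination key field subset (a vector)
  ;; source-keys: the whole source key field (a vector)
  ;; returns a partial mapping: for each local destination site,
  ;; the list of source sites whose key matches under TEST
  (let ((mapping (make-array (length dest-keys) :initial-element nil)))
    (dotimes (d (length dest-keys) mapping)
      (dotimes (s (length source-keys))
        (when (funcall test (aref dest-keys d) (aref source-keys s))
          (push s (aref mapping d)))))))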

The <- construct implementation is similar to the match one. After having evaluated the <- arguments, the master interpreter assembles messages which instruct the slaves to evaluate a local version of the <- function, whose arguments are the whole source field to be moved, the key under which the partial mapping has previously been stored in the slave memory, and the key under which the new field subset obtained by moving the data has to be stored. Once the slave interpreters receive the messages, they concurrently compute the local version of the <- function according to the local partial mapping, transferring the values of the source field into a new, locally stored field subset. As before, the master builds a global field descriptor with the appropriate references after all the slave computations have finished.

Since the source field is entirely available on each slave during the local move computation, each slave is able to locally resolve the conflicts which may arise when two or more values have to be moved into the same site, by using a combining function given as an argument.
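The local conflict resolution can be sketched as follows, assuming the partial mapping has already been turned into (destination-index . value) pairs; the helper name and this representation are ours:

(defun combine-moved-values (pairs size combiner default-field)
  ;; PAIRS: list of (destination-index . value) produced by the move;
  ;; sites that receive nothing keep their default value, the first
  ;; arriving value overwrites the default, and every further value
  ;; is reduced into the site with the combining function
  (let ((result (make-array size))
        (hit (make-array size :initial-element nil)))
    (dotimes (i size)
      (setf (aref result i) (aref default-field i)))
    (dolist (p pairs result)
      (let ((i (car p)) (v (cdr p)))
        (if (aref hit i)
            (setf (aref result i) (funcall combiner (aref result i) v))
            (progn (setf (aref result i) v)
                   (setf (aref hit i) t)))))))

With '+ as the combiner, multiple values landing on the same site are summed, as in the move example of Fig. 2.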

5 EXPERIMENTAL RESULTS

To test the adequacy of the strategies used in designing the implementation of this environment, we ran some simple applications on different machine configurations.

In our implementation the minimal configuration of the system is composed of two interpreters, one playing the master role and the other the slave role, each running on a single PE. For this reason, we defined the speed-up as

S(P) = T1 / TP

where T1 is the program execution time obtained with the minimal configuration and TP is the program execution time obtained with P slaves running on P different PEs.

We decided to test our system by implementing in Paralation LISP a set of algorithms which present different degrees of coupling among the concurrently executed processes, i.e. different amounts of interprocess communication due to the use of the match and <- operators. We present here the results obtained for the three that we considered most significant.

The first algorithm is a simple pattern matcher that uses the form of a symbol to determine its role within a pattern. The basic operation of the benchmark is to search a database of objects, identifying all of those that satisfy some predicate. The objects contain descriptors which are patterns, and the predicate is that a set of search patterns matches the descriptors. The match is done exhaustively, i.e. all search patterns are matched against all descriptors regardless of the outcome of any of the individual matches. [15] The program is intended to perform many of the operations that a simple expert system might perform.
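In Paralation LISP terms the benchmark can be phrased roughly as below, with one database object per site; *objects*, *patterns* and pattern-match are hypothetical helpers standing for the benchmark's data and matcher:

(setq db (make-paralation (length *objects*)))
(setq objects (elwise (db) (elt *objects* db)))  ;; one object per site
;; every site exhaustively matches all search patterns against its
;; own object, with no inter-site communication (no match or <-)
(setq results (elwise (objects)
                (mapcar #'(lambda (pat) (pattern-match pat objects))
                        *patterns*)))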

As shown in Fig. 6, we obtained a linear speed-up which is, at each point, almost equal to the number of slaves involved. This result confirmed the "soundness" of our system for applications characterized by loose coupling among concurrent computations.


[Plot: speed-up versus processor number.]

Fig. 6. The pattern matcher benchmark.

The second algorithm is the life automaton, which simulates the development of a population of organisms under the effect of counteracting propagation and extinction tendencies. [16] The automaton is represented by a grid-shaped paralation data structure. During the life automaton computation each cell, represented by a site, interacts with its eight neighbours, so the communications take place between a site and its eight neighbours. We used a block distribution strategy for the paralation data structure. In order to minimize communications among different processors we allocated in each processor's memory, in addition to the sites where the elwise computation takes place (local sites), copies of the border sites allocated on the neighbouring processors (overlapped sites). This strategy avoids communication during the elwise computation to retrieve remotely stored data, and introduces communication only to update the duplicated data after each elwise computation.
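For reference, the per-cell rule evaluated inside the elwise can be written as below; next-state is a hypothetical helper, with cells encoded as 0/1 and the neighbour count taken from the local and overlapped sites:

(defun next-state (alive neighbours)
  ;; classic life rule: a dead cell with exactly 3 live neighbours
  ;; is born; a live cell with 2 or 3 live neighbours survives
  (cond ((and (= alive 1)
              (or (= neighbours 2) (= neighbours 3))) 1)
        ((and (= alive 0) (= neighbours 3)) 1)
        (t 0)))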

We ran the life automaton program while increasing the size of the automaton data structure. As shown in Fig. 7, we obtained a better speed-up by increasing the automaton size. The result is related to the granularity of the tasks concurrently executed on each processor: when the task granularity increases, the costs due to the data exchange at each new automaton state computation do not affect the global execution time in a relevant way.

The third algorithm is a generator of the prime numbers between 2 and n, which uses the sieve of Eratosthenes. As shown in Fig. 8, once again, with a fixed parallel machine configuration, we obtained better speed-up values by increasing n, i.e. the paralation data structure size. But in this case, for any fixed problem size, the speed-up diagram reaches a plateau even when the number of processors increases. A similar behaviour can be found in the literature. [17] The poor speed-up values and the saturation phenomena confirmed that our system can efficiently exploit parallelism only in some application domains.
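The sequential core of this benchmark is the classic sieve, sketched below in plain LISP; our distributed version spreads the candidate numbers across the slaves as a paralation field:

(defun sieve (n)
  ;; returns the list of primes in [2, n]
  (let ((flags (make-array (1+ n) :initial-element t))
        (primes '()))
    (do ((i 2 (1+ i)))
        ((> i n) (nreverse primes))
      (when (aref flags i)
        (push i primes)
        ;; cross out the multiples of i, starting from i*i
        (do ((j (* i i) (+ j i)))
            ((> j n))
          (setf (aref flags j) nil))))))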

6 DISCUSSION

We investigated the Paralation model as a programming language for vision applications by implementing a set of functions which extends the basic Sabot environment toward vision. [18] That programming environment was simulated by simply implementing the Paralation model constructs on a sequential standard LISP interpreter. The experience suggested that the Paralation model is a viable approach to coping with coarse-grain parallel symbolic computations, and that its features are suitable for a distributed implementation.

In this paper we described our current implementation of the Paralation model on DMPS.

The choice to adopt a community of LISP interpreters to implement our system allowed us to limit the number of new parallel primitives and to maintain the semantics of the standard LISP primitives. Furthermore, in our system the same Paralation LISP program can run on machine configurations with different numbers of processors, so to scale the system it is only necessary to change the set of available nodes at the system initialisation phase.

We implemented our system on top of the PVM package. The main reason for this choice was that it allows us to use a set of LAN-interconnected workstations to build a cheap DMPS. Furthermore, many computer companies, for example Cray (MPP), Convex and IBM, have chosen PVM as a lingua franca for developing massively parallel applications on the new DMPS. This tool has gained widespread use for numerical applications, so another goal of our experience is to assess its adequacy to support the development of parallel symbolic applications as well.

The experimental results we obtained show the suitability of the system for running distributed applications characterized by coarse-grain parallelism. In fact, only by increasing the task granularity are we able to overcome the communication costs due to the use of the match and <- constructs. In our implementation these constructs involve the transfer of an entire paralation field (the source field) to each slave, because each slave, while matching and moving data, needs to work not only on its local data.


[Plot: speed-up versus processor number, for several automaton sizes.]

Fig. 7. The life automaton benchmark.

[Plot: speed-up versus processor number, for several problem sizes.]

Fig. 8. The prime sieve benchmark.


This behaviour cannot be improved while using the interpreters, because it is impossible to restructure the program on the fly. For this reason, when the paralation data structures are large and the number of slaves increases, the communication costs become more relevant in comparison with the gain obtained by distributing the computation. So we plan to exploit a compiling approach which allows more information on access patterns to be gathered, in order to adopt more efficient strategies of data partitioning and alignment. This approach is crucial when coping with parallel computations on distributed memory systems.

Furthermore, in a compiling approach we also plan to use the information given by the shape notion of the Paralation model to produce more efficient data-to- processor mappings. In fact, the paralation shape is useful when a paralation is modelling a problem in which locality between data values is important.

REFERENCES

1. Tanimoto, S., The Elements of Artificial Intelligence: An Introduction Using LISP. Computer Science Press, Rockville, MD, USA, 1987.

2. Bigliardo, M., Di Napoli, C. & Mango Furnari, M., A distributed symbolic system for AI applications. In Proc. Euromicro Workshop on Parallel and Distributed Processing. IEEE Computer Society Press, Los Alamitos, CA, USA, 1993, pp. 544-549.

3. Goldman, R. & Gabriel, R. P., QLISP: experience and new directions. In Proc. ACM Symp. on Parallel Programming: Experience with Applications, Languages and Systems, New Haven, CT, USA, July 1988, pp. 111-123.

4. Halstead, R. H., Implementation of MultiLisp: Lisp on a multiprocessor. In Proc. 1984 ACM Symp. on Lisp and Functional Programming, Austin, TX, USA, pp. 9-17.

5. Halstead, R. H., New ideas in parallel Lisp: language design, implementation, and programming tools. In Parallel Lisp Languages and Systems, ed. R. H. Halstead, LNCS 441. Springer, Berlin, 1991, pp. 200-234.

6. Kessler, R. R. & Swanson, W. R., Concurrent Scheme. In Lecture Notes in Computer Science, Vol. 441. Springer, Berlin, 1989.

7. Piquer, J. M., Sharing data structures in a distributed Lisp. INRIA, Ecole Polytechnique, June 1990.

8. Sabot, G. W., The Paralation Model: Architecture-Independent Parallel Programming. MIT Press, Cambridge, MA, 1988.

9. Sabot, G. W., Introduction to Paralation Lisp. MIT Press, Cambridge, MA, 1987.

10. Geist, G. A. & Sunderam, V. S., Network based concurrent computing on the PVM system. Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN 37831.

11. Beguelin, A., Dongarra, J., Geist, G. A., Manchek, R. & Sunderam, V. S., PVM and HeNCE: traversing the parallel environment. Cray Channels, 1992, 22-25.

12. Betz, D. M., Almy, T. et al., XLISP-PLUS: another object-oriented LISP, Version 2.1, http://www.allegro-music.com/~almy/xlisp.html, 1992.

13. Geist, G. A., Beguelin, A. et al., PVM 3.0 User's Guide and Reference Manual. Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN, TM-12187.

14. Batey, D. & Padget, J. A., Coordinating heterogeneous parallelism. In Proc. 3rd Euromicro Workshop on Parallel and Distributed Processing. IEEE Computer Society Press, Los Alamitos, CA, USA, 1995.

15. Gabriel, R. P., Performance and Evaluation of LISP Systems. MIT Press, Cambridge, MA, 1986, pp. 136-146.

16. Toffoli, T. & Margolus, N., Cellular Automata Machines: A New Environment for Modeling. MIT Press, Cambridge, MA, 1987.

17. Chatterjee, S., Blelloch, G. E. & Fisher, A. L., Size and access inference for data-parallel programs. In Conf. on Programming Language Design and Implementation, ACM SIGPLAN '91, Toronto, Ontario, Canada.

18. Giordano, M. & Mango Furnari, M., A Paralation LISP environment for vision: a guide. TR-IC, Technical Report of CNR, 1992.