
Parallel Computing 8 (1988) 409-414 North-Holland


Standard FORTRAN 77 as a parallel language

David F. SNELLING

Computing Studies Department, University of Leicester, Leicester, United Kingdom LEI 7RH

Abstract. There have been many attempts to use FORTRAN as a language environment for parallel processing. In each case either the standard has been violated or some facilities have been left lacking, and in most cases both. This paper briefly examines some of the approaches to parallel processing in the FORTRAN environment, provides an interpretation of the FORTRAN 77 standard which supports a very flexible and complete parallel processing environment, and discusses some of the implications of this approach.

Keywords. Parallel languages, FORTRAN 77 standard.

1. Introduction and summary

Because of the large investment in scientific software, it is unlikely that the scientific community will readily switch to any language other than FORTRAN. As a result much attention has been devoted to improving the standard of FORTRAN in line with the needs of the scientific user. Examples of such changes to FORTRAN include the IF-THEN-ELSE construct in FORTRAN 77 and the array extensions in FORTRAN 8X. In recent years, with the advent of parallel processors, the question of how FORTRAN users should be given access to the parallel hardware has arisen. In most cases the approach provided with each machine is unique. As a result there is no standard approach to parallel processing in FORTRAN, and the FORTRAN standards committee has decided to defer the issue to a later time. Several attempts have been made to close the gap between various approaches to parallel processing in FORTRAN [6, 7]. All these approaches, like the vendor supplied approaches, either violate the standard or are lacking in facilities.

In the remainder of this paper some of these strategies are compared to an approach which complies strictly with the FORTRAN 77 standard and offers a complete and rich environment for parallel processing through the use of two very simple library primitives. It should be noted that many distributed memory machines do not support a shared memory, but rely instead on message passing. This paper assumes that there is a shared memory. This can be easily modeled on distributed memory systems, but usually at the cost of degraded performance.

2. Data control

The most critical aspect of parallel processing is the control of data used in the parallel environment. There are two primary classes of data in a parallel environment: private data which is accessible to exactly one process, and shared data which is accessible to all parallel processes. Any environment for parallel processing must support these two classes. A secondary class of data is semi-private. Semi-private data is shared by a group of parallel processes, but is

0167-8191/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)


considered private to any other group of parallel processes. Although this class can be emulated by careful use of shared data, some problems are more easily solved using all three classes (shared, semi-private, and private).

In most parallel systems data local to a subroutine is private to a parallel process. This usually excludes variables which appear in a SAVE or DATA statement, because the standard requires that variables appearing in a SAVE statement be retained across repeated calls to a given subroutine. In most parallel implementations of FORTRAN a stack is used for local data, and each parallel process is given its own stack. This provides an easily maintained private data construct. Therefore, any data which cannot be placed on the stack (i.e. variables in a SAVE or DATA statement) is by default shared. In some implementations additional private data can be declared. For example, in the CRAY implementation, a COMMON block can be changed into a TASK COMMON (a violation of the standard) which is then private to the task or process [3].

The shared data facility on most systems is provided through COMMON, but in some cases there is no explicit shared memory at the FORTRAN language level. In the ETA implementation of FORTRAN all COMMON is private to the process, and shared data is managed through a series of library calls [2]. This approach, like that of distributed memory systems, can provide more security for the programmer than a shared memory environment. In many cases the lack of security experienced on shared memory systems is more a function of being limited to only two levels of memory than of the nature of shared memory itself. Problems tend to arise when data which is semi-private in structure must be implemented using universally shared data structures. There are no systems known to the author which provide a semi-private data class for parallel processing.

Thus in general practice, if the standard is to be adhered to, COMMON must be either shared or private. Both of these strategies have severe limitations. If COMMON is always private, then all data sharing must be performed explicitly through specialized library calls. The best examples of these approaches are the SPLIB and ETA libraries [2,7]. On the other hand, if COMMON is always universally shared, then private communication between subroutines within one process is restricted to the parameter list on the subroutine call. The best examples of this approach are the CRAY and HEP systems [3,5]. (In the CRAY case TASK COMMON is ignored in an effort to comply with the standard.)

This approach to parallel processing in FORTRAN 77, called ERIK, supports private,

I[,.~-" L'~-'i=(,J

/SEMI-PRIVATE/ /SEMI-PRIVATE/ I

1

r

Fig. 1.


shared, and any number of semi-private classes of data.¹ The data control aspect of ERIK is based on a careful reading of the FORTRAN 77 standard. The relevant passages are reproduced below:

Execution of a RETURN or END statement sometimes causes entities in named common blocks to become undefined but never causes entities in blank common to become undefined - 8.3.5(1).

Execution of a RETURN statement (or END statement) within a subprogram causes all entities within the subprogram to become undefined, except for the following:

(1) entities specified by SAVE statements;

(2) entities in blank common;

(3) initially defined entities that have neither been redefined nor become undefined;

(4) entities in a named common block that appears in the subprogram and appears in at least one other program unit that is referencing, either directly or indirectly, the subprogram.

Note that if a named common block appears in the main program, the entities in the named common block do not become undefined at the execution of any RETURN statement in the executable program - 15.8.4 [1].

In effect this passage states that the continuance of data in a COMMON block which appears initially at a given level in the program cannot be assumed if that level is exited (to a higher level) and then reentered. Concurrent execution of two copies of a given subroutine is conceptually equivalent to sequential execution of the same subroutine twice, with respect to private memory. In the ERIK approach a compiler switch, similar to one that specifies whether local variables are to be placed in fixed memory or on a stack, is defined such that a COMMON block referenced initially at a given level is private to that level but shared with all levels below. In Fig. 1 the COMMON block /SHARED/ is accessible to all parts of the program, the COMMON block /PRIVATE/ is replicated four times and is private to each process, and the COMMON block /SEMIP/ is replicated twice, with two processes sharing each copy.

This approach, although contrary to accepted practice, is allowable within the standard. In addition to allowing large numbers of variables to be passed across a normal subroutine call within a parallel process, it also allows for as many levels of semi-private data as the application requires. It should be noted that in ERIK, any COMMON appearing in the main program is, by default, shared by all parallel processes, as is any COMMON block appearing in a SAVE statement. Blank COMMON is always shared.
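As an illustration of this interpretation, the sketch below (subroutine and block names are hypothetical, not taken from ERIK itself) shows how the class of each COMMON block would follow purely from the level at which it first appears:

```fortran
      PROGRAM MAIN
C     First appears in the main program: shared by every
C     parallel process (cf. the note on 15.8.4 above).
      COMMON /SHARED/ G(100)
C     ... start several parallel processes, each executing LEVEL1 ...
      END

      SUBROUTINE LEVEL1
C     First appears below the main program: each process (or
C     group of processes) entering LEVEL1 gets its own copy,
C     visible to everything LEVEL1 calls -- the semi-private class.
      COMMON /SEMIP/ S(100)
      COMMON /SHARED/ G(100)
      CALL LEVEL2
      END

      SUBROUTINE LEVEL2
C     First appears at the lowest level: private to each process.
      COMMON /PRIV/ P(10)
C     Refers to the copy owned by the instance of LEVEL1 above.
      COMMON /SEMIP/ S(100)
      END
```

Note that the source itself remains unchanged standard FORTRAN 77; only the compiler's allocation strategy differs.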

3. Process control

Control of processes is by far the simpler of the two aspects of parallel processing. There are essentially two primitives required: first, some way to start parallel processes; and second, some mechanism for synchronizing them. In most systems which support explicit parallelism, this is accomplished by some form of pseudo-subroutine call, in which case the called subroutine starts executing while the calling routine also continues executing. Even though there is some unanimity in this approach, there are still several problems.

¹ This name is purely historical and has no real meaning, unless it stands for Easy Recipe for Incredible Komputing.


One of these problems can be highlighted by comparing the following two techniques for starting a parallel process:

CALL TSKSTART (EXTRN, A1, A2, A3)    (CRAY)
CALL MTRUN (EXTRN, A1, A2, A3)       (ETA)

In both cases the entry point EXTRN is declared in an EXTERNAL statement and a parallel process is started as described above. The problem arises in how the arguments are passed: the CRAY arguments are passed by reference and the ETA arguments are passed by value. As a result only scalars may be passed in the ETA environment. Note also that the number of arguments in each call may vary depending on the subroutine being created, which is a violation of the standard. The ETA system avoids this by requiring that the identifier MTRUN be declared in an INTRINSIC statement. The CRAY compiler issues a warning message if, in the same subroutine, two calls to TSKSTART are made with different numbers of parameters. In both cases the library routine has access to the actual number of parameters in the parameter list, which constitutes an extension to the language.

In ERIK the path of least resistance is adopted; no parameters are allowed. The syntax is as follows:

CALL CREATE (EXTRN)
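A minimal sketch of its use might look as follows (the routine name EXTRN and its body are illustrative only; since no arguments are permitted, any data the new process needs must be communicated through COMMON):

```fortran
      PROGRAM DEMO
      COMMON /SHARED/ N
      EXTERNAL EXTRN
      N = 0
C     Start two parallel processes, each executing EXTRN.
      CALL CREATE (EXTRN)
      CALL CREATE (EXTRN)
      END

      SUBROUTINE EXTRN
C     /SHARED/ first appears in the main program, so both
C     processes see the same copy.
      COMMON /SHARED/ N
C     ... work on the shared data ...
      END
```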

As above, EXTRN must appear in an EXTERNAL statement.

Synchronization is handled on most systems through a collection of simple calls. For example, the HEP system allowed users access to the empty/full bit on each memory location through a few simple subroutine calls: e.g. X = AREAD (Y) waits for Y to become full, reads it, and sets it empty; CALL AWRITE (Y, 7.2) waits for Y to become empty, writes 7.2 into Y, and sets it full.

There are, implicit in the HEP approach, two types of synchronization: synchronization may either signal one other process or it may signal all other processes. Admittedly, the 'all synchronization' can be accomplished by repetition of the 'one synchronization', but in many cases the ability to do both can simplify the programmer's task. In the ERIK approach the ability to perform both an 'all synchronization' and a 'one synchronization' is contained in a single library call with three parameters:

CALL ASYNC (NAME, ACTION, VALUE)

The first parameter is the name of the asynchronous variable and the second is the action to be taken on that variable; both are of type CHARACTER. NAME identifies the variable uniquely within the whole program and should be restricted to some finite length; 16 characters is assumed to be adequate. VALUE is an integer variable or constant used by the ASYNC function as specified by ACTION. ACTION is one of the following and may be abbreviated as indicated:

"SETE" Set the variable 'empty' unconditionally, and assign it the value VALUE ("SE").

"SETF" Set the variable 'full' unconditionally, and asssign it the value VALUE CSF").

"WRITE" Wait for the variable to be set 'empty', write VALUE into it and set it 'full' ("W").

"READ" Wait for the variable to be set 'full', read its contents into VALUE, and set it 'empty' ("R").

"WAITF" Wait for the variable to be set 'full' and read its contents into VALUE, but do not change its state ("WF").

"WAITE" Wait for the variable to be set 'empty' and read its contents into VALUE, but do not change its state (" WE").

Note that all operations are indivisible.
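As an illustration of how these operations combine (a sketch only; this worker routine is not taken from the paper), a self-scheduling loop can be built from ASYNC alone by treating an asynchronous variable as an indivisible shared counter:

```fortran
      SUBROUTINE WORKER
      INTEGER I, N
      PARAMETER (N = 1000)
C     The counter is assumed to be initialised once elsewhere,
C     e.g. in the main program:  CALL ASYNC ('INDEX', 'SETF', 1)
   10 CONTINUE
C     Indivisibly claim the next loop index ...
      CALL ASYNC ('INDEX', 'READ', I)
      IF (I .GT. N) THEN
C        Leave the counter full so other workers also terminate.
         CALL ASYNC ('INDEX', 'SETF', I)
         RETURN
      END IF
C     ... and release it, advanced, for the next claimant.
      CALL ASYNC ('INDEX', 'WRITE', I + 1)
C     Body of the parallel loop: process iteration I here.
      GO TO 10
      END
```

Each worker would be started with CALL CREATE (WORKER), and the same code runs correctly with a single process on a serial machine.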


4. Discussion

There are several issues which should be highlighted with respect to this approach. If a compiler always organized COMMON in this way, a large number of 'dusty deck' FORTRAN programs would fail to work. To a great extent programmers have come to rely on the implementation assumption that any COMMON block exists forever, regardless of the level at which it first appeared. Any implementation of ERIK would therefore require that the compiler have a switch to turn the nested allocation of COMMON on or off.

One issue that has some real significance is that of reverse portability; that is to say, a parallel program developed in the ERIK environment may not run on a serial FORTRAN machine. In most cases there is no problem running an ERIK program with only one parallel process. The exception occurs when two parallel processes synchronize explicitly with one another. Most algorithms for self-scheduling, prescheduling, and parallel reduction can be performed on a serial machine, using the same algorithm and only one process. On the other hand, an attempt to pass a synchronous message requires more than one process. Such a construct would prevent a program from being ported from a parallel machine to a serial machine. This is true of all parallel machines currently available and is more a function of programming style than of language environment. If a serial programming environment using the ERIK primitives were designed so that subroutines called via the CREATE primitive were treated as coroutines, the environment would work in all cases: a coroutine blocked by synchronization could be suspended, control passed to another coroutine, and execution continued.

The question of file I/O is handled quite easily. All ERIK parallel processes have access to all files opened by the program, and all synchronization is handled using the ASYNC primitive. Thus, if the synchronization is improperly handled, files interleaved by character, line, or block can easily result. This approach was used by Denelcor in the first HEP/OS system [4].
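For example, interleaving on a shared unit can be avoided by treating an asynchronous variable as a lock around each WRITE (a sketch; the variable 'IOLOCK' is hypothetical and must be set 'full' once, e.g. with CALL ASYNC ('IOLOCK', 'SETF', 0), before any process reports):

```fortran
      SUBROUTINE REPORT (IPROC, X)
      INTEGER IPROC, IDUM
      REAL X
C     Claim the lock: wait for 'IOLOCK' to be full, set it empty.
      CALL ASYNC ('IOLOCK', 'READ', IDUM)
      WRITE (6, 100) IPROC, X
  100 FORMAT (' PROCESS ', I3, ' RESULT ', F10.4)
C     Release the lock so another process may write.
      CALL ASYNC ('IOLOCK', 'WRITE', IDUM)
      END
```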

The ERIK approach is completely compatible with most of the auto-parallel processing compilers and microtasking compilers currently available. Since these low level systems act entirely within a subroutine, there is no conflict with the higher level controls available with this approach.

A difficulty with the ERIK approach, not present in most research parallel libraries, lies in the fact that it requires significant modification to an existing compiler on a shared memory multiprocessor. Vendors of multiprocessors will be disinclined to offer an ERIK environment in addition to their own systems, since the burden of supporting two systems is beyond the abilities of most vendors. However, a research compiler could be developed at a research or university site which had a source agreement with the vendor of a multiprocessor.

Lastly, it is expected that, under the current specification for FORTRAN 8X, all these concepts will generalize to the module construct. As a result ERIK's approach is unlikely to lose its applicability with the eventual deprecation and obsolescence of COMMON.

5. Conclusions

As long as there is no clear way of defining parallel algorithms in all parallel processing environments (shared and distributed memory, for example), there is no reason why an approach which adheres to the standard should not be investigated. The ERIK approach has the advantage that virtually any parallel code developed could be run, without modification, on any serial processor system as well as on other shared memory multiprocessor systems. The approach also solves many of the problems encountered by the developers of library packages.


The provision of multiple levels of semi-private data would allow a given package routine to be called by two parallel processes in the user's code and yet still be parallel in itself. A parallel processing paradigm which is totally consistent with the FORTRAN language standard, allows any existing code to be run serially without modification, shares an environment with modern auto-parallelization techniques, and eliminates one of the package writer's major nemeses should not remain untried.

References

[1] American National Standard Programming Language FORTRAN (American National Standards Institute, New York, 1978).

[2] C.N. Arnold, Multitasking library design summary and specification (final draft), ETA Internal Document, 1985.

[3] Cray Programmer's Library Reference Manual, Cray Research Inc., 1986.

[4] HEP Parallel FORTRAN User's Manual, Denelcor Inc. Publication 10002-00, Denver, Colorado, 1980.

[5] HEP FORTRAN 77 User's Guide, Denelcor Inc. Publication 9000006, 1982.

[6] H. Jordan, The Force, Computer Systems Design Group, Department of Electrical and Computer Engineering, University of Colorado, 1987.

[7] D.F. Snelling and G.-R. Hoffmann, A comparative study of libraries for parallel processing, Parallel Comput. 8 (1988) 255-266, this volume.