
Journal of Systems Architecture 44 (1998) 433-455

Hierarchical partitioning algorithm for optimistic distributed simulation of DEVS models

Ki Hyung Kim a,*, Tag Gon Kim a, Kyu Ho Park b

Abstract

The partitioning problem of models is one of the most important issues which may affect the performance of distributed simulation. This paper presents a novel partitioning algorithm for the optimistic distributed simulation of hierarchical, modular Discrete Event System Specification (DEVS) models. The proposed algorithm pursues the following three goals to achieve the overall objective of the minimum simulation time: (1) to balance the computational loads of partitions, (2) to maximize the parallel execution of independent models, and (3) to minimize inter-processor communication. To maximize the parallel execution of independent models, the proposed algorithm utilizes the hierarchical structural information of models available from the hierarchical model design methodology of the DEVS formalism. Through benchmark simulation experiments, we show that the proposed algorithm achieves good performance.

Keywords: Partitioning/mapping; DEVS formalism; Time Warp; Hierarchical simulation; Parallel/distributed discrete event simulation

1. Introduction

Parallel and distributed simulation (PADS) has been an active research area for more than a decade.

* Corresponding author. Tel.: +82 53 810 2551; fax: +82 53 816 1976; e-mail: kkim@nucc.yeungnam.ac.kr.

As identified by previous performance studies of PADS, the partitioning problem of models is one of the most important issues that may affect the performance of distributed simulation. Thus, finding the best possible partition of the model to obtain the fastest simulation time has been the goal of many ongoing research efforts. Like most other distributed processing applications,

1383-7621/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII: S1383-7621(97)00057-X

distributed simulation falls into the category of problems that are solved by executing processes that require communication during their lifetime, rather than just at initiation or termination. Much research has been conducted on devising partitioning strategies for problems in this category [31,3,15]. However, unlike other problems in this category, distributed simulation has an inherent causality constraint: each simulation task must process arriving messages in their time-stamp order, not in their real-time arrival order. To satisfy this constraint, a synchronization algorithm between simulation tasks is required. Diverse synchronization algorithms have been developed. They are broadly classified into two approaches based on their tolerance to causality errors: the optimistic approach [17,12], which can detect and resolve causality conflicts, and the conservative approach [27], which always prevents causality errors.

In PADS, model verification and validation become more important than in sequential simulation, since the target models of PADS are usually large and complex systems. In this respect, some formal methods for developing simulation models have been employed in distributed simulation. The Discrete Event System Specification (DEVS), developed by Zeigler [36], is one such modeling formalism for discrete event systems. In the DEVS formalism, discrete event models are specified in a hierarchical, modular form. Based on the same underlying DEVS formalism, several important issues of simulation, such as sequential/distributed simulation, model verification and validation, logical analysis, model reuse, and model management, can be tackled in a unified framework [25,16,20].

The purpose of this paper is to devise a partitioning algorithm for the optimistic distributed simulation of hierarchical DEVS models. Basically, the partitioning problem in distributed simulation belongs to the class of NP-complete problems [6] if we want to utilize perfect knowledge about the simulation, such as the sequence of events that has to

be executed to simulate each process, the duration of each event, the precedence imposed by event messages, and an optimal schedule for the execution of events. Looking at the problem practically, it is not possible to accurately predict such knowledge as the sequence and duration of run-time simulation events at compile time. In addition, there is no known polynomial algorithm to evaluate the goodness of a partitioning even if the sequence and duration of simulation events could be predicted from compile-time information. Thus, polynomial-time algorithms that find a near-optimal partition based on the incomplete information available at compile time are a reasonable choice.

This paper proposes a new partitioning algorithm based on a simplifying assumption which adopts the following three goals to achieve the overall objective of the minimum simulation time: (1) to balance the computational loads of partitions, (2) to maximize the parallel execution of independent models, and (3) to minimize inter-processor communication [21]. To estimate the parallelism inherent in models, the proposed algorithm, called HIPART, utilizes the hierarchical structural information of models. In the hierarchical model design methodology of the DEVS formalism, a model can be decomposed into a set of connected submodels [36]. Also, in the methodology, when a system is modeled, a modeler naturally partitions the system into a set of subsystems while considering parallelism between them. Thus, by utilizing such hierarchical structural information of models, the proposed algorithm can partition models while considering the parallel execution of models. Through benchmark simulation experiments, we show that the proposed algorithm achieves good performance.

The remainder of this paper is organized as follows. Section 2 summarizes previous research related to the partitioning problem in distributed simulation. Section 3 briefly describes the DEVS formalism and its distributed simulation methodologies.


Section 4 analyzes the partitioning problem of optimistic distributed simulation in general. Then, based on the analysis, we derive the above-mentioned three goals of partitioning and propose the hierarchical partitioning algorithm. Section 5 presents the performance of the proposed algorithm through benchmark simulation experiments. Finally, Section 6 concludes this paper.

2. Related works

This section describes previous research on the partitioning problem of models in distributed simulation.

Random partitioning has been frequently suggested for distributed simulation. For complex graphs with dynamic behavior, it has been argued that random partitioning may give as good a performance as any other approach, at a much lower computational cost for the partitioning itself [32].

Many past partitioning approaches in distributed logic simulation have employed objective cost functions which are basically based on the balance of two optimizing goals: maximizing load balance and minimizing communication between partitions [10,2,18,4,26]. Kapp et al. [18] have proposed objective cost functions for measuring the relative quality of a partition that include a synchronization factor for a NULL-message-based conservative protocol.

Cong et al. [10] and Bagrodia et al. [2] proposed an acyclic multi-way partitioning algorithm for the distributed simulation of combinational Boolean circuits. They have shown that employing a directed task graph, in which edges have direction, is helpful in identifying the underlying circuit structure. They have also utilized the fact that cyclic dependency among partitions on different processors can cause unnecessary rollbacks when optimistic simulation strategies are used. Thus, their algorithm, named K-AFM, which is based on the

well-known FM algorithm [11], tries not to violate the acyclic constraint.

Smith et al. [32] proposed a partitioning strategy called the cone assignment that tries to minimize communication as well as to maximize concurrency for gate-level simulations. They accomplish this dual objective by introducing the concept of cones. For a gate, a cone is defined as the set of gates which are affected by the output of that gate. Following the definition of cones, gates which are driven by primary inputs are divided randomly into sets of approximately equal sizes, and each set is assigned to a unique processor. Such division is done to increase the concurrency of execution. The concept of a cone has also been employed in the acyclic multi-way partitioning algorithm of [10,2]. They employed the maximum fanout free cone (MFFC) decomposition technique to obtain an acyclic network after partitioning.

Chawla [6] proposed a partitioning algorithm which employs the linearity fraction as a measure of parallelism in models. The linearity fraction is defined as a fraction between 0 and 1, which is the probability that a pair from a group of processes will benefit from concurrent execution on distinct processors. More specifically, given two groups of processes, the linearity fraction is computed by dividing the number of all possible pairs of independent simulation cycles by the number of all possible pairs of simulation cycles between the given groups. He represented simulation tasks by a directed acyclic graph (DAG) that considers each simulation cycle of a simulation task as a node in the DAG. However, unless process migration techniques [14] are employed, a simulation task is the minimum unit of partitioning. Moreover, DAG-based optimal partitioning is NP-complete. Thus, he just calculates the linearity fraction between simulation tasks by considering the static edges between the tasks. That is, a task pair in a task graph can be classified into three categories based on the direction and number of edges between the pair:

unconnected, pipeline, and cyclic. Then, under a worst-case scenario, the linearity fraction value is estimated as 0, 1/2, and 1 for a task pair in each category, respectively.

It may be desirable to simply make an initial assignment at compile time and dynamically adjust the computation load by changing the partition at run-time to better balance communication and computation loads. Such schemes have been tried in the past for distributed simulation but have not been very successful [14]. The major reason for this failure is the cost of process migration between processors: it is too expensive to migrate a process, and furthermore, the productivity of the move is unpredictable.

Much research has also been conducted to partition DEVS-based models. Concepcion [9] and Zeigler and Zhang [37] proposed a hierarchical decomposition approach for DEVS models to permit external events occurring at the same simulation time to be processed concurrently. Seong et al. [30] have proposed a partitioning algorithm for parallelizing internal and external events occurring at the same simulation time. However, all the above approaches have been based on the synchronous synchronization mechanism; thus, the goal of their partitioning algorithms is to maximize the parallel execution of events at the same simulation time.

3. Preliminaries

In this section, the DEVS formalism and its hierarchical simulation mechanism, called abstract simulators, are described. Then, we describe an optimistic distributed simulation scheme for DEVS models called the Distributed Optimistic Hierarchical Simulation (DOHS) scheme.

3.1. DEVS formalism

The DEVS formalism provides a basis for specifying discrete event models from a system-theoretic

viewpoint [35,36]. The formalism specifies discrete event models in a hierarchical, modular form. Within the formalism, one must specify (1) the basic models from which larger ones are built, and (2) how these models are connected together in a hierarchical fashion. Top-down design resulting in hierarchically constructed models is the basic methodology for constructing models compatible with the multi-faceted modeling approach.

A basic model, called the atomic model (or atomic DEVS), specifies the dynamics of the system to be simulated. An atomic model AM is defined as follows:

AM = ⟨X, S, Y, δ_int, δ_ext, λ, ta⟩,

where X is the external input events set, S the sequential states set, and Y the external output events set; δ_int: S × {*} → S is the internal transition function, where * is an internal event which notifies that the next schedule time has arrived; δ_ext: Q × X → S is the external transition function, where Q is the set of total states of AM given by Q = {(s, e) | s ∈ S and 0 ⩽ e ⩽ ta(s)}; λ: S × {*} → Y is the output function; and ta: S → R⁺_{0,∞} is the time advance function.

As with modular specifications in general, we must view the above atomic DEVS model as possessing input and output ports through which all interactions with the external world are mediated. To be more specific, when external input events arrive from other models and are received on its input ports, the model decides how to respond to them by its external transition function. In addition, when no external events arrive until the schedule time specified by the time advance function, the model changes its state by the internal transition function and produces external output events on the output ports to be transmitted to other models. For the schedule-time notice, an internal event (*) is used, as shown in the above definition.
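To make this port/transition interface concrete, here is a minimal sketch of an atomic DEVS in Python (our illustration only; the class and method names are ours and do not come from any tool cited in this paper):

```python
from dataclasses import dataclass, field

INFINITY = float("inf")

@dataclass
class BufferModel:
    """Toy atomic DEVS: a FIFO buffer that emits one stored job
    per internal transition (illustrative sketch only)."""
    queue: list = field(default_factory=list)

    def time_advance(self):           # ta: S -> R+(0, inf)
        return 1.0 if self.queue else INFINITY

    def ext_transition(self, e, x):   # delta_ext: Q x X -> S
        self.queue.append(x)          # respond to an external input event

    def output(self):                 # lambda: S x {*} -> Y
        return self.queue[0]          # external output event at schedule time

    def int_transition(self):         # delta_int: S x {*} -> S
        self.queue.pop(0)             # state change at the internal event (*)
```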

Several atomic models may be coupled in the DEVS formalism to form a multi-component


model, also called a coupled model. In addition, since the formalism is closed under coupling, a coupled model can be represented as an equivalent atomic model. Thus, a coupled model can itself be employed as a component in a larger coupled model, thereby giving rise to the construction of complex models in a hierarchical fashion. A coupled model CM is defined as follows:

CM = ⟨D, {M_i}, {I_i}, {Z_{i,j}}, SELECT⟩,

where D is the set of component names. For each i in D, M_i is the DEVS for component i and I_i the set of influencees of i. For each j in I_i, Z_{i,j}: Y_i → X_j is the i-to-j output translation function, and SELECT: subsets of D → D is the tie-breaking function.

Detailed descriptions of the definition of the atomic and coupled DEVS can be found in [36,37].

By using the atomic and coupled DEVS, a hierarchical model can be constructed as shown in Fig. 1. The coupled model, Node, is a simple node model in a queuing network. Node consists of Server and Router models (with Server being composed of Buffer and Proc models).
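The coupling information of such a model can be written down explicitly. The sketch below renders the Node example as plain data (our own representation; the specific port couplings are assumptions, since Fig. 1 is not reproduced here):

```python
# Hypothetical coupling specification for the coupled model Node.
# Each coupling is a (source port, destination port) pair.
node_coupling = {
    "components": ["Server", "Router"],               # Server = Buffer + Proc
    "external_input":  [("Node.in", "Server.in")],    # outside world -> component
    "internal":        [("Server.out", "Router.in")], # component -> component
    "external_output": [("Router.out", "Node.out")],  # component -> outside world
}
```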

3.2. Abstract simulators

The simulation of DEVS models is based on the hierarchical simulation mechanism, also called the abstract simulator principles, developed as part of the DEVS theory [36].

Fig. 1. Queuing node model Node.

An abstract simulator is a virtual processor, or an algorithmic description, which interprets the dynamics specified by the DEVS formalism. Two types of abstract simulators are defined: the SIMULATOR for atomic models and the COORDINATOR for coupled models. For simulation, an abstract simulator is assigned to each DEVS model in a one-to-one manner; thus, abstract simulators form the same hierarchical structure as that of the models. For example, Fig. 2 shows the hierarchically structured abstract simulators of the model Node.

The operation of an abstract simulator involves handling four types of messages: (*, t), (x, t), (y, t), and (done, tN), where t is the simulation time and tN is the next internal transition time. The (x, t) and (y, t) messages carry external input and output event information, respectively. The (*, t) and (done, tN) messages are used for the scheduling process of abstract simulators. The (*, t)

Fig. 2. Abstract simulators of model Node.


message notifies a SIMULATOR that its next internal transition time has arrived. The (done, tN) message is used for making a new schedule. A detailed description of these operations is given in [36]. We describe here the hierarchical scheduling process of abstract simulators. The process consists of two phases: finding the imminent simulator, i.e., the simulator with the minimum next internal transition time (tN), and executing that simulator. In the finding phase, each SIMULATOR which has executed either an internal or external transition function modifies its tN and sends a (done, tN) message to its parent COORDINATOR for making a new schedule. Upon receiving (done, t) messages from its components, a COORDINATOR finds the imminent component with the minimum tN among all its components and sends a (done, min tN) message to its parent. This hierarchical process continues until the top-most coordinator of the hierarchy of abstract simulators, also called the root-COORDINATOR, receives a (done, tN) message. Then, the root-COORDINATOR advances the simulation time (t) to tN and begins an execution phase by issuing a (*, t) message to its imminent component. Upon receiving the (*, t) message, a COORDINATOR passes the message to its imminent component; this message passing continues until the most imminent SIMULATOR receives the message. Fig. 2 shows the trails of (*, t) and (done, tN) messages in the hierarchical scheduling process. By repeating these two phases, the simulation proceeds to the next simulation step.
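The two-phase protocol can be sketched compactly: in the finding phase each COORDINATOR reduces its children's (done, tN) reports to a minimum, and in the execution phase the (*, t) message descends along the remembered imminent path. A minimal sketch under these assumptions (our own class names, not the abstract simulator algorithms of [36]):

```python
INF = float("inf")

class Simulator:
    """Leaf node wrapping an atomic model (toy version for illustration)."""
    def __init__(self, tn=INF):
        self.tn = tn                   # next internal transition time

    def next_time(self):               # reply to the (done, tN) collection
        return self.tn

    def execute(self, t):              # receive (*, t)
        self.tn = t + 1.0              # toy model: reschedule one time unit later

class Coordinator:
    """Non-leaf node scheduling its children hierarchically."""
    def __init__(self, children):
        self.children = children
        self.imminent = None

    def next_time(self):
        # Finding phase: remember the child with minimum tN, report it upward.
        self.imminent = min(self.children, key=lambda c: c.next_time())
        return self.imminent.next_time()

    def execute(self, t):
        # Execution phase: pass (*, t) down the imminent path.
        self.imminent.execute(t)

# Root-COORDINATOR loop: alternate the finding and execution phases.
root = Coordinator([Coordinator([Simulator(3.0), Simulator(5.0)]), Simulator(4.0)])
t = root.next_time()
while t <= 10.0:
    root.execute(t)
    t = root.next_time()
```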

3.3. Optimistic distributed simulation of DEVS models

The distributed simulation of DEVS models differs from the conventional logical-process-based distributed simulation [12] in that: (i) the formalism differentiates external and internal events of the models, and (ii) for the simulation of DEVS

models, the hierarchical simulation mechanism is used [36]. Owing to these differences, most distributed DEVS research has concentrated on the synchronous approach [9,33,29,7,8]. Synchronous approaches use a central event scheduler to synchronize the simulation progress across all of the processors which are involved in the simulation. Since the simulation time is managed by the central scheduler, only events with the same simulation time can be parallelized. Also, the scheduler becomes a performance bottleneck since it must synchronize itself with all processors.

Recently, an asynchronous simulation algorithm named the DOHS scheme, which allows each processor to have a different simulation clock, was proposed [22-24,20]. The DOHS scheme is a hybrid algorithm of the hierarchical simulation mechanism and the Time Warp protocol [17,12,13], which is one of the most common optimistic protocols in distributed simulation. Thus, in the scheme, hierarchically structured abstract simulators are partitioned and distributed throughout the computer nodes which are involved in the simulation. Each computer node has its own scheduler for scheduling the abstract simulators mapped onto it. To satisfy the causality constraint of events, the DOHS scheme employs the Time Warp protocol [12], in which a simulation process executes every input message as soon as it arrives. If a simulation process receives a message with a smaller time stamp than that of the previously executed message (such a message is called a straggler), it rolls back its simulation time to the time-stamp value of the straggler and reexecutes from that point.
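The Time Warp rule that the DOHS scheme builds on fits in a few lines. Below is a toy sketch of straggler handling (ours; a real Time Warp kernel would also send anti-messages and compute GVT, which are only noted in comments):

```python
import bisect
from dataclasses import dataclass, field

@dataclass(order=True)
class Msg:
    timestamp: float
    payload: object = field(default=None, compare=False)

class TimeWarpProcess:
    """Toy Time Warp process (illustrative sketch only)."""
    def __init__(self):
        self.local_time = 0.0
        self.pending = []                  # input messages sorted by timestamp
        self.saved_states = [(0.0, None)]  # (time, state snapshot) pairs

    def receive(self, msg: Msg):
        if msg.timestamp < self.local_time:
            self.rollback(msg.timestamp)   # msg is a straggler
        bisect.insort(self.pending, msg)   # (re)execution proceeds from here

    def rollback(self, t: float):
        # Restore the latest state saved at or before t; a real kernel would
        # also cancel already-sent output messages with anti-messages.
        self.saved_states = [s for s in self.saved_states if s[0] <= t]
        self.local_time = t
```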

As shown in Fig. 3, the DOHS scheme consists of four major parts: partitioned abstract simulators, distributed schedulers (or node-COORDINATORs), the DOHS-manager, and the DOHS-queue. To employ Time Warp in the hierarchical scheduling process, the DOHS-manager and DOHS-queue are devised. The DOHS-queue is a


Fig. 3. Global structure of the DOHS scheme.

waiting queue of input messages for execution, which orders messages by its own rule. The DOHS-manager controls the execution of abstract simulators; that is, it fetches the first message of the DOHS-queue and executes the destination abstract simulator of the message. In addition, to implement a rollback algorithm in abstract simulators, the DOHS scheme develops a parallel algorithm for abstract simulators [22].

In the DOHS scheme, Time Warp's rollback and global control mechanisms are modified to fit into the hierarchical simulation. In particular, the rollback mechanism of the DOHS scheme is

more complex than that of Time Warp because of the abstract simulator's hierarchical scheduling process. The rollback mechanism consists of the following three phases. The first phase of a rollback is the hierarchical schedule preemption. That is, at the instant when a straggler message arrives at a node, the current hierarchical schedule of the node may be in progress. An easy way of handling this straggler is to postpone its execution until the current schedule finishes. However, this method may increase rollback overheads because it is not certain that the execution of the current schedule would be a correct computation. Thus, for efficient

rollback, the DOHS scheme preempts the current schedule in progress as soon as a straggler message is received. For this, a hierarchical schedule preemption algorithm is devised. After this preemption of the current schedule, the straggler is transmitted to its destination SIMULATOR for execution by the DOHS-manager. Then, as the second rollback phase, the destination SIMULATOR rolls back its state and cancels the output messages which were already sent. This second phase is basically the same as the rollback operation of Time Warp. In the final phase, the rolled-back SIMULATORs of the second phase are rescheduled hierarchically. Thus, the simulation can restart from that point of the simulation time.

4. Hierarchical partitioning algorithm for optimistic DEVS simulation

In this section, we present the proposed partitioning algorithm for the optimistic distributed simulation of DEVS models. Before the detailed description of the algorithm, Section 4.1 analyzes the characteristics of the partitioning problem of optimistic distributed simulation. In addition, based on the analysis, we derive a general approach for the partitioning of simulation tasks. Then, Section 4.2 formulates the proposed hierarchical partitioning algorithm based on this general approach.

4.1. Partitioning problem in optimistic distributed simulation

Without loss of generality, we assume that the optimistic simulation algorithm adopts the time window technique [12], which has been adopted in most optimistic approaches to control the parallelism exploitation of models. The time window is an interval in simulated time such that only

events within the time window are eligible for execution. The goal of the time window is to prevent incorrect computations from propagating too far ahead into the simulated-time future.

A simulation task i assigned to a processor performs one of the following activities during simulation.
1. True computation, T_i, which is the simulation cycle that will never be involved in a rollback process. True computation under optimistic distributed simulation consists of two parts: model execution time (M_i) and state saving time (S_i).
2. Local synchronization, L_i, which implies one of the following three simulation cycles: (1) a simulation cycle that will eventually be rolled back (false computation, or F_i), (2) a rollback upon receipt of a straggler message (R_i), and (3) an idle cycle during which no work is performed (I_i).
3. Global synchronization, G_i, which manages global simulation progress (e.g., GVT calculation, flow control, and memory management).

Thus, the total execution time for processor i

under a partitioning P, ET_{P,i}, could be expressed as follows:

∀i: ET_{P,i} = T_i + L_i + G_i   (1)
            = M_i + S_i + F_i + R_i + I_i + G_i.   (2)

Note that

T_i + L_i + G_i = T_j + L_j + G_j   ∀i, j.   (3)

Speedup is defined as the execution time of the sequential simulation divided by that of the parallel one. Sequential simulation does not perform any distributed-simulation-specific operations, such as state saving (S), local synchronization (L), and global synchronization (G). Thus, in Eq. (2), the only operation performed in sequential simulation is the model execution time (M),

which is unique regardless of the simulation methodology (distributed or sequential) and the partitioning algorithm. The total execution time for sequential simulation, ET_S, can be expressed as follows:

ET_S = Σ_{i=0:m-1} M_i,   (4)

where m is the number of processors (or partitions) used in distributed simulation. Then, from Eqs. (2)-(4), speedup can be represented as follows:

speedup = ET_S / ET_{P,j} = ( Σ_{i=0:m-1} M_i ) / ( M_j + S_j + F_j + R_j + I_j + G_j )   for any j.   (5)

To achieve a maximum speedup, a partitioning algorithm should minimize each term in Eq. (2). In Eq. (2), M_i is the only useful computation for the simulation of models. Thus, we define the average model execution time, M_avg, as

M_avg = ( Σ_{i=0:m-1} M_i ) / m,   (6)

where m is the number of processors.

The state saving time (S) and global synchronization time (G) in Eq. (2) are inevitable overheads of parallelizing the simulation. That is, since we assume that the simulation algorithm is optimistic, state saving must be performed to support the rollback operation, and global synchronization must also be performed to manage simulation progress in each processor. Thus, to maximize speedup, a partitioning algorithm should pursue the following goals in general.

[1] Computational loads of processors (i.e., model execution times M_i) should be balanced:

max_{i=0:m-1} M_i → M_avg.   (7)

[2] Local synchronization work should be minimized:

∀i, L_i = F_i + R_i + I_i → 0.   (8)

Thus, the maximum possible speedup can be expressed as follows:

speedup_max = ( Σ_{i=0:m-1} M_i ) / ( M_avg + S_j + G_j )   for any j.   (9)

The partition which enables this maximum speedup is called the optimal partition.
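Eqs. (4)-(9) translate directly into a back-of-the-envelope calculation. A numeric sketch (ours, assuming perfectly balanced loads):

```python
def max_speedup(model_times, state_saving, global_sync):
    """Upper bound of Eq. (9): total sequential work divided by the
    average per-processor work plus the unavoidable overheads."""
    m = len(model_times)
    et_seq = sum(model_times)      # Eq. (4): sequential execution time
    m_avg = et_seq / m             # Eq. (6): average model execution time
    return et_seq / (m_avg + state_saving + global_sync)

# Example: 4 processors with 10 s of model work each, plus 1 s of state
# saving and 0.5 s of global synchronization per processor.
print(max_speedup([10.0] * 4, 1.0, 0.5))   # ~3.48, below the ideal 4.0
```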

Now, let us consider how to minimize L_i. Rollbacks occur for a number of reasons. Among them are differences in event processing times and event generation rates between processors, communication delays between processors, and the topology of the model (cyclic dependencies between models are particularly notorious). Ultimately, rollbacks occur because messages arrive out of chronological order at a processor. Thus, minimizing the difference between the local simulation times of processors will reduce the frequency of messages arriving out of chronological order. For this, a partitioning algorithm should pursue the following goals: (a) to balance the computational loads of processors, (b) to maximize the parallel execution of independent simulation tasks (which can perform true computation on a processor at any time), and (c) to minimize inter-processor communication. That is, if simulation tasks are independent of each other, there is no communication between them, and rollbacks do not occur (L_i becomes zero). Otherwise, simulation tasks communicate with each other, and this induces rollbacks or idle cycles (I_i) spent waiting for messages. The computational loads of the processors should be balanced to reduce the idle cycles of early-ended processors. Moreover, the message communication delay may cause additional rollbacks and wasted lookahead computation; thus the third goal above should be pursued.

To maximize the parallel execution of independent simulation tasks, a partitioning algorithm should know the run-time behavior of the simulation, such as the influence relation between simulation cycles.

Fig. 4. Parallelism in simulation tasks (event executions of Simulators A-E plotted against simulation time, with the size of the time window indicated).

For example, Fig. 4 shows the parallelism of simulation tasks. Let us assume that all simulators have the same computational load (that is, they hold the same portion of the execution time of sequential simulation). Even though all simulators have the same computational load, only those simulators having no influence relation with each other can be executed in parallel at any simulation time. Simulators A and B have an influence relation around simulation time 100. Thus, they cannot be executed in parallel at that simulation time. Moreover, since the time window technique was assumed, only simulators having similar local simulation times (more specifically, simulation times within the same time window) can be simulated in parallel. Thus, if we partition the simulators into two groups (A, C, E) and (B, D) and assign each group to a distinct processor, we cannot exploit the parallelism of the models at all.

However, estimating such run-time behavior at compile time is extremely difficult. Moreover, even if we know the run-time behavior of the model for one input data set, the run-time behavior changes if we change the input data set. Thus, to estimate such parallelism in models, we utilize the hierarchical structural information inherent in the hierarchical model design methodology of the DEVS formalism. In the methodology, when a system is modeled, a modeler naturally partitions the system into a set of subsystems while considering parallelism between them. Section 4.2 details this approach.

4.2. Partitioning using hierarchical structural information

The hierarchical model design methodology is increasingly being recognized as a predominant modeling paradigm for future simulation developments, due to advantages such as reduction in model development time, support for reuse of a database of models, and aid in model verification and validation [28,7,36,22]. In the hierarchical design methodology, a model can be decomposed into a set of connected submodels [36]. Also, in the methodology, when a system is modeled, a modeler naturally partitions the system into a set of subsystems while considering parallelism between them. This hierarchical design approach also appears in VLSI design: the number of external signals to/from a chip is only a few percent of the internal signals of the chip. For example, Fig. 5 shows this parallelism exposed in the hierarchical design. The system S communicates with the external world only via input and output ports. Communication between components A and B occurs locally inside the system S.

We propose a new partitioning algorithm which pursues the three goals mentioned in Section 4.1: (1) to balance the computational loads of partitions, (2) to maximize the parallel execution of independent models, and (3) to minimize inter-processor communication. For the second goal, the proposed algorithm utilizes the hierarchical structural information of models available in the hierarchical design methodology. That is, each subsystem of the system is assumed to have locality of communication, and the algorithm exploits this communication locality of hierarchical models. Thus, the basic strategy of the proposed algorithm is to insert larger components into the same partition if possible (i.e., to partition a hierarchical composition tree at the highest possible level). After following this basic strategy, the algorithm tries to find a partition which satisfies the other two goals.

Fig. 5. Hierarchical design of models (components A, B, and C inside system S; all communication with the outside world is mediated by the input and output ports of S).

Since the algorithm is based on the distributed simulation of DEVS models, the target task graph is the tree of hierarchically structured abstract simulators. Thus, for partitioning, we transform the composition tree of abstract simulators into a weighted task tree, which is defined as follows.

Definition 1. A task tree is a tuple G = (V, E, C, T), where V = {n_j, j = 1 : v}, v = |V|, is the set of nodes, E = {e_{i,j} = (n_i, n_j)} is the set of communication edges, C is the set of edge communication costs, and T is the set of node costs.

For describing task trees, the following notations are employed. If e_{i,j} is in E, then n_i is called the parent of n_j, and n_j is a child of n_i. If there is a path from n_i to n_j and n_i ≠ n_j, then n_i is an ancestor of n_j and n_j is a descendant of n_i. A node with no descendants is called a leaf. Note that leaf nodes and non-leaf nodes correspond to SIMULATORs and COORDINATORs in the


composition tree of abstract simulators, respectively. The subtree of a node n_i is a tree consisting of the descendants of n_i, including n_i itself, which are all mapped in the same partition. In that case, the node n_i is called the root of the subtree. The depth of a node n_i in a task tree is the length of the path from the root of the task tree to n_i.

The value c_{i,j} ∈ C is the communication cost incurred along the edge e_{i,j} ∈ E, which is zero if both nodes are mapped in the same partition. The cost τ_i ∈ T of a node n_i consists of two weights (p_i, s_i), where p_i is the computation cost of n_i and s_i is the sum cost of the subtree of n_i. The computation cost p_i of a node n_i in a task tree is the model execution time taken for all kinds of input messages during sequential simulation. That is, the cost does not include the distributed simulation overheads (L_i + G_i). The cost can be expressed in our previous notation as follows:

p_i = M_i.   (10)

The computation cost of a node does not directly depend on the size of the node's associated model, since some nodes are executed more often than others and the cost should incorporate such information. Rather, it depends on the probability of receiving input events and on the computational complexity of handling each input event.

The sum cost s_i of a subtree G_i is the sum of the computation and communication costs in G_i, as shown in the following:

s_i = Σ_{n_k ∈ G_i} p_k + Σ_{n_k, n_l ∈ G_i} (c_{k,l} + c_{l,k}).   (11)
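Definition 1 and Eq. (11) map directly onto a small tree structure. In the sketch below (ours; the field names are hypothetical, and the two directed costs of each edge are folded into a single value c stored on the child):

```python
from dataclasses import dataclass, field

@dataclass(eq=False)   # identity-based equality: nodes with equal costs stay distinct
class TaskNode:
    """Task-tree node with computation cost p; c is the combined
    communication cost of the edge to the parent."""
    p: float
    c: float = 0.0
    children: list = field(default_factory=list)

    def sum_cost(self) -> float:
        # Eq. (11): computation costs of all subtree nodes plus the
        # communication costs of the edges inside the subtree.
        return self.p + sum(ch.c + ch.sum_cost() for ch in self.children)
```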

The meaning of the node and edge costs can be better understood by comparing the following extreme cases. If there is full parallelism between partitions (that is, the partitions are independent of each other), the sum cost of the root node of a subtree (or partition) is the execution time of the subtree. Thus, the total execution time ET_P of the distributed simulation under a partitioning P can be determined as follows:

ET_P = max_{j=0:M-1} (s_j + S_j + G_j),   (12)

where S_j is the state saving time and G_j is the global synchronization time. That is, distributed simulation can remove most of the distributed simulation overheads, such as false computation time (F_j), rollback time (R_j), and idle time (I_j).

In contrast, if there is no parallelism between partitions at all (that is, the partitions are fully dependent on each other), ET_P is the sum of the execution times of the partitions, as shown in the following:

ET_P = Σ_{j=0:M-1} (s_j + S_j + F_j + R_j + G_j).   (13)

That is, the execution time ET_i of a partition i is s_i + S_i + F_i + R_i + G_i + I_i from Eq. (2), where I_i = ET_P − (s_i + S_i + F_i + R_i + G_i). Therefore, these two extreme cases become the upper and lower bounds of the simulation completion time, respectively.

Now, we can represent the partitioning problem by using the task tree. That is, the partitioning is a mapping of the nodes of a task tree G onto at most m partitions. More specifically, the problem is to determine a mapping

map(n_j) = K_i,   j = 1 : v and i = 0 : (M − 1),   (14)

for the nodes n_j of G onto M (⩽ m) partitions K_0, K_1, …, K_{M−1}, with the objective function shown in Fig. 6.

Note that our problem specifies just the maximum number of partitions, m. Thus, the resulting number of partitions, M, may be smaller than m. The final goal of partitioning is to minimize the total simulation time, not to fully utilize the given number of processors. Depending on the characteristics

of simulation models, these two goals do not always match up. That is, in distributed simulation, using more processors for simulation does not imply a smaller simulation time.

Since the proposed algorithm assumes the DOHS scheme as the underlying simulation algorithm, we make the following assumptions for a partitioning algorithm.
• For a non-leaf node in a task tree, at least one descendant should be in the same partition. This is a requirement of distributed hierarchical simulation. That is, due to the characteristics of the distributed hierarchical simulation mechanism (the DOHS scheme), a non-leaf node (which corresponds to a COORDINATOR in the abstract simulator tree) cannot be scheduled without having leaf nodes (which correspond to SIMULATORs in the abstract simulator tree) in the same partition.
• The architecture of the parallel computer is a completely connected network of processors with local memory and message buffers. The communication protocol is asynchronous message passing.

Fig. 6 shows the proposed partitioning algorithm. Since the task graph is a tree, the algorithm has a recursive form. The algorithm starts at the root node of a task tree. When the algorithm enters a node, it checks whether there are children whose sum cost is greater than the average cost of partitions (L_avg). If there is such a child, the child is too big to be inserted into one partition and should be partitioned further. Thus, the algorithm enters the child to partition it. After this recursive process, the algorithm finds a node having children whose sum costs are all smaller than L_avg. In this case, we say that the algorithm visits the node. When the algorithm visits a node, it finds a partition with the minimum sum cost among the children by the following process. The number of possible partitions is 2^C − 1, where C is the number of children. When the number of children is small, the solution

Algorithm 1 HIPART(n_i)
  ▷ Initially, L_avg = s_root/m + (Σ c_{i,j})/|E|, where |E| is the number of edges in E.
  ▷ Check if there are children whose loads are greater than L_avg.
  while ( ∃ child n_c with s_c > L_avg(1 + α) ) do
      ▷ n_c is too big to be inserted into one partition and should be partitioned more.
      HIPART(n_c);
  end while
  ▷ Now n_i has only children whose loads are smaller than L_avg(1 + α).
  while ( s_i > L_avg(1 + α) and number of partitions already made < m ) do
      make a partition P satisfying the objective function
          H = min_{P ⊆ K} | (2 Σ_{n_k ∈ P} c_k + Σ_{n_k ∈ P} s_k) − L_avg |,
      where K is the set of children of n_i;
      ▷ Update L_avg.
      After partitioning, update the computation and communication costs of the parents of n_i, including n_i itself;
  end while

Fig. 6. Hierarchical partitioning algorithm.

space space can be fully searched. In fact, this is the normal case, since the average number of children in the hierarchical model design methodology is usually small. If such a full search is intractable, efficient heuristics such as Kernighan and Lin's algorithm [19] or Fiduccia and Mattheyses's algorithm [11] can be used to obtain a suboptimal solution in polynomial time.
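Using the TaskNode sketch from above, the recursive skeleton of Fig. 6 can be rendered as follows (our reading of the algorithm, simplified: α is kept fixed, and the propagation of updated costs to ancestors is omitted):

```python
from itertools import combinations

def hipart(node, partitions, l_avg, alpha, m):
    """Sketch of HIPART (Fig. 6) over TaskNode trees (illustrative)."""
    # Enter any child whose subtree is too large for one partition.
    for child in list(node.children):
        if child.sum_cost() > l_avg * (1 + alpha):
            hipart(child, partitions, l_avg, alpha, m)

    # Visit: cut off subsets of children until this subtree fits.
    while node.sum_cost() > l_avg * (1 + alpha) and len(partitions) < m - 1:
        # Enumerate proper subsets only, so at least one child stays with
        # its parent (the hierarchical simulation requirement stated above).
        subsets = [s for r in range(1, len(node.children))
                   for s in combinations(node.children, r)]
        if not subsets:
            break
        # Objective of Fig. 6: cut communication (counted twice) plus the
        # subset's sum costs, as close to l_avg as possible.
        best = min(subsets, key=lambda s: abs(
            2 * sum(ch.c for ch in s)
            + sum(ch.sum_cost() for ch in s) - l_avg))
        partitions.append(list(best))
        for ch in best:
            node.children.remove(ch)   # detach the new partition
```

After the recursion returns, the nodes still attached to the root form the final partition.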

For example, consider the task tree in Fig. 7. Following the definition of the task tree, each node n_j and edge e_{i,j} in the task tree has a cost of the form (p_j, s_j) and c_{i,j}, respectively. Note that, before partitioning, the sum costs of a task tree do not include the communication costs of edges. From the definition, the communication cost of an edge within a partition is zero; only edges lying between distinct partitions have non-zero values.

Fig. 7. An example task tree T.

The bounds of the partition which can be accepted by the algorithm are L_avg(1 ± α), where αL_avg is the maximum allowable cost mismatch between partitions (for example, it can be set to the largest node computation cost in the task tree). A larger value of α allows a larger load difference between partitions. This value is updated during the partitioning process. The algorithm, beginning at the root node n_a, partitions the task tree through the following steps.

(1) We want to partition the task tree into three partitions. At the initialization phase, the algorithm calculates the initial estimate of the average cost of each partition, L_avg, as follows:

L_avg = s_a/m = 1300/3 = 433.

(2) Entering the node n_a, the algorithm finds three children, n_b, n_c, and n_d. The sum cost s_d of n_d is 760 (> 433(1 + α)), which is too large to be inserted into one partition. Thus, the algorithm enters node n_d to make a partition consisting of only part of the children of n_d (Fig. 8). Entering the node n_d, the algorithm finds that the sum cost s_e of node n_e is 540 (> 433(1 + α)), which is still too large to be inserted into one partition. Thus, the algorithm enters node n_e.

(3) Entering the node n_e, the algorithm finds that all its children have smaller costs than L_avg. Now the algorithm visits node n_e and tries to make a partition consisting of some children of n_e (Fig. 8). Three partitions are possible, each cut before one of the children; their sum costs are 320, 210, and 540, respectively. The algorithm selects the partition P which minimizes the difference between the sum costs and L_avg:

H_P = min{ |((2 · 20) + 320) − 433|, |((2 · 20) + 210) − 433|, |((2 · 20) + 540) − 433| }.

(4) Through this process, the third partition is selected with the sum cost 540 (Fig. 9). Note that after the partition is made, the sum cost of this partition should incorporate the communication cost of the cut edge, thus resulting in 560 (i.e., 540 + 20). Now, the algorithm calculates a new value of α, which is the maximum difference between L_avg and the sum costs of the partitions already made. Then, as the final step of the visit to node n_e, the algorithm recalculates the node computation costs from node n_e up to the root of the task tree.

(5) Reentering the node n_d, the algorithm identifies that the sum cost of n_d is now smaller than L_avg and returns to node n_a. Reentering the node n_a, the algorithm finds that all children have smaller costs than L_avg. Thus, the algorithm tries to obtain the closest

Fig. 8. Partitioning process (a) for a task tree T.

Fig. 9. Partitioning process (b) for a task tree T.

partition to L_avg among the children, resulting in a partition before one of them (Fig. 10).

(6) Now, all remaining nodes are included in the last partition (Fig. 11).

4.3. Calculation of the costs of nodes and edges

Until now, we have assumed that the costs of nodes and edges are known before partitioning. This assumption may be reasonable if the trace result of a sequential simulation is known before partitioning. Even if such a full trace result is not available, a kind of presimulation may be used to estimate the costs. Chamberlain and Henderson [5] investigated the appropriateness of presimulation as a data gathering technique. They showed that the functional evaluation frequency measured during the first 10% of the simulation run was an excellent predictor of the evaluation frequency for the last 90% of the run. Thus, if such a presimulation is possible, the costs of nodes and

edges can be obtained. If even such a presimulation is not possible, the only way to estimate node and edge costs may be to utilize the sizes of the models.
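Following Chamberlain and Henderson's observation, such a presimulation-based estimate can be very simple. A sketch (ours; the trace format is hypothetical):

```python
def estimate_node_costs(trace, fraction=0.1):
    """Estimate per-model computation costs p_i from the first part of a
    sequential run. `trace` is a chronologically ordered list of
    (model_name, exec_time) event records."""
    prefix = trace[:int(len(trace) * fraction)]
    costs = {}
    for model, exec_time in prefix:
        costs[model] = costs.get(model, 0.0) + exec_time
    # Scale up, assuming the measured fraction is representative of the run.
    return {m: t / fraction for m, t in costs.items()}
```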

5. Performance results

In this section, we evaluate the proposed hierarchical partitioning algorithm by benchmark simulation experiments. We chose the Colombian Health Care System (CHCS) queuing benchmark model. This model has been used to analyze performance in various studies [1,29,34]. The CHCS model is a multi-tiered health care system consisting of villages and health centers. There is one health center for each village. When villagers become ill, they travel to their local health center for assessment and are treated when possible. If they cannot be treated locally, patients are referred up the hierarchy to the next health center, where the assessment/treatment/referral process is repeated.

Fig. 10. Partitioning process (c) for a task tree T.

Fig. 11. Partitioning process (d) for a task tree T.

Upon arriving at a health center, a patient is enqueued until one of the health care workers becomes available. It is assumed that patients can always be treated at the top-level hospital of the health care system. Also, patients do not return to their home village after treatment (i.e., the queuing network is open).

The DEVS model of the CHCS was designed following the hierarchical modeling methodology. The top-level coupled model, CHCS, has two second-level coupled models, Large-HCS1 and Large-HCS2. Each Large-HCS is decomposed into eight Medium-HCSs, each of which is in turn decomposed into eight Small-HCSs. Thus, the CHCS model can be constructed hierarchically. Each HCS coupled model at every level also has two atomic models, Hospital and Village. Each Village model is a source of patients, generating a fixed number of patients independently. Each Hospital model is a server for patients, with a waiting queue and a set of doctors. It is assumed that each first/second/third/fourth-level Hospital has 16/4/4/1 doctors, respectively. Fig. 12 shows the resulting CHCS model.

The next step is the model partitioning and mapping for distributed simulation. The composition tree of the CHCS model is shown in Fig. 13(a). The tree is partitioned by the proposed hierarchical partitioning algorithm while varying the number of partitions (2, 4, 8, 16, and 32). The partition results are also shown in Figs. 13 and 14.

Fig. 12. Hierarchical construction of the CHCS model.

To show the effectiveness of the partition results, we performed simulation experiments. For these experiments, we used D-DEVSim++ [20], a realization of the DOHS scheme. Simulation experiments were performed on KAICUBE860, a five-dimensional hypercube parallel machine developed at KAIST. Each node of KAICUBE860 has a 40 MHz i860 microprocessor, 8 Mbytes of main memory, and five communication channels which employ the store-and-forward routing scheme. Each node

runs a CORE kernel, which is a stripped-down version of the UNIX kernel. The kernel supports only one user process and has basic memory management and communication primitives.

The CHCS benchmark model was simulated while varying the following three parameters. The first parameter is the referral rate of a hospital. The referral rate represents the relative percentage of referred patients among the total patients at each hospital. If it is 0%, all patients are treated locally and are not referred up to the upper-level hospital.


Fig. 13. Partition results for the CHCS model.

If it is 100%, all patients entering a hospital are referred up to the upper-level hospital. Thus, the model has the maximum parallelism at rate 0, since no message passing occurs between different-level hospitals. In this experiment, the rate was varied from 0 to 40%. The second parameter is the number of patients generated by each village. As the number of patients becomes larger, the number of messages to be processed in each node increases. The last parameter is the insertion of artificial delays (spin-loops) to increase the time spent on processing each event.

Fig. 14. Partition results for the CHCS model (continued).

Since the CHCS model is quite simple, its event processing granularity is too small to obtain enough parallelism. We conducted experiments with and without artificial delays. The size of the artificial delay used in the experiments is about 30 ms.

Fig. 15 shows the obtained speedups. The speedup is defined as the execution time of the sequential implementation divided by that of the parallel one. Note that the sequential implementation does not perform any operations related to global synchronization, such as global simulation time calculation, fossil collection, state saving, etc.

Fig. 15. Speedup curves vs. #nodes while varying #patients from each village.

Fig. 16. Speedups vs. #nodes while varying referral rates (for all curves, the number of patients was set to 1200).

The curves with spin-loops show the best overall performance because the extra parallel processing overhead becomes negligible compared with the event processing time. The speedup curves without spin-loops show some declines when the number of nodes is 2, 16, and 32. The number of patients also determines the message density (the message population divided by the number of nodes). When the number of nodes is 2 or 4, the number of patients does not affect the performance. However, as the number of nodes increases, the speedup curves with larger numbers of patients show better performance. This is because the waiting time due to the unbalanced load distribution becomes smaller compared with the time for processing the increased number of patients. Note that early-ended nodes must wait until the simulation in the latest node is done.

Fig. 16 shows the speedups as a function of the referral rate of each hospital. When the referral rate is 0%, the figure shows the maximum speedup because no message passing occurs between different levels of the health care system. As the rate

increases, the curves show worse performance because more message passing occurs between nodes, and thereby the number of rollbacks increases.

Fig. 17 depicts the efficiency while varying the referral rate of hospitals. Also, Fig. 18 shows the average rollback distance.

Fig. 17. Efficiency vs. #nodes while varying referral rate.

Fig. 18. Average rollback distance vs. #nodes while varying referral rate (for all curves, the number of patients was set to 1200).

In this benchmark, the average rollback distance does not exceed three events.

All the above results show that the proposed partitioning algorithm can accurately estimate the concurrent execution (or parallelism) of models by utilizing the hierarchical structural information of DEVS models.

For a detailed description of the experimental results, refer to [24].

6. Conclusion

The partitioning problem of models is one of the most important issues which may affect the performance of distributed simulation. This paper has presented a novel partitioning algorithm for the optimistic distributed simulation of hierarchical, modular DEVS models. The proposed algorithm adopted the following three goals to achieve the overall objective of minimum simulation time: (1) to balance the computational loads of partitions, (2) to maximize the parallel execu-

tion of independent models, and (3) to minimize inter-processor communication. To maximize the parallel execution of models, the proposed algorithm utilized the hierarchical structural information of models available from the hierarchical model design methodology of the DEVS formalism. We derived the proposed algorithm through a general analysis of the partitioning problem in optimistic distributed simulation. Through an example partitioning process of a model, we showed how the algorithm works. Finally, to show the performance of the proposed algorithm, benchmark experiments were performed. The results showed that the algorithm can accurately estimate the concurrent execution of models by utilizing the hierarchical structural information of DEVS models.

As described in the experiments, the CHCS model used in the experiments is an open queuing network (i.e., it has no feedback of messages). Applying the partitioning algorithm to a wider variety of applications (including those with feedback) remains as future work.

Acknowledgement

This research was supported by the Yeungnam University Research Grants and by the Korea Ministry of Information and Communication through the Fundamental Research Funds for Universities.

References

[I] D. Baezner, G. Lomow, B.W. Unger. Sim++: the transi-

tion to distributed simulation. In: Proceedings of the SCS Multiconference on Distributed Simulation. Simulation Series. 1990.

[2] R. Bagrodia. Yu an Chen. V. Jha. N. Sonpar. Parallel gate-level circuit simulation on shared memory architec- tures, in: Proceedings of the 1995 Workshop on Parallel and Distributed Simulation, Lake Placid. New York. 1995. pp. 170-174.

454 K. H. Kim et al. I Joumul of’S~.stem.~ Architecture 44 (199%) 433455

[3] S.H. Bokhari, Partitioning problens in parallel. pipelined, and distributed computing, IEEE. Transactions on Com- puters 37 (1988) 48-57.

[4] A. Boukerche. C. Tropper, A static partitioning and mapping algorithm for conservative parallel simulations.

in: Proceedings of the 1994 Workshop on Parallel and Distributed Simulation. Edinburgh, Scotland, UK, 1994. pp. 164172.

[5] R.D. Chamberlain, C.D. Henderson, Evaluating the use of pre-simulation in VLSI c rcuit partitioning, in: Proceedings of the 8th Workshop on Parallel and Distrib- uted Simulation, 1994, pp. 139-146.

[6] P. Chawla, Assignment Strategil:s for Parallel Discrete Event Simulation of Digital Systems. Ph.D. Thesis, IJniversity of Cincinnati, 1994.

[7] A.C. Chow, B.P. Zeigler, Parallel DEVS: a parallel, hierarchical, modular modeling formalism, in: Proceedings of the 1994 Winter Simulation Conference, Orlando, Florida, 1994.

[8] A.C. Chow, B.P. Zeigler, D.H. Kim, Abstract simulator for the parallel DEVS formalism, in: AI, Simulation, and Planning in High Autonomy Systems, IEEE Computer Society Press, Gainesville, Florida, 1994.

[9] A.I. Concepcion, A hierarchical computer architecture for distributed simulation, IEEE Transactions on Computers 38 (2) (1989) 311-319.

[10] J. Cong, Z. Li, R. Bagrodia, Acyclic multi-way partitioning of Boolean networks, in: 31st ACM/IEEE Design Automation Conference, 1994, pp. 670-675.

[11] C.M. Fiduccia, R.M. Mattheyses, A linear-time heuristic for improving network partitioning, in: Proceedings of the 19th Design Automation Conference, 1982, pp. 175-182.

[12] R.M. Fujimoto, Optimistic approaches to parallel discrete event simulation, Transactions of the Society for Computer Simulation 7 (2) (1990) 153-191.

[13] R.M. Fujimoto, Parallel discrete event simulation, Communications of the ACM 33 (10) (1990) 30-53.

[14] D.W. Glazer, C. Tropper, On process migration and load balancing in Time Warp, IEEE Transactions on Parallel and Distributed Systems 4 (3) (1993) 318-327.

[15] E.K. Haddad, Partitioned load allocation for minimum parallel processing execution time, in: International Conference on Parallel Processing, 1989, pp. 192-199.

[16] G.P. Hong, T.G. Kim, A framework for verifying discrete event models within a DEVS-based system development methodology, Transactions of the Society for Computer Simulation International 13 (1) (1996) 19-34.

[17] D.R. Jefferson, Virtual time, ACM Transactions on Programming Languages and Systems 7 (3) (1985) 404-425.

[18] K.L. Kapp, T.C. Hartrum, T.S. Wailes, An improved cost function for static partitioning of parallel circuit simulations using a conservative synchronization protocol, in: Proceedings of the 1995 Workshop on Parallel and Distributed Simulation, Lake Placid, New York, July 1995, pp. 78-85.

[19] B.W. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal 49 (1970) 291-307.

[20] K.H. Kim, Distributed simulation methodology based on system theoretic formalism: an asynchronous approach, Ph.D. Thesis, Korea Advanced Institute of Science and Technology, 1996.

[21] K.H. Kim, T.G. Kim, K.H. Park, A concurrency preserving partitioning algorithm for parallel simulation of hierarchical, modular discrete event models, in: Proceedings of High Performance Computing Asia 1997, Seoul, Korea, 1997.

[22] K.H. Kim, Y.R. Seong, T.G. Kim, K.H. Park, Distributed optimistic simulation of hierarchical DEVS models, in: Proceedings of the 1995 Summer Simulation Conference, Ottawa, Canada, 1995, pp. 32-37.

[23] K.H. Kim, Y.R. Seong, T.G. Kim, K.H. Park, Ordering of simultaneous events in distributed DEVS simulation, Simulation Practice and Theory 5 (3) (1997) 253-268.

[24] K.H. Kim, Y.R. Seong, T.G. Kim, K.H. Park, Distributed simulation of hierarchical DEVS models: hierarchical scheduling locally and Time Warp globally, Transactions of the Society for Computer Simulation 13 (3) (1997).

[25] T.G. Kim, DEVS formalism: reusable model specification in an object-oriented framework, International Journal in Computer Simulation 5 (4) (1995) 397-416.

[26] P. Konas, P.C. Yew, Partitioning for synchronous parallel simulation, in: Proceedings of the 1995 Workshop on Parallel and Distributed Simulation, Lake Placid, New York, 1995, pp. 181-184.

[27] J. Misra, Distributed discrete-event simulation, ACM Computing Surveys 18 (1) (1986) 39-65.

[28] R. Sargent, Hierarchical modeling for discrete event simulation (panel), in: Proceedings of the 1993 Winter Simulation Conference, Los Angeles, CA, 1993, p. 569.

[29] Y.R. Seong, S.H. Jung, T.G. Kim, K.H. Park, Parallel simulation of hierarchical modular DEVS models: a modified Time Warp approach, International Journal in Computer Simulation 5 (3) (1995) 263-285.

[30] Y.R. Seong, T.G. Kim, K.H. Park, Mapping hierarchical modular discrete event models in a hypercube multicomputer, Simulation Practice and Theory, 1995, pp. 257-275.


[31] C.C. Shen, W.H. Tsai, A graph matching approach to optimal task assignment in distributed computing systems using a minimax criterion, IEEE Transactions on Computers 34 (3) (1985) 197-203.

[32] S.P. Smith, M.R. Mercer, B. Underwood, An analysis of several approaches to circuit partitioning for parallel logic simulation, in: International Conference on Computer Design, 1987, pp. 664-667.

[33] Y.H. Wang, Discrete-event simulation on a massively parallel computer, Ph.D. Thesis, University of Arizona, 1992.

[34] B.W. Unger, Distributed simulation, in: M. Abrams, P. Haigh, J. Comfort (Eds.), Proceedings of the 1988 Winter Simulation Conference, 1988.

[35] B.P. Zeigler, Theory of Modelling and Simulation, John Wiley, New York, 1976.

[36] B.P. Zeigler, Multifacetted Modelling and Discrete Event Simulation, Academic Press, New York, 1984.

[37] B.P. Zeigler, G. Zhang, Mapping hierarchical discrete event models to multiprocessor systems: concepts, algorithm, and simulation, Journal of Parallel and Distributed Computing 10 (3) (1990) 271-281.

Ki Hyung Kim received his B.S. degree in Electronic Communication Engineering from Hanyang University in 1990 and his M.E. and Ph.D. degrees in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 1992 and 1996, respectively. He joined the faculty of the Department of Computer Engineering, Yeungnam University, Korea, in 1997. His research interests include distributed simulation, operating systems, and multimedia systems.

Kyu Ho Park received the B.S. degree in Electronics Engineering from Seoul National University in 1973, the M.S. in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 1975, and the Dr. Ing. in Electrical Engineering from the Université de Paris in 1983. He joined the faculty of the Department of Electrical Engineering, KAIST, in 1983. The main focus of his research interest has been the development of new computer architectures, and he developed the 2 GFLOPS parallel computer KAICUBE/Hanbit in 1995. He has published more than 70 papers in international journals and conference proceedings. His current research interests are parallel simulation methods for KAICUBE and new parallel computer architectures. He is currently developing a new parallel computer based on PCs. He is a member of IEEE, KISS, KITE and IEICE.

Tag Gon Kim received B.S. and M.S. degrees in Electronics Engineering from Pusan National University and Kyungpook National University, Korea, in 1975 and 1980, respectively. He received his Ph.D. degree in Computer Engineering from the University of Arizona, Tucson, AZ, in 1988. From 1989 to 1991, he was an Assistant Professor in the Department of Electrical and Computer Engineering, University of Kansas, Lawrence, KS. Since September 1991, he has been an Assistant and then Associate Professor in the Department of Electrical Engineering at KAIST. His research interests include discrete event systems modeling/simulation, computer systems analysis, and software engineering methodology. He is an associate editor of several international journals in the simulation area, including International Journal in Computer Simulation, SIMULATION, Transactions of The Society for Computer Simulation International, and Simulation Digest. He is also an editorial board member of International Journal in Intelligent Control and Systems. He is a senior member of IEEE, and a member of ACM, AAAI, SCS and Eta Kappa Nu.