Simulation of Distributed Processing Networks
M. Tsuchiya and Douglas C. Shannon
TRW Defense Systems Group, Redondo Beach, CA 90278, USA
A discrete event simulator named DDPSIM for distributed processing networks is presented. The simulator, which was initially developed to support the design of a distributed processing system for an air defense application, has continued to evolve to accommodate a more general class of bus-connected networks in which software modules, or tasks, are distributed. It has also been used to model a complex air traffic control application which involved multiple networks. The modular organization and the various distributed network component models of the simulator are described in detail. The simulator provides user friendly interfaces that include simple input conventions for system definition, and tabular and graphic outputs for comprehensive analyses of the simulation results. A relatively complex distributed processing system for air traffic control is used as an example to illustrate the simulation procedure and to show the simulation results. To conclude, the lessons learned from the experience in simulation are discussed.
Keywords." bus, CPU, distributed processing network, net- work interface, real time system, queue, simulation
M. Tsuchiya received the B.S. degree in management information systems from Konan University, Kobe, Japan, and the Ph.D. degree in Computer Sciences from the University of Texas at Austin, Texas.
He is currently with TRW Inc., Redondo Beach, CA, and teaches at the University of Southern California. Prior to joining TRW, he was president of Computer Progress Unlimited, Honolulu, HI, and a faculty member at the University of Hawaii at Manoa, the University of California, Irvine, and Northwestern University, Evanston, IL. In 1975, he was a visiting computer scientist at Aarhus Universitet, Aarhus, Denmark, and, in 1972, he was a visiting lecturer at Konan University, Kobe, Japan. He is a Distinguished Visitor of the IEEE Computer Society. His research interests include computer architecture, distributed processing systems, and database systems.
North-Holland Computer Networks and ISDN Systems 11 (1986) 15-27
1. Introduction
Distributed processing is a technology that brings together the rapid advances in integrated circuit technology, computer communications and concurrent processing. A distributed processing system is inherently complex, and its performance is dictated by a number of factors such as network topology, processor throughput, communication bandwidth, task allocation, degree of hardware and software redundancy, and application scenarios, to name only a few. These factors interact and influence each other, thereby making the design of distributed computing systems extremely difficult and challenging [1]. In order to analyze the design of a distributed processing network and the effects of design alternatives, simulation is indispensable. A large body of literature is available for a related subject, namely the simulation of local area networks [2-4], but only a few discrete event simulators for distributed processing systems have been reported [5-9]. Chandy et al. [5,6] discuss a distributed simulation of a message-switched network in which all processors cooperate to solve a common problem. They also prove the deadlock avoidance and correctness of the distributed system. A special purpose model for assisting the design of a distributed processing system based on the Customer Information Control System/Virtual Storage (CICS/VS) was developed at IBM [7]. A simulation of a distributed processing system based on a local area network was reported [8], but provides little insight into the technical issues involved. A
Douglas C. Shannon received his B.S. and M.S. degrees in Systems Engineering from Southern Methodist University, Dallas, Texas in 1972.
He is a Senior Engineer at TRW Defense Systems Group, Redondo Beach, CA, where he has worked on a number of projects involving military defense systems since 1976. Prior to TRW, he served in the Air Force at the Space and Missile System Organization in Redondo Beach, CA. His research interests include distributed processing, simulation and performance evaluation.
0376-5075/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)
very extensive, general purpose network simulator called ASSIST has also been developed at TRW [9]. It is capable of modeling a hierarchy of distributed network components at various levels of detail and has been used to support the design of the distributed processing system for a space defense system.
The distributed processing network simulator DDPSIM presented in this paper is the result of extensive development efforts over three years supporting system design and performance analyses of distributed systems. Since its initial development, DDPSIM has evolved continuously, expanding and refining its capability to simulate a wide range of network components and architectures. In an air defense application, for example, it simulated over forty software and eighty hardware components across six connecting networks. For an application of this complexity, system analysis and design tradeoffs by engineering intuition are extremely difficult and often inaccurate, so that simulation becomes indispensable for effective system engineering. Use of the simulator has confirmed that task allocation, load balance and communications, as well as the distributed hardware architecture, are some of the most critical design parameters that impact the overall system performance and reliability [10,11]. In the sections that follow, DDPSIM is described in some detail, and its simulation procedure and target model representation are illustrated using, as an example, a distributed processing network for air traffic control.
2. Description of Simulator
DDPSIM has been implemented on the VAX 11/780 in extended FORTRAN and SALSIM (System Analysis Language for SIMulation), a TRW simulation language written in FORTRAN. It is an open-ended simulator in that it places no real limitation on the numbers and types of components it simulates. Unique features include the following:
1. It simulates various levels of complexity, from the top level network system down to the individual software segment and hardware module level.
2. It simulates single processors, heterogeneous networks and multiple bus organizations. It simulates a network of multiple networks.
3. It is capable of incorporating various bus arbitration schemes. The bus model simulates communication for point-to-point, broadcast and multiple, selected destinations.
4. It simulates periodic and data-enabled execution of tasks at multiple priority levels.
5. It contains hardware and software model libraries holding generic models available for rapid use of DDPSIM. The user has the flexibility to augment the library with user-generated, detailed hardware and software models.
6. The architecture definition allows easy selection/deselection of hardware and software models, reallocation of software models to hardware devices and assignment of hardware model attributes such as processing speed.
DDPSIM has been developed as a system engineering tool for the design of distributed processing networks. The user may specify his target comprehensively and may modify the system definition simply, without detailed knowledge of the simulator structure. Ease of modification allows the user to evaluate several design alternatives for tradeoff studies. Additionally, to facilitate comprehensive analyses of the simulation results, the output user interface provides a myriad of post-processors that transform the voluminous logged data files into tabular summaries and graphical plots. The types of simulation results available are listed in Table 1.
Bus, Network Interface Processor (NIP) and CPU records are processed by separate tools to produce similar utilization statistics and plots. The minimum, average and maximum utilizations for each Bus, NIP or CPU are calculated, and the average utilization is plotted against simulation time. Individual message statistics such as frequencies and
Table 1. Post Processor Capabilities

Bus & NIP:         Bus and NIP utilization; message traffic; Bus and NIP util vs. time plot
CPU:               CPU utilization; load balance evaluation; CPU util vs. time plot
Task models:       % of CPU utilization; individual task stats
Operating system:  % of CPU utilization; individual service stats
Queues:            Queue size, thruput and delay times
Performance:       Port-to-port times and distributions; distribution of PTP times plot
message lengths are recorded for Bus and NIP. CPU utilization for multiple devices can be evaluated for load balancing.
CPU utilization and task model execution time statistics are calculated for each application soft- ware model on each CPU. The percent of CPU utilization and the Operating System Service sta- tistics are calculated for each OS service on each CPU.
Data is recorded for every message using any queue on any device. The queue post-processor summarizes the minimum, average and maximum queue sizes, throughput and delay times for mes- sages in queues.
The time delays from start to finish of a processing thread are performance measures. These port-to-port (PTP) times vary in length as a function of resource loading and contention for resources. The performance post-processor calculates the minimum, average, mode, and standard deviation of PTP times. The plotted distribution of PTP times often highlights unusual and important characteristics left hidden by the statistics.
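As a sketch of the kind of reduction the performance post-processor applies, the following computes these statistics over a list of hypothetical PTP samples. The function name and the 10-bin histogram are illustrative choices, not DDPSIM's actual code:

```python
import statistics

def ptp_summary(ptp_times):
    """Summarize port-to-port (PTP) thread times: min, average, max,
    standard deviation, and a coarse histogram of the distribution."""
    summary = {
        "min": min(ptp_times),
        "avg": statistics.mean(ptp_times),
        "max": max(ptp_times),
        "stdev": statistics.pstdev(ptp_times),
    }
    # Coarse distribution over 10 equal-width bins; this is where
    # bimodal contention effects hidden by the averages show up.
    lo, hi = summary["min"], summary["max"]
    width = (hi - lo) / 10 or 1.0
    hist = [0] * 10
    for t in ptp_times:
        hist[min(int((t - lo) / width), 9)] += 1
    summary["histogram"] = hist
    return summary

# Hypothetical PTP samples (seconds) for one processing thread; the
# 0.030 s outlier stands in for a contention-delayed thread.
print(ptp_summary([0.010, 0.012, 0.011, 0.030, 0.012]))
```

A distribution plot built from the histogram would reveal the outlier cluster that the average alone hides, as the paper notes.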
DDPSIM software has a modular structure which enhances correctness, adaptability to new applications and future growth. Its software organization consists of four modules that model the application environments, as illustrated in Figure 1: the user interface, the simulation executive, the hardware/operating system monitor, and the application software monitor.
The User Interface consists of a user input preprocessor and a simulation output post- processor. The input preprocessor facilitates a sim- ple user input procedure for system definition, performance parameters, and scenario data. The post-processor performs data reduction, constructs concise tabular reports and generates plots of the simulation results for enhanced visual analysis.
The Simulation Executive constitutes the kernel of the simulator and performs two functions that are essential to simulation: 1) scheduling events which includes locking and unlocking of resources for concurrency control and 2) management of the user-supplied system definition database. The Simulation Executive references the database dur- ing simulation for checking validity and calculat- ing execution delays for the simulated event. This ensures that the simulation is performed within the constraints imposed by the system definition thereby improving the robustness of the simulator.
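The event-scheduling function can be pictured as a time-ordered event list plus per-resource locking. The sketch below is a minimal illustration under assumed names (`SimulationExecutive`, and a plain dict standing in for the system definition database); it is not the SALSIM implementation:

```python
import heapq

class SimulationExecutive:
    """Minimal sketch of the two kernel functions: event scheduling
    with resource locking, and validation against a user-supplied
    system-definition database (here a plain dict)."""
    def __init__(self, system_definition):
        self.db = system_definition      # {resource: {"speed": ...}}
        self.clock = 0.0
        self.events = []                 # (time, seq, resource, instructions)
        self.busy_until = {}             # resource -> time it unlocks
        self.seq = 0                     # tie-breaker for equal times

    def schedule(self, delay, resource, instructions):
        # Validity check against the system definition database.
        if resource not in self.db:
            raise KeyError(f"undefined resource: {resource}")
        heapq.heappush(self.events,
                       (self.clock + delay, self.seq, resource, instructions))
        self.seq += 1

    def run(self, until):
        trace = []
        while self.events and self.events[0][0] <= until:
            t, _, res, instr = heapq.heappop(self.events)
            # A locked (busy) resource defers the event until it unlocks.
            start = max(t, self.busy_until.get(res, 0.0))
            exec_time = instr / self.db[res]["speed"]
            self.busy_until[res] = start + exec_time   # lock until done
            self.clock = t
            trace.append((start, res, exec_time))
        return trace

sim = SimulationExecutive({"WEST_CPU": {"speed": 0.5e6}})
sim.schedule(0.0, "WEST_CPU", 5000)   # two 5000-MLI executions
sim.schedule(0.0, "WEST_CPU", 5000)   # contending for the same CPU
print(sim.run(1.0))
```

With a 0.5E6 MLI/sec CPU (the speed of Table 2's processors), each job takes 0.01 s, so the second is deferred until the first unlocks the resource.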
The Hardware/Operating System Monitor models hardware resource utilization, maintains the various queues associated with each hardware device and collects statistics on processors, buses, I/O queues, data queues and message traffic. In addition, it may also include an external interface for application-specific equipment models such as radars and sensors. The definition of the hardware configuration is relatively simple because of the bus-connected structure that the simulator is primarily designed to simulate.

[Figure 1: block diagram showing the User Interface (system definition, simulation configuration, post-processing control), the software and hardware libraries, the Simulation Executive with its data log and command file, and the post-processing tools (port-to-ports, operating system services, queue statistics, bus and network interface processor, performance measures, statistics, plots).]
Fig. 1. Structure of the Distributed Data Processing Simulator (DDPSIM).

The hardware configuration and processing speed are defined by the user and stored in the system definition database, which is managed by the Simulation Executive. The information about hardware performance is used to determine message transmission and processing delays, queue priorities and message routing.
The Application Software Monitor simulates task execution within each individual processor, generates messages as specified by the communication message definition and transmits them over the bus to the appropriate destinations. As it simulates execution, the Application Software Monitor collects such statistics as task execution time, operating system overhead and I/O service statistics.
A simulation model for the target distributed network is defined by the user supplied system specification, code size and timing data, and task
control flow model, which is an abstraction of program execution. The level of detail of the model definition depends on the fidelity of the simulation to be performed. The detail of model representation is described with an example in the following section.
The execution speed of the simulator is dictated by the fidelity of the model and the degree of parallel processing being simulated. Simulating a typical task execution or message processing takes approximately 30 milliseconds on the VAX 11/780. For a complex simulation where the real event is faster than its simulated counterpart, or where many events are simulated in parallel, the simulation requires longer than real time. Likewise, the simulator will model slow events faster than real time.
[Figure 2: network diagram showing the RADR_BUS, WEST_BUS and CCC_BUS, the NRTH, EAST, WEST and CCC CPUs, input/output processors (IOP) and network interface processors (NIP).]
Fig. 2. Air Traffic Control System Architecture.
3. Simulation Procedure: An Example
The simulation procedure is described using a real-time radar signal processing system as an example. It is a distributed processing system for air traffic control that consists of five radars, eight processors, and twenty NIPs connected by three buses, as illustrated in Figure 2. Figures 3 through 5 depict the three process threads originating from periodic radar returns. Tasks E_TASK1, W_TASK1 and N_TASK1, which model the east, west and north radar functions originating the radar returns, are periodically enabled by their respective clocks.
In Figure 3, the radar return (E_RETRN) is processed by E_TASK2 (residing on the EAST_CPU) to command either the east radar (E_RQST with 90% probability) or the west radar (W_RQST with 10% probability) and to log its activities (E_DATA) to the Command, Control and Communication task C_TASK1. Enabled by the radar request E_RQST, task E_TASK3 (also residing on the EAST_CPU) commands the east radar (E_TASK4) via the message E_CMMD. The west and north processing threads (Figures 4 and 5) function similarly. Figure 6 models the lower priority status and health check between the radars (E_TASK1, etc.) and the Command, Control and Communication task C_TASK1 residing on the CCC processor (CCC_CPU).
3.1. Hardware Model
[Figure 4: diagram of the west process thread: W_CLOCK enables W_TASK1; W_RETRN, W_RQST, W_CMMD and W_DATA messages flow through W_TASK2-W_TASK4 to C_TASK1.]
Fig. 4. West Process Thread.
[Figure 5: diagram of the north process thread: N_CLOCK enables N_TASK1; N_RETRN flows to N_TASK2. Legend: dotted lines denote data flow (no enablement); solid lines denote control flow (enablement).]
Fig. 5. North Process Thread.
The system hardware definition specified by the user is illustrated in Tables 2, 3 and 4.

[Figure 3: diagram of the east process thread: E_CLOCK enables E_TASK1; E_RETRN, E_RQST (90% probability), W_RQST, E_CMMD and E_DATA messages flow through E_TASK2-E_TASK4 to C_TASK1.]
Fig. 3. East Process Thread.

[Figure 6: diagram of the CCC process thread: C_CLOCK enables C_TASK1, which broadcasts C_STATUS messages.]
Fig. 6. CCC Process Thread.

Table 2 defines the performance attributes for the CPU, bus and NIP models. The five radar models are approximated using CPU models already in the hardware library; they can be replaced later with specific radar models generated by the user. Table 3 details the connectivity of the three networks interconnected by the
Table 2. Hardware Attributes/Network Definitions
DDPSIM Resource Definitions (Max 50 Resources, 25 Units)

Name       Model  Units  Fix Delay   Speed     Comments
WESTRADR   CPU    1      0.0         1.0 E6    CPU used for radar driver
NRTHRADR   CPU    3      0.0         1.0 E6    CPU used for radar driver
EASTRADR   CPU    1      0.0         1.0 E6    CPU used for radar driver
W_R_NIP    NIP    1      40.0 E-6    3.0 E6    Words/Sec
N_R_NIP    NIP    3      40.0 E-6    3.0 E6    Words/Sec
E_R_NIP    NIP    1      40.0 E-6    3.0 E6    Words/Sec
RADR_BUS   BUS    1      1.0 E-6     50.0 E3   Words/Sec
NRTH_CPU   CPU    5      100.0 E-6   0.5 E6    MLI/Sec
N_C_NIP    NIP    5      40.0 E-6    3.0 E6    Words/Sec
WEST_BUS   BUS    1      1.0 E-6     50.0 E3   Words/Sec
W_NIP_1    NIP    1      40.0 E-6    3.0 E6    Words/Sec
W_NIP_2    NIP    1      40.0 E-6    3.0 E6    Words/Sec
C_NIP_1    NIP    1      40.0 E-6    3.0 E6    Words/Sec
EAST_CPU   CPU    1      100.0 E-6   0.5 E6    MLI/Sec
E_C_NIP1   NIP    1      40.0 E-6    3.0 E6    Words/Sec
E_C_NIP2   NIP    1      40.0 E-6    3.0 E6    Words/Sec
WEST_CPU   CPU    2      100.0 E-6   0.5 E6    MLI/Sec
W_C_NIP1   NIP    2      40.0 E-6    3.0 E6    Words/Sec
W_C_NIP2   NIP    2      40.0 E-6    3.0 E6    Words/Sec
CCC_BUS    BUS    1      1.0 E-6     50.0 E3   Words/Sec
CCC_NIP    NIP    1      40.0 E-6    3.0 E6    Words/Sec
CCC_CPU    CPU    1      100.0 E-6   0.5 E6    MLI/Sec
Table 3. Hardware Connectivity, Bus
DDPSIM Networks: Unique Resource Links Nodes (e.g., a Bus Links NIPs)

RADR_BUS links W_R_NIP, N_R_NIP, E_R_NIP, N_C_NIP, W_NIP_1, C_NIP_1, E_C_NIP1
WEST_BUS links W_NIP_2, W_C_NIP1
CCC_BUS  links W_C_NIP2, C_NIP_2, E_C_NIP2, CCC_NIP
three buses. Table 4 defines the NIP-to-processor connections within a node and the NIP-to-NIP gateway connections. Together, Tables 3 and 4 specify the total set of legal message paths, which is represented by the
Table 4. Hardware Connectivity, NIP
DDPSIM Mapping: Each Resource's Unit Maps to Comparable Node(s)

WESTRADR maps W_R_NIP
NRTHRADR maps N_R_NIP
EASTRADR maps E_R_NIP
NRTH_CPU maps N_C_NIP
EAST_CPU maps E_C_NIP1, E_C_NIP2
WEST_CPU maps W_C_NIP1, W_C_NIP2
CCC_CPU  maps CCC_NIP
W_NIP_1  maps W_NIP_2
C_NIP_1  maps C_NIP_2
connectivity matrix that is constructed by the simulator's input user interface. The connectivity matrix is referenced during the simulation to vali- date message routing.
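As an illustration, the connectivity check implied by Tables 3 and 4 can be sketched as a set intersection over a subset of the example hardware. The resource names come from Tables 2-4, but the function itself is hypothetical, and gateway NIP hops (W_NIP_1 to W_NIP_2, etc.) are omitted for brevity:

```python
# A path src -> bus -> dst is legal only if each endpoint owns a
# NIP attached to that bus (Table 3: bus links; Table 4: NIP maps).
bus_links = {
    "RADR_BUS": {"W_R_NIP", "N_R_NIP", "E_R_NIP", "N_C_NIP",
                 "W_NIP_1", "C_NIP_1", "E_C_NIP1"},
    "WEST_BUS": {"W_NIP_2", "W_C_NIP1"},
    "CCC_BUS":  {"W_C_NIP2", "C_NIP_2", "E_C_NIP2", "CCC_NIP"},
}
nips_of = {
    "EASTRADR": {"E_R_NIP"},
    "EAST_CPU": {"E_C_NIP1", "E_C_NIP2"},
    "WEST_CPU": {"W_C_NIP1", "W_C_NIP2"},
    "CCC_CPU":  {"CCC_NIP"},
}

def legal(src, bus, dst):
    """Connectivity-matrix check used to validate message routing."""
    links = bus_links[bus]
    return bool(nips_of[src] & links) and bool(nips_of[dst] & links)

print(legal("EAST_CPU", "CCC_BUS", "CCC_CPU"))   # E_C_NIP2 and CCC_NIP share CCC_BUS
print(legal("EAST_CPU", "WEST_BUS", "CCC_CPU"))  # no EAST_CPU NIP on WEST_BUS
```

The real connectivity matrix also captures gateway reachability, so a full version would close this relation transitively over the NIP-to-NIP maps.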
3.2. Software Model
The software models are defined in Tables 5, 6 and 7 by a set of attributes which define each task's periodicity, allocation, execution sizes and message generation for the library task logic. No special user task models were generated; all models used the library model.
A. Periodic Execution - For tasks that are executed at predetermined periods, the execution period (PERIOD in Table 5) is specified in seconds. For periodic execution, the clock enters a task dispatch message into the processor dispatch
Table 5. Process Periodicity and Allocations/Process Definitions
DDPSIM Task Allocations (Period in Seconds, Maximum 100 Tasks)

Task Name  Period     Message   Resname   Units (max. 25)
W_TASK1    0.010      W_CLOCK   WESTRADR  1
W_TASK2                         WEST_CPU  2
W_TASK3                         WEST_CPU  1
W_TASK4                         WESTRADR  1
E_TASK1    20.0 E-3   E_CLOCK   EASTRADR  1
E_TASK2                         EAST_CPU  1
E_TASK3                         EAST_CPU  1
E_TASK4                         EASTRADR  1
N_TASK1    0.010      N_CLOCK   NRTHRADR  1, 2, 3
N_TASK2                         NRTH_CPU  2, 3, 4, 5
C_TASK1    0.50       C_CLOCK   CCC_CPU   1
Table 6. Process Execution Sizes
DDPSIM Task Execution Coefficients: Source Instructions (50 Values)

Task Name  Speed    Expand  OSOH  DSPTCH  Source Instr
W_TASK1    1.00 E6
W_TASK2    1.97 E6  3.2     0.05  100     3500
W_TASK3    1.97 E6  3.2     0.05  100     2757
W_TASK4    1.00 E6
E_TASK1    1.00 E6
E_TASK2    2.2 E6   1.9     0.05  100     3540
E_TASK3    1.8 E6   1.9     0.05  100     3440
E_TASK4    1.00 E6
N_TASK1    1.00 E6
N_TASK2    2.5 E6   4.3     0.07  250     2823
C_TASK1    2.50 E6  4.5     0.10  250     650
queue and reschedules the next clock interrupt. A periodic task may also be data enabled.
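The periodic mechanism, a clock that posts a dispatch message and then reschedules its own next interrupt, can be sketched as follows. This is a simplified illustration, not DDPSIM code; the 0.010 s period is W_TASK1's from Table 5:

```python
import heapq

def simulate_clock(period, horizon):
    """The clock enters a task-dispatch message into the dispatch
    queue, then reschedules the next clock interrupt: a one-shot
    event that re-arms itself."""
    events = [(0.0, "W_CLOCK")]          # first interrupt at t = 0
    dispatch_queue = []
    while events:
        t, name = heapq.heappop(events)
        if t > horizon:
            break
        dispatch_queue.append((t, "dispatch W_TASK1"))
        heapq.heappush(events, (t + period, name))   # re-arm the clock
    return dispatch_queue

# W_TASK1's 0.010 s period over a 0.045 s horizon gives dispatches
# at roughly t = 0.00, 0.01, 0.02, 0.03 and 0.04.
print(len(simulate_clock(0.010, 0.045)))
```

A data-enabled task would instead be dispatched when an enablement message arrives in its priority queue; a periodic task may use both mechanisms, as noted above.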
B. Task Allocation - Allocation of tasks to the processors is explicitly specified by the user as shown in Table 5. A task may be allocated uniquely to a processor or replicated to any number of processors. Because the specification is simple, the tasks may be reallocated between

Table 7. Library Task Logic
Static Task Logic: Queue Read & Message Generation. Uniform Distribution: + Prob = Success; - Prob = Failure

TASK     READ Q    PROB MSGNAME   PROB  MSGNAME   PROB MSGNAME
W_TASK1  DATA_QUE  1.00 W_RETRN
W_TASK2            0.5  W_RQST
W_TASK3            1.00 W_CMMD    2.0   W_DATA
E_TASK1  DATA_QUE  1.00 E_RETRN
E_TASK2            0.90 E_RQST    -0.90 W_RQST    1.50 E_DATA
E_TASK3            1.0  E_CMMD
N_TASK1  DATA_QUE  1.00 N_RETRN
C_TASK1  DATA_QUE  1.0  C_STATUS
runs to different processors easily without chang- ing the hardware connectivity.
C. Execution Rate - The execution rate (SPEED in Table 6) is specified for each task to supersede the processor's speed when necessary. It can be used to describe a performance difference based on different instruction mixes, since logical, real/integer arithmetic, and I/O operations execute at different rates. The execution rate can be obtained by benchmarking similar code mixes.
D. Expansion Ratio of Source to Object Code - This, along with the number of source instructions, determines the executable task size in machine instructions. This code count is used to calculate the execution time on a processor with a given instruction execution rate. The expansion ratio (EXPAND in Table 6) is a function of the task instruction mix and, ideally, should be estimated using benchmarks.
E. Operating System Overhead - The operating system overhead (OSOH in Table 6) can be accounted for as an additive percentage of the application task size. It can also be modeled explicitly as I/O services in the user task models.
F. Task Dispatch Overhead - The task dispatch overhead (DSPTCH in Table 6) is expressed as a number of machine instructions and is assessed for each task initialization. It is logged as an operating system service.
G. Source Instructions - The executable source instruction count in Table 6 can be specified for up to 50 logical branches within a user task model. The library task uses only the first value as the executable size. The cumulative instruction count for all branches executed is multiplied by the expansion factor to obtain the machine language instruction (MLI) count. The MLI count is divided by the processor hardware speed to obtain the net time delay.
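Using W_TASK2's coefficients from Table 6, the arithmetic can be illustrated as below. How the OSOH percentage and the dispatch overhead combine with the MLI count is an assumption on our part; the paper does not spell out the exact formula:

```python
def task_exec_time(source_instr, expand, osoh, dsptch, speed):
    """Net task time from Table 6's coefficients: source instructions
    times the expansion ratio gives the MLI count; OS overhead is an
    additive percentage; dispatch overhead is a fixed MLI cost.
    (Assumed combination; the paper gives the terms, not the formula.)"""
    mli = source_instr * expand * (1.0 + osoh) + dsptch
    return mli / speed

# W_TASK2 from Table 6: 3500 source instructions, expansion 3.2,
# 5% OS overhead, 100-MLI dispatch, 1.97 E6 MLI/sec effective speed.
print(task_exec_time(3500, 3.2, 0.05, 100, 1.97e6))
```

This lands near 6 ms, in the neighborhood of W_TASK2's measured 6.18-6.54 ms execution times in Table 10, though the exact match depends on how DDPSIM actually applies the overheads.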
H. Task Logic - In Table 7 the user may specify options for the library task logic without creating any code. The capabilities include the option to read a data queue, to transmit specific messages with a uniform random probability and to log explicit operating system services for each message. If listed by name, a data queue will be read during each task execution. The messages that follow on that line will be transmitted using the given uniform probabilities. Probabilities greater than 1.0 will cause multiple messages to be sent.
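The message-generation options can be sketched as follows. The reading of the +/- success/failure convention and of probabilities above 1.0 is our interpretation of Table 7's note, and the function is hypothetical:

```python
import random

def generate_messages(spec, rng):
    """Sketch of the library task's message generation (Table 7).
    A probability above 1.0 sends the whole-number count of copies
    plus, with the fractional probability, one more; a negative value
    means 'send on failure of the preceding draw' (our reading of the
    + Prob: Success / - Prob: Failure convention)."""
    sent = []
    last_success = True
    for prob, msg in spec:
        if prob < 0:                     # failure branch of prior draw
            if not last_success:
                sent.append(msg)
            continue
        whole, frac = int(prob), prob - int(prob)
        last_success = rng.random() < frac if frac > 0 else True
        sent.extend([msg] * whole)       # guaranteed copies
        if frac > 0 and last_success:
            sent.append(msg)             # probabilistic extra copy
    return sent

rng = random.Random(1)
# E_TASK2's row: E_RQST with probability 0.90, else W_RQST on the
# failure branch; E_DATA with probability 1.50 (one copy always,
# a second copy 50% of the time).
for _ in range(3):
    print(generate_messages([(0.90, "E_RQST"), (-0.90, "W_RQST"),
                             (1.50, "E_DATA")], rng))
```

Under this reading, E_DATA averages 1.5 messages per execution, which is consistent with the 74/sec E_DATA rate against roughly 50/sec E_TASK2 executions seen in the results tables.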
3.3. Communication Message
The communication scheme of the distributed network is specified by the hardware and operat- ing system queues, control/enablement messages and data messages as shown in Tables 8 and 9.
The transmission delay of a message is modeled in the NIP and bus models. The transmission delay is calculated using the device speeds and message lengths provided in Tables 2 and 9.
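As a worked illustration with the Table 2 and Table 9 values, a per-device delay of fixed delay plus length over speed gives, for the 100-word W_DATA message crossing a NIP and then the CCC_BUS (treating the NIP and bus delays as simply additive, which is an assumption):

```python
def transmission_delay(length_words, fix_delay, speed_words_per_sec):
    """Per-device delay: fixed overhead plus length/speed, using the
    attributes of Table 2 and the message lengths of Table 9."""
    return fix_delay + length_words / speed_words_per_sec

# W_DATA (100 words, Table 9) through a NIP (40 us fixed delay,
# 3.0 E6 words/sec) and the CCC_BUS (1 us fixed, 50.0 E3 words/sec):
nip = transmission_delay(100, 40.0e-6, 3.0e6)     # ~73 us
bus = transmission_delay(100, 1.0e-6, 50.0e3)     # ~2.0 ms
print(nip + bus)
```

The bus term dominates by more than an order of magnitude, which is why the bus utilization tables, not the NIP statistics, carry most of the analysis in Section 3.4.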
A. Queues - Queues are used in the distributed network to transmit and receive messages and to queue task enablement requests. All queues must be explicitly specified. Table 8 lists the task dispatch queues with five priority levels, the NIP transmit queues with five priority levels (OUTQ_1, etc.), the NIP/Bus queues (NIP_BUSQ and BUS_NIPQ), an operating system queue TIMETAG and a data queue (DATA_QUE) in the processors.
B. Message Length - In Table 9 the message length is specified in units such as words or bytes that are consistent with the communication de- vices in Table 2. This example uses words. A message must be at least one unit long and its maximum length is dictated by the communication protocol or device being modeled.
C. Originating Queue - The originating queue is a NIP transmit queue which prioritizes messages onto the bus. It is listed under the heading OUTQUE in Table 9.
Table 8. Queue and Buffer Definitions. Data/Control Flow Definitions.
DDPSIM Queue and Buffer Definitions (Max. 50 Queues)

Name       Comments
PRIORTY1   Dispatch queue, highest priority
PRIORTY2   Dispatch queue
PRIORTY3   Dispatch queue
PRIORTY4   Dispatch queue
PRIORTY5   Dispatch queue
OUTQ_1     Dispatch queue to bus, highest priority
OUTQ_2     Dispatch queue to bus
OUTQ_3     Dispatch queue to bus
OUTQ_4     Dispatch queue to bus
OUTQ_5     Dispatch queue to bus
TIMETAG    Time-tagged queue for future message dispatches
NIP_BUSQ   NIP output queue to the bus
BUS_NIPQ   Bus output queue to the NIP
BUS_NIPQ   Bus output queue to the NIP
DATA_QUE   Replicated data base manager queue
Table 9. Control and Data Messages
DDPSIM Message Definitions (Max. 100 Messages). Rules: 0/1 = Single Destination/Broadcast; 3 = One of N Nodes Random

Name      Length  Outque  Busname   Input Q   Rule  Destination Tasks
W_CLOCK   1               LOCAL     PRIORTY2        W_TASK1
W_RETRN   10      OUTQ_1  WEST_BUS  PRIORTY2        W_TASK2
W_RQST    60      OUTQ_1  CCC_BUS   PRIORTY2        W_TASK3
W_CMMD    30      OUTQ_1  WEST_BUS  PRIORTY2        W_TASK4
W_DATA    100     OUTQ_2  CCC_BUS   DATA_QUE        C_TASK1
E_CLOCK   1               LOCAL     PRIORTY2        E_TASK1
E_RETRN   50      OUTQ_1  RADR_BUS  PRIORTY2        E_TASK2
E_RQST    60              LOCAL     PRIORTY1        E_TASK3
E_CMMD    40      OUTQ_1  RADR_BUS  PRIORTY2        E_TASK4
E_DATA    100     OUTQ_2  CCC_BUS   DATA_QUE        C_TASK1
N_CLOCK   1               LOCAL     PRIORTY2        N_TASK1
N_RETRN   50      OUTQ_1  RADR_BUS  PRIORTY2  3     N_TASK2
C_CLOCK   1               LOCAL     PRIORTY2        C_TASK1
C_STATUS  5       OUTQ_1  C_NIP_2   DATA_QUE        W_TASK1, E_TASK1, N_TASK1
D. Destination Queue - This identifies the destination input queue (e.g., task dispatch or data queue; INPUT Q in Table 9). Since each task dispatch queue is assigned a priority level, the input queue specifies a task enablement priority via this message. This capability allows each task to have multiple enablement priorities using different enablement messages. Data messages define the receiving data queue but do not enable a task. For example, the message W_DATA is a data message directed to the data queue in the CCC_CPU; it can be read by a task in that node but cannot enable any task.
E. Primary Path - Ambiguous pathways can exist in multiple bus configurations. Since the simulator does not arbitrarily select the intended path, the user must name an intermediate device (BUSNAME in Table 9) which identifies a unique path. The user may also specify an intermediate device of "LOCAL" for messages which do not leave the originating node.
In this example, there are two possible paths between the two WEST_CPUs; this ambiguity must be resolved for any communication between the two processors. The message W_RQST (Table 9) is transmitted from W_TASK2 (in WEST_CPU No. 2) to W_TASK3 (in WEST_CPU No. 1) via the CCC_BUS, which identifies the unique pathway. Alternatively, WEST_BUS could be specified instead of CCC_BUS; this would affect the bus loading and possibly the system performance.
F. Message Routing Rules - This parameter defines the bus routing possibilities, which include single destination, multicast and broadcast. One selective broadcast capability (rule 3 in Table 9) is supplied: a message will be routed randomly to a single destination node selected from all the nodes containing the destination task. This may be used to distribute the work randomly across the possible processors. For example, the radar returns (N_RETRN) from the NRTHRADR are randomly allocated to one of the NRTH_CPUs. Other unique rules can be created by the user with modification to the bus model.
G. Destination Tasks - The message is routed to a single node or to all nodes containing the destination task(s). Between simulation runs, a task can be reallocated to different hardware without changing the message definition, and the message routing will be recomputed correctly. The message N_RETRN is routed to each NRTH_CPU node in which task N_TASK2 is located. Rule 3 is requested, which directs the bus to select the single destination node randomly.
3.4. Simulation Results
In this section, simulation results selected from the host of tabular summaries and plots are presented as examples. Notice that in some cases the data is reprocessed with different time intervals associated with each plot point or tabular period; this is done to highlight the more interesting details of the example. The NIP statistics and most of the queue statistics are omitted for brevity.
A. CPU Utilization - The statistics and plots for the WEST_CPU are featured in Table 10 and Figure 7, respectively. In Table 10, TOTAL EXECS in the task statistics table is the total number of times a task is executed during the measured simulation period. EXEC RATE is the number of times a task is executed in one second. Util is the task utilization during this measured period; that is, Util = (average execution time) × (execution rate). In this example, the loading in the second WEST_CPU unit is fairly even, reflecting the processing of the periodic radar returns. Unit 1, however, is very irregular, reflecting the random decision to request subsequent track commands. Consequently, the Load Balancing Evaluation in Table 10 indicates a very poor balance between Units 1 and 2. The HI/LO ratio is the ratio of the highest utilized CPU to the lowest one; the smaller the ratio (1.0 is perfect balance), the better the load balance between the processors. Neither WEST processor appeared overloaded, utilizing below 70% in any 50 millisecond period.
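The two figures of merit defined above can be checked directly against Table 10. This is a trivial sketch; the function names are ours:

```python
def task_util(avg_exec_time, exec_rate):
    """Util = (average execution time) x (execution rate), as defined
    for the CPU utilization tables."""
    return avg_exec_time * exec_rate

def hi_lo_ratio(utils):
    """Load-balance figure of merit: highest CPU utilization over
    lowest (1.0 is perfect balance)."""
    return max(utils) / min(utils)

# W_TASK2 on WEST_CPU unit 2 (Table 10): 6.378 ms average execution
# at 100.14 executions/sec reproduces the tabulated 63.9% utilization.
print(task_util(6.378e-3, 100.14))
# The minimum-ratio row of the load balancing table: 64.5% vs 63.7%.
print(hi_lo_ratio([64.5, 63.7]))
```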
B. Bus Utilization - The usage statistics for the three buses are featured in Tables 11, 12 and 13, and the corresponding plot for the CCC_BUS (Table 13) is shown in Figure 8. They show that except for WEST_BUS, bus utilization is relatively high, and that WEST_BUS is a good alternative
Table 11. RADR_BUS Statistics
Length of Simulation: 5.00024 Seconds
Longest Busy Sequence: 5.186558 Msecs
Average Utilization: 46.52%
Maximum Utilization for 0.050 Second Interval: 49.46%

Message   Count of     Msg     Average
Name      Occurrences  Rate    Message Length
W_RETRN   500          100.00  12.00
W_CMMD    308          61.60   32.00
E_RETRN   250          50.00   52.00
E_CMMD    220          44.00   42.00
N_RETRN   1500         299.99  52.00
C_STATUS  10           2.00    7.00
Total     2788         557.57  41.67
Table 12. WEST_BUS Statistics
Length of Simulation: 5.00024 Seconds
Longest Busy Sequence: 0.881672 Msecs
Average Utilization: 6.36%
Maximum Utilization for 0.050 Second Interval: 10.10%

Message   Count of     Msg     Average
Name      Occurrences  Rate    Message Length
W_RETRN   500          100.00  12.00
W_CMMD    308          61.60   32.00
Total     808          161.59  19.62
Table 10. WEST_CPU Utilization Statistics

Task Statistics
Task Name  Execution Time (msec)    Total   Exec    Util.
           Min    Avg    Max        Execs   Rate    (%)
W_TASK2    6.180  6.378  6.537      500     100.14  63.9
W_TASK3    6.048  6.048  6.048      308     61.69   37.3

CPU Statistics [Utilization for 0.050 Secs]
CPU No  Min %  Avg %  Max %  Longest Busy (Msec)  Total Execs  Rate
1       0.0    37.3   70.2   24.191               308          61.69
2       61.8   63.9   65.4   6.5387               500          100.14

Load Balancing: Processors 1 through 2, evaluated over 99 occurrences of CPU utilization greater than 30%
               HI Util  LO Util  HI/LO  Time of Occurrence
Minimum Ratio  64.5     63.7     1.01   4.9000
Maximum Ratio  62.5     2.4      25.93  0.3000
Average Ratio                    2.39
[Figure 7: WEST_CPU utilization vs. engagement time over 0.050 s intervals, with separate curves for Unit 1 and Unit 2.]
Fig. 7. West-CPU Utilization Plots.
[Figure 8: CCC_BUS utilization plotted against time.]
Fig. 8. CCC-BUS Utilization Plot.
Table 13 CCC_BUS Statistics

Length of Simulation: 5.00024 Seconds
Longest Busy Sequence: 14.727354 Msecs
Average Utilization: 47.92%
Maximum Utilization for 0.050 Second Interval: 79.32%

Message    Count of      Msg      Average
Name       Occurrences   Rate     Message Length
W_RQST      308           61.60    62.00
W_DATA      616          123.19   102.00
E_DATA      370           74.00   102.00
C_STATUS     10            2.00     7.00
Total      1304          260.79    91.82
resource for the sporadic overload condition (a maximum utilization of 79%) on the CCC_BUS caused by its three large and frequent messages (W_RQST, E_DATA and W_DATA in Table 13).
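As a consistency check, the average utilizations in Tables 11-13 can be reproduced from the message statistics alone: utilization is the offered word rate (message rate times average message length) divided by the bus capacity. The capacity figure below, about 50,000 words per second, is not stated in the paper; it is an assumption inferred from the tables, so this sketch is illustrative only:

```python
# Cross-check of the tabulated average bus utilizations:
#   utilization = (total msg rate x avg msg length) / bus capacity.
# The capacity is an assumed value inferred from the tables, not a figure
# given in the paper.

BUS_CAPACITY_WORDS_PER_SEC = 50_000  # assumed nominal capacity

buses = {
    # bus name: (total msg rate [msgs/sec], avg length [words], reported util %)
    "RADR_BUS": (557.57, 41.67, 46.52),
    "WEST_BUS": (161.59, 19.62, 6.36),
    "CCC_BUS":  (260.79, 91.82, 47.92),
}

for name, (rate, length, reported) in buses.items():
    estimated = 100.0 * rate * length / BUS_CAPACITY_WORDS_PER_SEC
    print(f"{name}: estimated {estimated:.2f}% vs reported {reported:.2f}%")
```

All three buses agree to within a fraction of a percent under this single capacity assumption, which supports the internal consistency of the tabulated statistics.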
C. Queue Statistics - Data were recorded for over 80 queues or buffers in the 36 hardware devices. A sample of the tabular summary is shown in Table 14. It details the throughput, wait times and sizes of the queues in the NIP (E_C_NIP2) between the EAST_CPU and the CCC_BUS. The data show significant but tolerable maximum waits for access to the busy CCC_BUS.
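The time-weighted average queue sizes reported in Table 14 can be reproduced from a simple event trace: each queue length is weighted by how long the queue stayed at that length. A sketch of that computation (the function and the trace below are illustrative; DDPSIM's internal bookkeeping is not described at this level of detail):

```python
# Time-weighted average queue size, as reported in the queue-size statistics.
# Each event records the time at which the queue length changed.

def time_weighted_avg(events, end_time):
    """events: list of (time, queue_length) in increasing time order,
    starting at time 0.0. Returns the time-weighted mean length."""
    total = 0.0
    for (t0, length), (t1, _) in zip(events, events[1:]):
        total += length * (t1 - t0)
    last_t, last_len = events[-1]
    total += last_len * (end_time - last_t)
    return total / end_time

# Queue empty most of a 5-second run, briefly holding 2 messages.
events = [(0.0, 0), (4.0, 2), (4.1, 0)]
print(f"{time_weighted_avg(events, 5.0):.3f}")  # prints 0.040
```

This explains why the time-weighted averages in Table 14 are tiny even though the maximum sizes reach 2 messages: the queues are occupied for only brief bursts.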
4. Summary
Through the experience in simulation and simulator development of distributed processing networks, a number of important lessons have been learned. In particular, a hierarchical simulation capability that permits various levels of simulation fidelity is extremely important. It enables a system engineer to use the simulator effectively at different phases of system design: from the top-level design in an early phase to the more detailed design in a later phase. It also allows the demonstration of consistency between the higher-level and the detailed-level designs. If the consistency is violated, the simulator can help isolate the causes.

Table 14 EAST_CPU to CCC_BUS NIP Statistics

Queue Statistics (Throughput). Resource: E_C_NIP2, Unit: 1
(*Averages and Maximums over 0.050 sec Interval)

Queue      Throughput (Messages)            Throughput (Words)
Name       Avg*     Max*     Occurred       Avg*     Max*    Occurred
           (Count)  (Count)  (Interval      (Count)  (Count) (Interval
                             End Time)                       End Time)
OUTQ_1     0.3      2.0      3.35           18.7     124.0   3.35
OUTQ_2     3.7      6.0      0.05           378.5    612.0   0.05
NIP_BUSQ   4.0      6.0      0.05           397.2    612.0   0.05

Queue Statistics (Wait Times). Resource: E_C_NIP2, Unit: 1
(Per Message Wait Times)

Queue      Avg      Max*     Occurred
Name       (Msec)   (Msec)
OUTQ_1     0.001    0.001    0.67
OUTQ_2     0.030    0.075    0.01
NIP_BUSQ   3.218    7.623    4.92

Queue Statistics (Sizes). Resource: E_C_NIP2, Unit: 1

Queue      Queue Size (Messages)             Queue Size (Words)
Name       Time-Weighted  Max   Occurred     Time-Weighted  Max     Occurred
           Avg                               Avg
OUTQ_1     0.000          1.0   0.67         0.000           62.0   0.67
OUTQ_2     0.002          2.0   0.01         0.226          204.0   0.01
NIP_BUSQ   0.258          2.0   0.01         25.933         204.0   0.01
Robustness of simulation is attained by consistency checks between hierarchical levels and among system definitions; these checks ensure the validity of the simulation results. A message routing path, for example, is checked against the processor connectivity to catch possible human mistakes.
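The routing-path check can be sketched as a lookup against the connectivity definition. The path format, the function name, and the attachment sets below are illustrative assumptions, not DDPSIM's actual input conventions:

```python
# Sketch of a routing-path consistency check: every bus hop in a message
# route must connect processors that the system definition actually
# attaches to that bus. Names and attachments are illustrative.

connectivity = {
    # processor: set of buses it is attached to (via its NIPs)
    "WEST_CPU": {"WEST_BUS"},
    "EAST_CPU": {"CCC_BUS"},
    "CCC_CPU":  {"CCC_BUS", "RADR_BUS"},
}

def validate_route(route):
    """route: alternating [processor, bus, processor, ...] hops."""
    for i in range(0, len(route) - 2, 2):
        src, bus, dst = route[i], route[i + 1], route[i + 2]
        if bus not in connectivity.get(src, set()):
            return False, f"{src} is not attached to {bus}"
        if bus not in connectivity.get(dst, set()):
            return False, f"{dst} is not attached to {bus}"
    return True, "ok"

print(validate_route(["CCC_CPU", "CCC_BUS", "EAST_CPU"]))  # a valid route
print(validate_route(["WEST_CPU", "CCC_BUS", "EAST_CPU"]))  # invalid hop
```

Running such a check at system-definition time, before any events are simulated, is what turns a silent human mistake into an immediate diagnostic.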
For the simulator to be an effective system engineering tool, a comprehensive user interface is critical for user acceptance. Simple tabular input formats, and tabular and graphic outputs, are essential. Selective output and post-processing capabilities are important for avoiding an oversupply of information. Currently, a further improvement of the user interface is envisioned to allow graphic input of the distributed network definition.
Finally, increased fidelity in some models does not necessarily mean refinement of the simulation or higher accuracy of results. Increased fidelity based on more detailed assumptions could simply increase the execution time of the simulator while contributing little to the improvement of the simulation results. Only when new, reliable performance data are added should the fidelity be substantially increased, which also raises the confidence level of the simulation results. It would be valuable if the confidence level of the simulation results could be calculated, as it can be for analytic models.
The distributed processing system simulator described in this paper continues to evolve. In particular, as its user interface is made more friendly, it may be used as a design and evaluation tool. As it is applied to a wider variety of applications, it is expected to improve in credibility and adaptability, and to prove an effective systems engineering tool.