

Performance Optimization of a Monte Carlo Simulation Code for Estimating the All-Terminal Reliability of a Network

Beatriz Otero Calviño, Silvia Pascual Martínez, and Claudio M. Rocco Sanseverino

B. Otero Calviño is with the Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya - Barcelona TECH (UPC), 08034, Spain. S. Pascual Martínez is with the Universitat Politècnica de Catalunya - Barcelona TECH (UPC), 08034, Spain. C. M. Rocco Sanseverino is with the Engineering Faculty, Universidad Central de Venezuela, Caracas, Venezuela.

Abstract: All-terminal reliability (ATR), defined as the probability that every node in a network can communicate with every other node, is an important problem in research areas such as mobile ad-hoc wireless networks, grid computing systems, and telecommunications. The assessment of ATR has also been part of related problems such as the reliability allocation problem. However, the exact calculation of ATR is an NP-hard problem. To obtain this probability, there are approaches based on analytic methods for small networks or on estimation through Monte Carlo simulation (MCS). In this paper, first a Fortran code that estimates the ATR is improved using software optimization techniques. Secondly, a parallel implementation based on the Message Passing Interface (MPI) standard is presented. The implementation can take advantage of multiple processors, thus reducing the time required for the ATR assessment. One example related to a real network illustrates the benefits: the parallel implementation can reduce the execution time of a serial Monte Carlo simulation by up to 74%.

Index Terms: All-terminal reliability, Monte Carlo simulation, MPI.

    1 INTRODUCTION

All-terminal reliability (ATR), defined as the probability that every node in a network can communicate with every other node, is an important problem in research areas such as mobile ad-hoc wireless networks, grid computing systems, and telecommunications [1].

An important area of application of ATR is the optimal design of structures, for example the design of computer or communication networks, where cost and reliability are important objectives. There are many formulations of this problem, but in general the idea is to define the layout of the network at minimum cost while meeting a minimum ATR requirement [1].

A common aspect of such methods is that the network reliability must be assessed for each of the possible solutions or candidate topologies. The search space size is k^(|N|(|N|-1)/2), where k is the number of choices for the links and |N| is the number of nodes in the network [1]. For example, for k=2, a network with |N|=20 has 2^190 ≈ 1.57x10^57 possible designs. Of course, heuristic procedures, like genetic algorithms [2], [3], evolutionary strategies [4], and artificial neural networks [1], among others, have been studied so that the search space is not fully explored. However, no matter which approach is selected, the ATR of each generated topology must be evaluated using a very fast procedure.

The exact calculation of ATR is an NP-hard problem, with computational effort growing exponentially with the number of nodes and links in the network [1]. Several analytical procedures have been suggested to assess the ATR, such as the enumeration of all possible minimal cut sets, the approaches proposed in [5], [6], or rough estimation approaches (relatively fast procedures, in general with a computational effort of O(N^3)) able to produce upper and lower bounds [1], [7]. Monte Carlo simulation (MCS) can assess ATR very precisely but requires a high computational effort to obtain a good estimate [8].

An original code (OC), developed in Fortran [9], is used to perform the MCS. A simplified flowchart for the MCS is presented in Figure 1. Given the topology of a network along with the link failure probabilities: 0) a success counter is set to zero; 1) a random sample topology is generated, by selecting which links in the original network are failed or operational; 2) the connectivity of the generated network is assessed using a minimum spanning tree procedure (Prim's algorithm) [10], and if all nodes are connected, the success counter is incremented; 3) steps 1-2 are performed NSIM times. At the end, the ATR is estimated as the ratio of the success counter to the number of topologies evaluated; a minimal sketch of this loop is given below. Note that MCS is based on the evaluation of NSIM independent network samples. Statistical theory indicates that the error of the ATR estimation is proportional to NSIM^(-1/2) (the standard error of the estimator is sqrt(ATR(1-ATR)/NSIM)).
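The loop below condenses steps 0-3 into a minimal Fortran sketch. It is not the paper's code: the helpers sample_links and all_connected are hypothetical stand-ins for the sampling step and the Prim-based connectivity check.

    program atr_mcs_sketch
      implicit none
      integer, parameter :: nsim = 10000       ! number of MCS samples
      real,    parameter :: q = 0.05           ! link failure probability (Sec. 2.1)
      integer :: i, ioper
      real    :: atr

      ioper = 0                                ! step 0: reset the success counter
      do i = 1, nsim
         call sample_links(q)                  ! step 1: draw a random topology
         if (all_connected()) ioper = ioper + 1   ! step 2: Prim-based check
      end do                                   ! step 3: repeat NSIM times
      atr = real(ioper) / real(nsim)           ! ATR = successes / samples
      print *, 'Estimated ATR =', atr

    contains

      subroutine sample_links(qfail)
        ! Stand-in: the real code marks each link as failed with probability qfail.
        real, intent(in) :: qfail
      end subroutine sample_links

      logical function all_connected()
        ! Stand-in: the real code runs Prim's minimum-spanning-tree algorithm
        ! and reports whether every node was reached.
        real :: r
        call random_number(r)
        all_connected = (r > 0.05)
      end function all_connected

    end program atr_mcs_sketch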

The main objective of this paper is to improve the execution time of the MCS program. To this aim, two actions are developed: a) optimization of the original



code; b) a parallel implementation of the MCS.

To optimize the performance of the original code, the sections that consume the most CPU time are identified using profiling tools, and optimization actions are proposed. As a result, an optimized MCS serial program (OMSP) is built.

Additionally, a parallel implementation of OMSP, based on the Message Passing Interface, is proposed. In this case, the behavior of the parallel implementation is assessed using other profiling tools.

    Fig. 1. Basic MCS flowchart

The paper is organized as follows. Section 2 describes the computational example tested and the software tools used. Section 3 shows the optimizations performed and reports the execution time for the OMSP. Section 4 presents a new parallel implementation of the code using MPI and shows the execution times obtained to estimate the ATR of one real network topology. Section 5 studies the behavior of the ATR estimation for both the OMSP and the parallel implementation. Finally, the last section presents the conclusions and future work.

2 CASE STUDY AND SOFTWARE TOOLS

In this section, the network topology to be tested and the software tools used to evaluate the performance of the program versions developed are described. All evaluations were run on a machine with an Intel Core i7-720QM processor (8 logical processors) [18].

    2.1 Network

The network used in all the evaluations corresponds to the Austin transportation system for the morning peak-hour period. It is important to mention that only the network topology is used; no physical characteristic of the transportation system is modeled. The network is formed by 7388 nodes and 18961 links [11]. It is assumed that all link failure probabilities are q = 0.05. Figure 2 shows the corresponding network topology.

    2.2 Software tools

The following subsections describe the software tools used to evaluate the performance of the new program versions developed.

    Fig. 2. Austin network topology

    2.2.1 Compilers

Two compilers were used to generate the object code of the program under evaluation: gfortran [12] and ifort [13]. The first compiler generates code for different computer architectures and operating systems, while the second one targets Intel architectures. The ifort compiler includes implicit improvements for Intel architecture-based computers.

    2.2.2 Parallel programming model

The new parallel version of the OC is implemented using the message-passing programming model. In this model, the parallel processes exchange data by passing messages between them. The standard used for data interchange is the Message-Passing Interface (MPI).

MPI is a message-passing library interface specification to be used on parallel computers, with operations expressed as functions, subroutines, or methods, according to the appropriate language bindings [16].

MPI is available in multiple implementations. The implementation selected is an open-source one, Open MPI, which includes several options for specifying how to assign the MPI processes to the available processors [17]. When an MPI program is launched, it is possible to specify how many processes take part in the MPI execution. To achieve the fastest execution, as many processes as available processors should be involved.

    2.2.3 Profile tools

To improve the performance of the OC, it is necessary to identify the code sections that consume the most CPU time. To localize these sections, a profiling tool called gprof [14] is used.

Gprof is a call-graph execution profiler: it creates a call graph detailing which functions in a serial program are called, and records the amount of time spent in each function.


This type of profiling is useful for figuring out which functions should be optimized, either because they are called very often or because they take a significant share of the execution time.
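As a usage note (this is the standard gprof workflow, not taken from the paper; the file name mcs.f90 is illustrative): the program is compiled and linked with the -pg flag, run once to produce the gmon.out profile data, and the report is then generated from the executable and that file:

    gfortran -pg mcs.f90 -o mcs
    ./mcs
    gprof mcs gmon.out > profile.txt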

To study the behavior of the parallel program, the performance tools Extrae and Paraver are selected. These tools, developed by the Barcelona Supercomputing Center (BSC) [15], are able to assess the inter-node communication traffic.

    3 OPTIMIZATIONS

This section describes the optimizations performed on the OC. Two types of optimizations are possible: machine-independent and machine-dependent ones. The resulting performance of the optimized serial program is then presented.

3.1 Machine-independent optimizations

This kind of optimization does not depend on any features or characteristics of the target machine. Examples of code improvements included in this group are constant folding, dead code elimination, and constant propagation. Because of their machine independence, these code improvements are often applied to the high-level intermediate representation of the program; a generic illustration of the three transformations is given below.

A profiling evaluation was carried out on the original code using gprof. Table 1 shows which functions or subroutines consume the most CPU time when the ATR of the Austin network is estimated (using NSIM=10000). The Prim subroutine, which evaluates the network connectivity, is the most CPU-time-consuming activity.

TABLE 1
GPROF REPORT OF THE ORIGINAL CODE

Each sample counts as 0.01 seconds.

  % time   cumulative   self      calls       self     total    name
           seconds      seconds               s/call   s/call
  79.06    682.82       682.82    20000       0.03     0.04     Prim_
  10.33    772.05        89.23    20000       0.00     0.00     Quick_
   2.78    796.07        24.02    758440000   0.00     0.00     Remove_
   2.47    817.39        21.32    379220000   0.00     0.00     Ran_
   2.35    837.65        20.26    20000       0.00     0.01     Interface_Prim
   2.13    856.01        18.36    20000       0.00     0.00     Zero_
   0.95    864.18         8.17    1           8.17     864.36   MAIN_
   0.02    864.36         0.18    1           0.18     0.18     Setran_
   0.00    864.36         0.00    3           0.00     0.00     Clock_
   0.00    864.36         0.00    1           0.00     0.00     Timestamp_

By inspecting the code, some redundancies were detected in this subroutine and removed in order to optimize the execution of the program. Figure 3 shows one of the modifications made in the Prim subroutine. In this figure, the lines in blue represent the lines belonging to the original code. A new line is introduced that makes only one call to the subroutine remove, which now includes a new parameter in its specification. This change reduces the number of subroutine calls in the Prim subroutine.

Fig. 3. Code modification in the Prim subroutine
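Since the figure does not reproduce in this transcript, the fragment below is a hypothetical reconstruction of the kind of change described (variable and argument names are illustrative, not from the paper):

    ! Original: the candidate list is updated with one call per endpoint.
    call remove(list, node_i)
    call remove(list, node_j)

    ! Modified: a single call with an extra parameter does both updates,
    ! halving the number of calls to remove inside the Prim loop.
    call remove(list, node_i, node_j)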

Additional changes were introduced in the Interface_Prim subroutine, which is called before a Prim call is issued, as shown in Figure 4. In both if-else structures, the lines marked in blue appear in both the if and the else branches, so they were hoisted outside the if-else structure.

Fig. 4. Code modification in the Interface_Prim subroutine
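Again as a hypothetical reconstruction of the pattern described (routine names are illustrative only):

    ! Before: the same statement is duplicated in both branches.
    if (cond) then
       call zero(counters)
       call setup_case_a(net)
    else
       call zero(counters)
       call setup_case_b(net)
    end if

    ! After: the common statement is hoisted out of the if-else.
    call zero(counters)
    if (cond) then
       call setup_case_a(net)
    else
       call setup_case_b(net)
    end if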

3.2 Machine-dependent optimizations

Machine-dependent optimizations require specific information about the target machine. Sometimes it is possible to parameterize some of these machine-dependent factors, so that a single piece of compiler code can be used to optimize different machines just by altering the machine description parameters.

Compilers typically perform numerous optimizations that can be selectively turned on or off and configured through command-line flags. Loop unrolling, function inlining, instruction scheduling, and other loop optimizations are only some of the configurable options available. Usually, choosing the right optimization level is sufficient, but sometimes inspection of the assembly code provides insight that can be used to tune the compiler optimizations further.

The OC was evaluated using different optimization flags for both the gfortran and ifort compilers. Both compilers include optimization flags that can optimize the executable code using speed or size as the criterion. The ifort compiler includes additional flags that can help the programmer optimize the code. For the OC, the '-guide-parallel' flags were selected; through these flags, the Intel compiler provides a report with suggestions for parallelizing some code zones by adding only a few directives.

In the end, the executable code includes all the machine-independent optimizations. The best optimization flag for the GNU compiler was '-O3', while the best choice for the Intel compiler was the '-fast' flag. As expected, since an Intel-based machine is used, the best execution times were obtained with the Intel compiler. Representative invocations are shown below.
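As a usage note (the flag spellings are as documented by each compiler; the source file name is illustrative), the two best-performing builds correspond to invocations such as:

    gfortran -O3 mcs.f90 -o mcs
    ifort -fast mcs.f90 -o mcs

Note that ifort's '-fast' bundles several aggressive options, including interprocedural optimization, which is consistent with it outperforming the individual '-O' levels here.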

    3.3 Improvement evaluation

Figure 5 shows the simulation times obtained with each compiler to estimate the ATR of the Austin network (NSIM=10000). The lighter bar corresponds to the simulation time of the original code compiled with the default flags (the gfortran compiler does not include any optimization level by default, while the ifort compiler includes the optimization level '-O2'). The darker bar corresponds to the best performance obtained. For the gfortran compiler, the generated code has the machine-independent modifications described in Section 3.1. The code generated by the ifort compiler also includes the changes suggested by the set of flags '-guide-parallel'.

    Fig. 5. Simulation times obtained

Figure 5 also shows the percentage of improvement obtained. While gfortran reduces the simulation time by 65%, ifort only reduces it by around 7%. The inclusion of the default flag '-O2' by ifort already yields a program almost three times faster than the one generated by the gfortran compiler (as previously mentioned, gfortran's default settings do not include any optimization level). This fact explains the higher percentage reduction for gfortran.

Table 2 shows the flags that were evaluated. As previously mentioned, the best performance for gfortran was obtained when the optimization flag '-O3' was used. For ifort, the best flag was '-fast'.

TABLE 2
OPTIMIZATION FLAGS FOR EACH COMPILER

  gfortran compiler   ifort compiler
  -O0                 -O0
  -O/-O1              -O1
  -O2                 -O2/-O
  -O3                 -O3
  -Ofast              -fast
  -Os                 -Ofast

    4 PARALLEL IMPLEMENTATION

This section describes the parallel implementation of the MCS code under study, as well as its performance for estimating the ATR of the Austin network.

    4.1 Description

As previously described, the general MCS approach for assessing the ATR of a given network consists of evaluating the connectivity of NSIM possible network topologies (or samples), derived from the random and independent behavior of each of its links (failed or operational). In the parallel implementation, the NSIM samples are divided evenly among the available processors (p). For example, if p=4, then NSIM/4 samples are generated and evaluated per processor. Every MPI process independently executes its own MCS without interfering with the other processes' executions.

The flow chart of the parallel program implemented (Figure 6) consists of the following steps:

1. The program uses the MPI process identifiers to assign each process a role: the process whose id is 0 is the master process, while the others are defined as slave processes.
2. All the slave processes wait for the data before executing their own simulation.
3. The master process, after reading the input data (network topology, link failure probabilities, NSIM), sends it to all the slave processes along with the corresponding seed for the pseudo-random generator.
4. Every MPI process, master and slaves, performs its NSIM/p samples simultaneously and independently. As a result, each process obtains its own counter of successful samples (ioper).
5. Every slave process sends its success counter to the master process.
6. The master process, after obtaining all the success counters (including its own), estimates the ATR as the ratio between the total success count and NSIM.
7. All processes finalize their MPI sessions.

    Fig. 6. Flow diagram of parallel implementation

In order to obtain statistically valid reliability estimations, each MPI process must use a different seed. Indeed, processes with the same seed would evaluate the same NSIM/p samples, and the reliability estimates would be statistically dependent. To guarantee statistical independence, the MPI program sends a different seed to each process (see step 3). A minimal sketch of the whole scheme follows.
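The sketch below condenses the seven steps into Fortran using the MPI Fortran bindings (the mpi module). It is a sketch under assumptions, not the paper's code: the explicit sends of steps 5-6 are replaced by an equivalent collective reduction, the per-rank seed offset is illustrative, and the sampling kernel is a stand-in for the real sampling plus Prim connectivity check.

    program atr_mpi_sketch
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, i, nseed
      integer :: nsim, local_n, ioper, total_oper
      integer, allocatable :: seed(:)
      real :: atr, r

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      if (rank == 0) nsim = 10000           ! master reads the input data (step 3)
      call MPI_Bcast(nsim, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

      call random_seed(size=nseed)          ! distinct seed per process (step 3)
      allocate(seed(nseed))
      seed = 12345 + rank                   ! illustrative per-rank offset only
      call random_seed(put=seed)

      local_n = nsim / nprocs               ! NSIM/p samples per process (step 4)
      ioper = 0
      do i = 1, local_n
         call random_number(r)              ! stand-in for sampling + Prim check
         if (r > 0.05) ioper = ioper + 1
      end do

      ! Steps 5-6: combine the success counters on the master process.
      call MPI_Reduce(ioper, total_oper, 1, MPI_INTEGER, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (rank == 0) then
         atr = real(total_oper) / real(nsim)
         print *, 'Estimated ATR =', atr
      end if
      call MPI_Finalize(ierr)               ! step 7
    end program atr_mpi_sketch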

To achieve a faster version of the program, the code was also inspected for sections where OpenMP could be used [19]. Four sections of the code were selected for OpenMP directives (of the generic form sketched below). Several process distributions were tested, sharing the eight available processors between the MPI and OpenMP technologies. None of these configurations ever achieved a simulation time shorter than that of the MPI program with eight processes and no OpenMP.
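For reference, such hybrid variants place directives of roughly the following form around the sampling loop (a generic illustration, not the paper's actual code; a thread-safe random number generator is assumed):

    ! Split the local samples across OpenMP threads; the success
    ! counter is combined with a reduction clause.
    !$omp parallel do private(r) reduction(+:ioper)
    do i = 1, local_n
       call random_number(r)          ! stand-in for one MCS sample
       if (r > 0.05) ioper = ioper + 1
    end do
    !$omp end parallel do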

4.2 Performance results

The ATR of the Austin network is estimated using the gfortran and ifort compilers and considering one, two, four, and eight MPI processes (see Figure 7). The best simulation times (i.e., the time to complete the execution) were obtained using the '-O3' flag for the gfortran compiler and the '-O3 -ipo' flags for ifort. It is important to mention that the MPI libraries do not allow the use of the ifort optimization flag '-fast', which achieved the best simulation time in the serial version of the code.

As expected, the more processors available, the faster the execution. Nevertheless, the percentages of improvement obtained were lower than expected; a proportional improvement was anticipated. Figure 7 shows that using 8 MPI processes the simulation time is 70% lower than that of the optimized serial implementation with gfortran, and 74% lower with ifort.

Fig. 7. Simulation times comparison, using gfortran and ifort

A study using the performance tools Extrae and Paraver was also carried out. These tools can show the behavior of a parallel program by extracting information from an MPI execution. To illustrate the information provided by these tools, Figure 8 shows the simplest case, with two MPI processes.

Figure 8 shows a scheme of the parallel program execution, which includes the following five steps:

1. All the MPI processes obtain their identification.
2. Process 0 sends all the data to the rest of the processes.
3. Every process carries out its NSIM/2 samples, determining the number of successful samples.
4. Every process sends its result (the number of successful samples) to process 0, which estimates the ATR from all the results.
5. All the processes end their execution.

    Fig. 8. Steps with two MPI processes

    The final simulation time includes the time thatprocess 0 takes to perform steps 2, 3 and 4.

Figure 9 shows the Paraver output for two samples (NSIM=2) of the Austin network with two MPI processes. Paraver shows the percentage of time consumed by every MPI function. In this case, the communication between the two processes (steps 2 and 4) took less time than the simulations themselves (step 3). For example, for NSIM=10000 samples, the total execution time is 89 seconds and the communication time is only 0.32 seconds, that is, about 0.35% of the total time. This assessment suggests that no additional parallel improvements are required.

Fig. 9. Paraver output: parallel execution with two processes

    5 ATR ESTIMATIONS

In this section, the procedure used to evaluate the quality of the ATR estimation is presented. In the case of the serial program, the optimizations included in the code did not change the ATR estimation obtained by the original code with the same input parameters. However, the ATR estimation of the parallel version depends on the number of MPI processes used in the execution. As previously mentioned, every MPI process uses a different seed for its pseudo-random number generator, so the network samples generated are not necessarily equal to those of the serial program. In order to verify that the number of MPI processes does not degrade the ATR estimation, a test using 1, 2, 4, and 8 MPI processes was performed. Both the serial and the parallel implementations were evaluated 50 times (NSIM=10000), and minimum-average-maximum statistics of the ATR estimations were recorded.

Table 3 presents the ATR estimations obtained when the MPI program was executed to evaluate the Austin network. The analysis of all the cases evaluated reveals that similar estimates of the ATR were obtained using different numbers of processes; that is, the number of MPI processes involved did not degrade the ATR estimation.

However, it is clear that the parallel implementation is able to reduce the execution time. This means that a better ATR estimation could be achieved by increasing the number of samples NSIM and using the parallel implementation. For the network considered, a rough estimate suggests that NSIM could be set to 10000*7; the effect on the error is quantified below.
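This trade-off follows from the NSIM^(-1/2) error law quoted in the introduction: multiplying the sample count by a factor f shrinks the statistical error by a factor of sqrt(f). In LaTeX form, with epsilon denoting the standard error of the estimate:

\[
\frac{\epsilon_{f\,\mathrm{NSIM}}}{\epsilon_{\mathrm{NSIM}}} = \frac{1}{\sqrt{f}},
\qquad f = 7 \;\Rightarrow\; \text{error reduced by a factor of } \sqrt{7} \approx 2.6 .
\]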

TABLE 3
ATR ESTIMATIONS FOR THE AUSTIN NETWORK

  MPI Processes   Minimum    Average    Maximum
  1               2.99E-02   3.24E-02   3.79E-02
  2               2.94E-02   3.24E-02   3.53E-02
  4               2.97E-02   3.24E-02   3.53E-02
  8               3.10E-02   3.25E-02   3.46E-02

In a previous work [20], several additional (smaller) network topologies were evaluated, with similar behavior.

6 CONCLUSION

In this paper, an improved version of an original Monte Carlo simulation Fortran code for estimating the all-terminal reliability of a network topology is presented. Two different actions were proposed to enhance the performance. First, the original code performance was increased by: a) code changes, such as the elimination of unnecessary instructions or variables and the reduction of the number of calls to specific subroutines; and b) the use of several optimization flags available in the gfortran and ifort compilers. As a result, the CPU time reduction obtained was 65% using gfortran and 7% using ifort.

The second action consisted of a new parallel implementation of the optimized code developed in the previous step. This implementation reduced the execution time by up to 74% using the ifort compiler and 70% using the gfortran compiler. In a previous work [20], several additional (smaller) network topologies were evaluated, with similar behavior. This improvement means that the number of samples used to assess the ATR in the parallel implementation could be increased for a better ATR estimation.

As future work, it would be interesting to implement other versions of the parallel MCS using additional programming models (e.g., StarSs [21]) that exploit the available resources more efficiently. Moreover, the evaluation of the original code found that the Prim algorithm (used for detecting spanning trees) and its related functions consume considerable processing time; a parallel implementation of this algorithm is also possible, allowing further execution time reduction [22].

    ACKNOWLEDGMENT

    This work has been supported by the Spanish Ministryof Education (TIN2007-60625).

    REFERENCES

[1] C. Srivaree-ratana, K. Abdullah, A.E. Smith, "Estimation of All-Terminal Network Reliability Using an Artificial Neural Network," Computers & Operations Research, no. 29, pp. 849-868, 2002.
[2] D.L. Deeter, A.E. Smith, "Heuristic Optimization of Network Design Considering All-Terminal Reliability," Proc. of the Reliability and Maintainability Symposium, pp. 194-199, 1997.
[3] B. Dengiz, F. Altiparmak, A.E. Smith, "Efficient Optimization of All-Terminal Reliable Networks Using an Evolutionary Approach," IEEE Transactions on Reliability, no. 46, pp. 18-26, 1997.
[4] J.E. Ramirez-Marquez, C.M. Rocco, "All-Terminal Network Reliability Optimization Via Probabilistic Solution Discovery," Reliability Engineering and System Safety, no. 93, pp. 1689-1697, 2008.
[5] K.K. Aggarwal, S. Rai, "Reliability Evaluation in Computer-Communication Networks," IEEE Transactions on Reliability, no. 30, pp. 32-35, 1981.
[6] S. Rai, "A Cutset Approach to Reliability Evaluation in Communication Networks," IEEE Transactions on Reliability, no. 31, pp. 428-431, 1982.
[7] R.H. Jan, F.J. Hwang, S.T. Chen, "Topological Optimization of a Communication Network Subject to a Reliability Constraint," IEEE Transactions on Reliability, no. 42, pp. 63-70, 1993.
[8] G.S. Fishman, "A Monte Carlo Sampling Plan for Estimating Network Reliability," Operations Research, no. 34, pp. 581-594, 1986.
[9] C. Rocco, "Fortran Codes for Network Reliability Assessment," Internal Report DIOC IR2000-001 (in Spanish), Universidad Central de Venezuela, Facultad de Ingeniería.
[10] E. Martins, Minimal Spanning Tree Algorithms, Fortran codes, available at http://www.mat.uc.pt/~eqvm/cientificos/fortran/codigos.html
[11] H. Bar-Gera, Transportation Network Test Problems, http://www.bgu.ac.il/~bargera/tntp/
[12] http://gcc.gnu.org/fortran
[13] Intel Corporation, Intel Fortran Compiler XE 12.1 User and Reference Guides, http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/fortran/win/index.htm, 2011.
[14] S.L. Graham, P. Kessler, M. McKusick, "Gprof: A Call Graph Execution Profiler," ACM SIGPLAN Notices, June 1982.
[15] Barcelona Supercomputing Center, Performance Tools, http://www.bsc.es/computer-sciences/performance-tools, 2011.
[16] Message Passing Interface Forum, http://www.mpi-forum.org/, 2009.
[17] The Open MPI Development Team, Open MPI: Open Source High Performance Computing, http://www.open-mpi.org/, 2012.
[18] http://ark.intel.com/products/43122
[19] The OpenMP Architecture Review Board, The OpenMP API Specification for Parallel Programming, http://openmp.org/wp/, 2012.
[20] S. Pascual, B. Otero, C.M. Rocco, "All-Terminal Reliability Evaluation through a Monte Carlo Simulation Based on an MPI Implementation," PSAM 11 & ESREL 2012.
[21] https://www.bscmsrc.eu/media/events/barcelona-multicore-workshop-2010/jesus-labarta-abstract
[22] E. Gonina, L.V. Kalé, "Parallel Prim's Algorithm on Dense Graphs with a Novel Extension," Technical Report, Department of Computer Science, University of Illinois at Urbana-Champaign, November 2007.

B. Otero Calviño is an Assistant Professor at the Universitat Politècnica de Catalunya - Barcelona TECH (UPC). She received her M.Sc. and her first Ph.D. degrees in Computer Science from the Universidad Central de Venezuela in 1999 and 2006, respectively. After that, she received her second Ph.D., in Computer Architecture and Technology, from the UPC in 2007. Her research interests include parallel programming, load balancing, cluster computing, and autonomic communications. She is a member of the HiPEAC Network of Excellence.

S. Pascual Martínez received her Telecommunications Engineering degree from the Universitat Politècnica de Catalunya - Barcelona TECH (UPC) in 2012. Her research is primarily concerned with the design and implementation of system software for parallel computing, in order to improve its performance.

C. M. Rocco Sanseverino received the Electrical Engineering and M.Sc. Electrical Engineering (Power Systems) degrees from the Universidad Central de Venezuela (1980, 1982) and the Ph.D. degree from The Robert Gordon University, Aberdeen, Scotland, UK (2000). He is a Full Professor at the Universidad Central de Venezuela, currently teaching in the Operations Research post-graduate courses. His main areas of research interest are statistics, reliability, evolutionary multi-objective optimization, and machine learning techniques. He has published more than 150 refereed manuscripts related to these areas in technical journals, book chapters, conference proceedings, and industry reports. He is a member of the editorial boards of Reliability Engineering & System Safety, the International Journal of Performability Engineering, and the Revista de la Facultad de Ingeniería, Universidad Central de Venezuela.
