

Performance Optimization of a Monte Carlo Simulation Code for Estimating the All-Terminal Reliability of a Network

Beatriz Otero Calviño, Silvia Pascual Martínez, and Claudio M. Rocco Sanseverino

B. Otero Calviño is with the Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya - Barcelona TECH (UPC), 08034, Spain. S. Pascual Martínez is with the Universitat Politècnica de Catalunya - Barcelona TECH (UPC), 08034, Spain. C. M. Rocco Sanseverino is with the Engineering Faculty, Universidad Central de Venezuela, Caracas, Venezuela.

Abstract: All-terminal reliability (ATR), defined as the probability that every node in a network can communicate with every other node, is an important problem in research areas such as mobile ad-hoc wireless networks, grid computing systems, and telecommunications. The assessment of ATR has also been part of related problems such as the reliability allocation problem. However, the exact calculation of ATR is an NP-hard problem. To obtain this probability, there are approaches based on analytic methods for small networks or on estimation through Monte Carlo simulation (MCS). In this paper, first a Fortran code that estimates the ATR is improved using software optimization techniques. Secondly, a parallel implementation based on the Message Passing Interface (MPI) standard is presented. The implementation can take advantage of multiple processors, thus reducing the time required for the ATR assessment. One example related to a real network illustrates the benefits: the parallel implementation can reduce the execution time of a serial Monte Carlo simulation by up to 74%.

Index Terms: All-terminal reliability, Monte Carlo simulation, MPI.

    1 INTRODUCTION

All-terminal reliability (ATR), defined as the probability that every node in a network can communicate with every other node, is an important problem in research areas such as mobile ad-hoc wireless networks, grid computing systems, and telecommunications [1].

An important area of application of ATR is the optimal design of structures, for example the design of computer or communication networks, where cost and reliability are important objectives. There are many formulations of this problem, but in general the idea is to define the layout of the network at minimum cost while meeting a minimum ATR requirement [1].

A common aspect of such methods is that the network reliability must be assessed for each of the possible solutions or candidate topologies. The search space size is k^(|N|(|N|-1)/2), where k is the number of choices for the links and |N| is the number of nodes in the network [1]. For example, for k=2, a network with |N|=20 has 2^190 ≈ 1.57x10^57 possible designs. Of course, heuristic procedures, like genetic algorithms [2], [3], evolutionary strategies [4], and artificial neural networks [1], among others, have been studied so that the search space is not fully explored. However, no matter which approach is selected, the ATR of each generated topology must be evaluated using a very fast procedure.

The exact calculation of ATR is an NP-hard problem, with computational effort growing exponentially with the number of nodes and links in the network [1]. Several analytical procedures have been suggested to assess the ATR, such as the enumeration of all possible minimal cut sets, the approaches proposed in [5], [6], or rough estimation approaches (relatively fast procedures, in general with a computational effort of O(N^3)) able to produce upper and lower bounds [1], [7]. Monte Carlo simulation (MCS) can assess ATR very precisely but requires a high computational effort to obtain a good estimate [8].

An original code (OC), developed in Fortran [9], is used to perform the MCS. A simplified flowchart for the MCS is presented in Figure 1. Given the topology of a network along with the link failure probabilities: 0) a success counter is set to zero; 1) a random sample topology is generated, by selecting which links in the original network are failed or operational; 2) the connectivity of the generated network is assessed using a minimum spanning tree procedure (Prim's algorithm) [10], and if all nodes are connected, the success counter is incremented; 3) steps 1-2 are performed NSIM times. At the end, the ATR is estimated as the ratio of the success counter to the number of topologies evaluated; a minimal sketch of this loop is given below. Note that MCS is based on the evaluation of NSIM independent network samples. Statistical theory indicates that the error of the ATR estimation is proportional to NSIM^(-1/2) (the standard error of the estimator is sqrt(ATR(1-ATR)/NSIM)).
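The loop below condenses steps 0-3 into a minimal Fortran sketch. It is not the paper's code: the helpers sample_links and all_connected are hypothetical stand-ins for the sampling step and the Prim-based connectivity check.

    program atr_mcs_sketch
      implicit none
      integer, parameter :: nsim = 10000       ! number of MCS samples
      real,    parameter :: q = 0.05           ! link failure probability (Sec. 2.1)
      integer :: i, ioper
      real    :: atr

      ioper = 0                                ! step 0: reset the success counter
      do i = 1, nsim
         call sample_links(q)                  ! step 1: draw a random topology
         if (all_connected()) ioper = ioper + 1   ! step 2: Prim-based check
      end do                                   ! step 3: repeat NSIM times
      atr = real(ioper) / real(nsim)           ! ATR = successes / samples
      print *, 'Estimated ATR =', atr

    contains

      subroutine sample_links(qfail)
        ! Stand-in: the real code marks each link as failed with probability qfail.
        real, intent(in) :: qfail
      end subroutine sample_links

      logical function all_connected()
        ! Stand-in: the real code runs Prim's minimum-spanning-tree algorithm
        ! and reports whether every node was reached.
        real :: r
        call random_number(r)
        all_connected = (r > 0.05)
      end function all_connected

    end program atr_mcs_sketch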

The main objective of this paper is to improve the execution time of the MCS program. To this aim, two actions are developed: a) optimization of the original



code; b) a parallel implementation of the MCS.

To optimize the performance of the original code, the sections that consume the most CPU time are identified using profiling tools, and optimization actions are proposed. As a result, an optimized MCS serial program (OMSP) is built.

Additionally, a parallel implementation of OMSP, based on the Message Passing Interface, is proposed. In this case, the behavior of the parallel implementation is assessed using other profiling tools.

    Fig. 1. Basic MCS flowchart

The paper is organized as follows. Section 2 describes the computational example tested and the software tools used. Section 3 shows the optimizations performed and reports the execution time for the OMSP. Section 4 presents a new parallel implementation of the code using MPI and shows the execution times obtained to estimate the ATR of one real network topology. Section 5 studies the behavior of the ATR estimation for both the OMSP and the parallel implementation. Finally, the last section presents the conclusions and future work.

2 CASE STUDY AND SOFTWARE TOOLS

In this section, the network topology to be tested and the software tools used to evaluate the performance of the program versions developed are described. All evaluations were run on a machine with an Intel Core i7-720QM processor (8 logical processors) [18].

    2.1 Network

The network used in all the evaluations corresponds to the Austin transportation system for the morning peak-hour period. It is important to mention that only the network topology is used; no physical characteristic of the transportation system is modeled. The network is formed by 7388 nodes and 18961 links [11]. It is assumed that all link failure probabilities are q = 0.05. Figure 2 shows the corresponding network topology.

    2.2 Software tools

The following subsections describe the software tools used to evaluate the performance of the new program versions developed.

    Fig. 2. Austin network topology

    2.2.1 Compilers

Two compilers were used to generate the object code of the program under evaluation: gfortran [12] and ifort [13]. The first compiler generates code for different computer architectures and operating systems, while the second one targets Intel architectures. The ifort compiler includes implicit improvements for Intel architecture-based computers.

    2.2.2 Parallel programming model

The new parallel version of the OC is implemented using the message-passing programming model. In this model, the parallel processes exchange data by passing messages between them. The standard used for data interchange is the Message-Passing Interface (MPI).

MPI is a message-passing library interface specification to be used on parallel computers, with operations expressed as functions, subroutines, or methods, according to the appropriate language bindings [16].

MPI is available in multiple implementations. The implementation selected is an open-source one, Open MPI, which includes several options for specifying how to assign the MPI processes to the available processors [17]. When an MPI program is launched, it is possible to specify how many processes take part in the MPI execution. To achieve the fastest execution, as many processes as available processors should be involved.

    2.2.3 Profile tools

To improve the performance of the OC, it is necessary to identify the code sections that consume the most CPU time. To localize these sections, a profiling tool called gprof [14] is used.

Gprof is a call-graph execution profiler: it creates a call graph detailing which functions in a serial program are called, and records the amount of time spent in each function.


This type of profiling is useful for figuring out which functions should be optimized, either because they are called very often or because they take a significant share of the execution time.
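As a usage note (this is the standard gprof workflow, not taken from the paper; the file name mcs.f90 is illustrative): the program is compiled and linked with the -pg flag, run once to produce the gmon.out profile data, and the report is then generated from the executable and that file:

    gfortran -pg mcs.f90 -o mcs
    ./mcs
    gprof mcs gmon.out > profile.txt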

To study the behavior of the parallel program, the performance tools Extrae and Paraver are selected. These tools, developed by the Barcelona Supercomputing Center (BSC) [15], are able to assess the inter-node communication traffic.

    3 OPTIMIZATIONS

This section describes the optimizations performed on the OC. Two types of optimizations are possible: machine-independent and machine-dependent ones. The resulting performance of the optimized serial program is then presented.

3.1 Machine-independent optimizations

This kind of optimization does not depend on any features or characteristics of the target machine. Examples of code improvements included in this group are constant folding, dead code elimination, and constant propagation. Because of their machine independence, these code improvements are often applied to the high-level intermediate representation of the program; a generic illustration of the three transformations is given below.

A profiling evaluation was carried out on the original code using gprof. Table 1 shows which functions or subroutines consume the most CPU time when the ATR of the Austin network is estimated (using NSIM=10000). The Prim subroutine, which evaluates the network connectivity, is the most CPU-time-consuming activity.

TABLE 1
GPROF REPORT OF THE ORIGINAL CODE

Each sample counts as 0.01 seconds.

  % time   cumulative   self      calls       self     total    name
           seconds      seconds               s/call   s/call
  79.06    682.82       682.82    20000       0.03     0.04     Prim_
  10.33    772.05        89.23    20000       0.00     0.00     Quick_
   2.78    796.07        24.02    758440000   0.00     0.00     Remove_
   2.47    817.39        21.32    379220000   0.00     0.00     Ran_
   2.35    837.65        20.26    20000       0.00     0.01     Interface_Prim
   2.13    856.01        18.36    20000       0.00     0.00     Zero_
   0.95    864.18         8.17    1           8.17     864.36   MAIN_
   0.02    864.36         0.18    1           0.18     0.18     Setran_
   0.00    864.36         0.00    3           0.00     0.00     Clock_
   0.00    864.36         0.00    1           0.00     0.00     Timestamp_

By inspecting the code, some redundancies were detected in this subroutine and removed in order to optimize the execution of the program. Figure 3 shows one of the modifications made in the Prim subroutine. In this figure, the lines in blue represent the lines belonging to the original code. A new line is introduced that makes only one call to the subroutine remove, which now includes a new parameter in its specification. This change reduces the number of subroutine calls in the Prim subroutine.

Fig. 3. Code modification in the Prim subroutine
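Since the figure does not reproduce in this transcript, the fragment below is a hypothetical reconstruction of the kind of change described (variable and argument names are illustrative, not from the paper):

    ! Original: the candidate list is updated with one call per endpoint.
    call remove(list, node_i)
    call remove(list, node_j)

    ! Modified: a single call with an extra parameter does both updates,
    ! halving the number of calls to remove inside the Prim loop.
    call remove(list, node_i, node_j)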

Additional changes were introduced in the Interface_Prim subroutine, which is called before a Prim call is issued, as shown in Figure 4. In both if-else structures, the lines marked in blue appear in both the if and the else branches, so they were hoisted outside the if-else structure.

Fig. 4. Code modification in the Interface_Prim subroutine
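Again as a hypothetical reconstruction of the pattern described (routine names are illustrative only):

    ! Before: the same statement is duplicated in both branches.
    if (cond) then
       call zero(counters)
       call setup_case_a(net)
    else
       call zero(counters)
       call setup_case_b(net)
    end if

    ! After: the common statement is hoisted out of the if-else.
    call zero(counters)
    if (cond) then
       call setup_case_a(net)
    else
       call setup_case_b(net)
    end if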

3.2 Machine-dependent optimizations

Machine-dependent optimizations require specific information about the target machine. Sometimes it is possible to parameterize some of these machine-dependent factors, so that a single piece of compiler code can be used to optimize different machines just by altering the machine description parameters.

Compilers typically perform numerous optimizations that can be selectively turned on or off and configured through command-line flags. Loop unrolling, function inlining, instruction scheduling, and other loop optimizations are only some of the configurable options available. Usually, choosing the right optimization level is sufficient, but sometimes inspection of the assembly code provides insight that can be used to tune the compiler optimizations further.

The OC was evaluated using different optimization flags for both the gfortran and ifort compilers. Both compilers include optimization flags that can optimize the executable code using speed or size as the criterion. The ifort compiler includes additional flags that can help the programmer optimize the code. For the OC, the '-guide-parallel' flags were selected; through these flags, the Intel compiler provides a report with suggestions for parallelizing some code zones by adding only a few directives.

In the end, the executable code includes all the machine-independent optimizations. The best optimization flag for the GNU compiler was '-O3', while the best choice for the Intel compiler was the '-fast' flag. As expected, since an Intel-based machine is used, the best execution times were obtained with the Intel compiler. Representative invocations are shown below.
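As a usage note (the flag spellings are as documented by each compiler; the source file name is illustrative), the two best-performing builds correspond to invocations such as:

    gfortran -O3 mcs.f90 -o mcs
    ifort -fast mcs.f90 -o mcs

Note that ifort's '-fast' bundles several aggressive options, including interprocedural optimization, which is consistent with it outperforming the individual '-O' levels here.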

    3.3 Improvement evaluation

Figure 5 shows the simulation times obtained with each compiler to estimate the ATR of the Austin network (NSIM=10000). The lighter bar corresponds to the simulation time of the original code compiled with the default flags (the gfortran compiler does not include any optimization level by default, while the ifort compiler includes the optimization level '-O2'). The darker bar corresponds to the best performance obtained. For the gfortran compiler, the generated code has the machine-independent modifications described in Section 3.1. The code generated by the ifort compiler also includes the changes suggested by the set of flags '-guide-parallel'.

    Fig. 5. Simulation times obtained

Figure 5 also shows the percentage of improvement obtained. While gfortran reduces the simulation time by 65%, ifort only reduces it by around 7%. The inclusion of the default flag '-O2' by ifort already yields a program almost three times faster than the one generated by the gfortran compiler (as previously mentioned, gfortran's default settings do not include any optimization level). This fact explains the higher percentage reduction for gfortran.

Table 2 shows the flags that were evaluated. As previously mentioned, the best performance for gfortran was obtained when the optimization flag '-O3' was used. For ifort, the best flag was '-fast'.

TABLE 2
OPTIMIZATION FLAGS FOR EACH COMPILER

  gfortran compiler   ifort compiler
  -O0                 -O0
  -O/-O1              -O1
  -O2                 -O2/-O
  -O3                 -O3
  -Ofast              -fast
  -Os                 -Ofast

    4 PARALLEL IMPLEMENTATION

This section describes the parallel implementation of the MCS code under study, as well as its performance for estimating the ATR of the Austin network.

    4.1 Description

As previously described, the general MCS approach for assessing the ATR of a given network consists of evaluating the connectivity of NSIM possible network topologies (or samples), derived from the random and independent behavior of each of its links (failed or operational). In the parallel implementation, the NSIM samples are divided evenly among the available processors (p). For example, if p=4, then NSIM/4 samples are generated and evaluated per processor. Every MPI process independently executes its own MCS without interfering with the other processes' executions.

The flow chart of the parallel program implemented (Figure 6) consists of the following steps:

1. The program uses the MPI process identifiers to assign each process a role: the process whose id is 0 is the master process, while the others are defined as slave processes.
2. All the slave processes wait for the data before executing their own simulation.
3. The master process, after reading the input data (network topology, link failure probabilities, NSIM), sends it to all the slave processes along with the corresponding seed for the pseudo-random generator.
4. Every MPI process, master and slaves, performs its NSIM/p samples simultaneously and independently. As a result, each process obtains its own counter of successful samples (ioper).
5. Every slave process sends its success counter to the master process.
6. The master process, after obtaining all the success counters (including its own), estimates the ATR as the ratio between the total success count and NSIM.
7. All processes finalize their MPI sessions.

    Fig. 6. Flow diagram of parallel implementation

In order to obtain statistically valid reliability estimations, each MPI process must use a different seed. Indeed, processes with the same seed would evaluate the same NSIM/p samples, and the reliability estimates would be statistically dependent. To guarantee statistical independence, the MPI program sends a different seed to each process (see step 3). A minimal sketch of the whole scheme follows.
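The sketch below condenses the seven steps into Fortran using the MPI Fortran bindings (the mpi module). It is a sketch under assumptions, not the paper's code: the explicit sends of steps 5-6 are replaced by an equivalent collective reduction, the per-rank seed offset is illustrative, and the sampling kernel is a stand-in for the real sampling plus Prim connectivity check.

    program atr_mpi_sketch
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, i, nseed
      integer :: nsim, local_n, ioper, total_oper
      integer, allocatable :: seed(:)
      real :: atr, r

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      if (rank == 0) nsim = 10000           ! master reads the input data (step 3)
      call MPI_Bcast(nsim, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

      call random_seed(size=nseed)          ! distinct seed per process (step 3)
      allocate(seed(nseed))
      seed = 12345 + rank                   ! illustrative per-rank offset only
      call random_seed(put=seed)

      local_n = nsim / nprocs               ! NSIM/p samples per process (step 4)
      ioper = 0
      do i = 1, local_n
         call random_number(r)              ! stand-in for sampling + Prim check
         if (r > 0.05) ioper = ioper + 1
      end do

      ! Steps 5-6: combine the success counters on the master process.
      call MPI_Reduce(ioper, total_oper, 1, MPI_INTEGER, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (rank == 0) then
         atr = real(total_oper) / real(nsim)
         print *, 'Estimated ATR =', atr
      end if
      call MPI_Finalize(ierr)               ! step 7
    end program atr_mpi_sketch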

To achieve a faster version of the program, the code was also inspected for sections where OpenMP could be used [19]. Four sections of the code were selected for OpenMP directives (of the generic form sketched below). Several process distributions were tested, sharing the eight available processors between the MPI and OpenMP technologies. None of these configurations ever achieved a simulation time shorter than that of the MPI program with eight processes and no OpenMP.
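For reference, such hybrid variants place directives of roughly the following form around the sampling loop (a generic illustration, not the paper's actual code; a thread-safe random number generator is assumed):

    ! Split the local samples across OpenMP threads; the success
    ! counter is combined with a reduction clause.
    !$omp parallel do private(r) reduction(+:ioper)
    do i = 1, local_n
       call random_number(r)          ! stand-in for one MCS sample
       if (r > 0.05) ioper = ioper + 1
    end do
    !$omp end parallel do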

4.2 Performance results

The ATR of the Austin network is estimated using the gfortran and ifort compilers and considering one, two, four, and eight MPI processes (see Figure 7). The best simulation times (i.e., the time to complete the execution) were obtained using the '-O3' flag for the gfortran compiler and the '-O3 -ipo' flags for ifort. It is important to mention that the MPI libraries do not allow the use of the ifort optimization flag '-fast', which achieved the best simulation time in the serial version of the code.

As expected, the more processors available, the faster the execution. Nevertheless, the percentages of improvement obtained were lower than expected; a proportional improvement was anticipated. Figure 7 shows that using 8 MPI processes the simulation time is 70% lower than that of the optimized serial implementation with gfortran, and 74% lower with ifort.

Fig. 7. Simulation times comparison, using gfortran and ifort

A study using the performance tools Extrae and Paraver was also carried out. These tools can show the behavior of a parallel program by extracting information from an MPI execution. To illustrate the information provided by these tools, Figure 8 shows the simplest case, with two MPI processes.

Figure 8 shows a scheme of the parallel program execution, which includes the following five steps:

1. All the MPI processes obtain their identification.
2. Process 0 sends all the data to the rest of the processes.
3. Every process carries out its NSIM/2 samples, determining the number of successful samples.
4. Every process sends its result (the number of successful samples) to process 0, which estimates the ATR from all the results.
5. All the processes end their execution.

    Fig. 8. Steps with two MPI processes

    The final simulation time includes the time thatprocess 0 takes to perform steps 2, 3 and 4.

Figure 9 shows the Paraver output for two samples (NSIM=2) of the Austin network with two MPI processes. Paraver shows the percentage of time consumed by every MPI function. In this case, the communication between the two processes (steps 2 and 4) took less time than the simulations themselves (step 3). For example, for NSIM=10000 samples, the total execution time is 89 seconds and the communication time is only 0.32 seconds, that is, about 0.35% of the total time. This assessment suggests that no additional parallel improvements are required.

Fig. 9. Paraver output: parallel execution with two processes

    5 ATR ESTIMATIONS

In this section, the procedure used to evaluate the quality of the ATR estimation is presented. In the case of the serial program, the optimizations included in the code did not change the ATR estimation obtained by the original code with the same input parameters. However, the ATR estimation of the parallel version depends on the number of MPI processes used in the execution. As previously mentioned, every MPI process uses a different seed for its pseudo-random number generator, so the network samples generated are not necessarily equal to those of the serial program. In order to verify that the number of MPI processes does not degrade the ATR estimation, a test using 1, 2, 4, and 8 MPI processes was performed. Both the serial and the parallel implementations were evaluated 50 times (NSIM=10000), and minimum-average-maximum statistics of the ATR estimations were recorded.

Table 3 presents the ATR estimations obtained when the MPI program was executed to evaluate the Austin network. The analysis of all the cases evaluated reveals that similar estimates of the ATR were obtained using different numbers of processes; that is, the number of MPI processes involved did not degrade the ATR estimation.

However, it is clear that the parallel implementation is able to reduce the execution time. This means that a better ATR estimation could be achieved by increasing the number of samples NSIM and using the parallel implementation. For the network considered, a rough estimate suggests that NSIM could be set to 10000*7; the effect on the error is quantified below.
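This trade-off follows from the NSIM^(-1/2) error law quoted in the introduction: multiplying the sample count by a factor f shrinks the statistical error by a factor of sqrt(f). In LaTeX form, with epsilon denoting the standard error of the estimate:

\[
\frac{\epsilon_{f\,\mathrm{NSIM}}}{\epsilon_{\mathrm{NSIM}}} = \frac{1}{\sqrt{f}},
\qquad f = 7 \;\Rightarrow\; \text{error reduced by a factor of } \sqrt{7} \approx 2.6 .
\]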

TABLE 3
ATR ESTIMATIONS FOR THE AUSTIN NETWORK

  MPI Processes   Minimum    Average    Maximum
  1               2.99E-02   3.24E-02   3.79E-02
  2               2.94E-02   3.24E-02   3.53E-02
  4               2.97E-02   3.24E-02   3.53E-02
  8               3.10E-02   3.25E-02   3.46E-02

In a previous work [20], several additional (smaller) network topologies were evaluated, with similar behavior.

6 CONCLUSION

In this paper, an improved version of an original Monte Carlo simulation Fortran code for estimating the all-terminal reliability of a network topology is presented. Two different actions were proposed to enhance the performance. First, the original code performance was increased by: a) code changes, such as the elimination of unnecessary instructions or variables and the reduction of the number of calls to specific subroutines; and b) the use of several optimization flags available in the gfortran and ifort compilers. As a result, the CPU time reduction obtained was 65% using gfortran and 7% using ifort.

The second action consisted of a new parallel implementation of the optimized code developed in the previous step. This implementation reduced the execution time by up to 74% using the ifort compiler and 70% using the gfortran compiler. In a previous work [20], several additional (smaller) network topologies were evaluated, with similar behavior. This improvement means that the number of samples used to assess the ATR in the parallel implementation could be increased for a better ATR estimation.

As future work, it would be interesting to implement other versions of the parallel MCS using additional programming models (e.g., StarSs [21]) that exploit the available resources more efficiently. Moreover, the evaluation of the original code found that the Prim algorithm (used for detecting spanning trees) and its related functions consume considerable processing time; a parallel implementation of this algorithm is also possible, allowing further execution time reduction [22].

    ACKNOWLEDGMENT

    This work has been supported by the Spanish Ministryof Education (TIN2007-60625).

    REFERENCES

[1] C. Srivaree-ratana, K. Abdullah, A.E. Smith, "Estimation of All-Terminal Network Reliability Using an Artificial Neural Network," Computers & Operations Research, no. 29, pp. 849-868, 2002.
[2] D.L. Deeter, A.E. Smith, "Heuristic Optimization of Network Design Considering All-Terminal Reliability," Proc. of the Reliability and Maintainability Symposium, pp. 194-199, 1997.
[3] B. Dengiz, F. Altiparmak, A.E. Smith, "Efficient Optimization of All-Terminal Reliable Networks Using an Evolutionary Approach," IEEE Transactions on Reliability, no. 46, pp. 18-26, 1997.
[4] J.E. Ramirez-Marquez, C.M. Rocco, "All-Terminal Network Reliability Optimization Via Probabilistic Solution Discovery," Reliability Engineering and System Safety, no. 93, pp. 1689-1697, 2008.
[5] K.K. Aggarwal, S. Rai, "Reliability Evaluation in Computer-Communication Networks," IEEE Transactions on Reliability, no. 30, pp. 32-35, 1981.
[6] S. Rai, "A Cutset Approach to Reliability Evaluation in Communication Networks," IEEE Transactions on Reliability, no. 31, pp. 428-431, 1982.
[7] R.H. Jan, F.J. Hwang, S.T. Chen, "Topological Optimization of a Communication Network Subject to a Reliability Constraint," IEEE Transactions on Reliability, no. 42, pp. 63-70, 1993.
[8] G.S. Fishman, "A Monte Carlo Sampling Plan for Estimating Network Reliability," Operations Research, no. 34, pp. 581-594, 1986.
[9] C. Rocco, "Fortran Codes for Network Reliability Assessment," Internal Report DIOC IR2000-001 (in Spanish), Universidad Central de Venezuela, Facultad de Ingeniería.
[10] E. Martins, Minimal Spanning Tree Algorithms, Fortran codes, available at http://www.mat.uc.pt/~eqvm/cientificos/fortran/codigos.html
[11] H. Bar-Gera, Transportation Network Test Problems, http://www.bgu.ac.il/~bargera/tntp/
[12] http://gcc.gnu.org/fortran
[13] Intel Corporation, Intel Fortran Compiler XE 12.1 User and Reference Guides, http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/fortran/win/index.htm, 2011.
[14] S.L. Graham, P. Kessler, M. McKusick, "Gprof: A Call Graph Execution Profiler," ACM SIGPLAN Notices, June 1982.
[15] Barcelona Supercomputing Center, Performance Tools, http://www.bsc.es/computer-sciences/performance-tools, 2011.
[16] Message Passing Interface Forum, http://www.mpi-forum.org/, 2009.
[17] The Open MPI Development Team, Open MPI: Open Source High Performance Computing, http://www.open-mpi.org/, 2012.
[18] http://ark.intel.com/products/43122
[19] The OpenMP Architecture Review Board, The OpenMP API Specification for Parallel Programming, http://openmp.org/wp/, 2012.
[20] S. Pascual, B. Otero, C.M. Rocco, "All-Terminal Reliability Evaluation through a Monte Carlo Simulation Based on an MPI Implementation," PSAM 11 & ESREL 2012.
[21] https://www.bscmsrc.eu/media/events/barcelona-multicore-workshop-2010/jesus-labarta-abstract
[22] E. Gonina, L.V. Kalé, "Parallel Prim's Algorithm on Dense Graphs with a Novel Extension," Technical Report, Department of Computer Science, University of Illinois at Urbana-Champaign, November 2007.

B. Otero Calviño is an Assistant Professor at the Universitat Politècnica de Catalunya - Barcelona TECH (UPC). She received her M.Sc. and her first Ph.D. degrees in Computer Science from the Universidad Central de Venezuela in 1999 and 2006, respectively. After that, she received her second Ph.D., in Computer Architecture and Technology, from the UPC in 2007. Her research interests include parallel programming, load balancing, cluster computing, and autonomic communications. She is a member of the HiPEAC Network of Excellence.

S. Pascual Martínez received her Telecommunications Engineering degree from the Universitat Politècnica de Catalunya - Barcelona TECH (UPC) in 2012. Her research is primarily concerned with the design and implementation of system software for parallel computing, in order to improve its performance.

C. M. Rocco Sanseverino received the Electrical Engineering and M.Sc. Electrical Engineering (Power Systems) degrees from the Universidad Central de Venezuela (1980, 1982) and the Ph.D. degree from The Robert Gordon University, Aberdeen, Scotland, UK (2000). He is a Full Professor at the Universidad Central de Venezuela, currently teaching in the Operations Research post-graduate courses. His main areas of research interest are statistics, reliability, evolutionary multi-objective optimization, and machine learning techniques. He has published more than 150 refereed manuscripts related to these areas in technical journals, book chapters, conference proceedings, and industry reports. He is a member of the editorial boards of Reliability Engineering & System Safety, the International Journal of Performability Engineering, and the Revista de la Facultad de Ingeniería, Universidad Central de Venezuela.
