
Parallel fault tree analysis for accurate reliability of complex systems

Structural Safety 72 (2018) 41–53. https://doi.org/10.1016/j.strusafe.2017.12.003

Corresponding author: M. Torbol. E-mail address: [email protected]

Fritz Sihombing, Marco Torbol
Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea


Article history: Received 27 March 2017; Received in revised form 7 November 2017; Accepted 12 December 2017

Keywords: GPGPU computing; SIMD architecture; Fault tree analysis; Cut-sets analysis; Probability of failure

Abstract: Fault tree analysis is one of the methods for the probabilistic risk assessment of components and subsystems of nuclear power plants. Until now, the algorithms that solve a fault tree have been serial. Instead, this study presents new algorithms that handle and solve a fault tree by taking advantage of the new state of the art in parallel computing: the general purpose graphic processing unit (GPGPU). The subsystems of nuclear power plants are the target of this study; however, the method can be used on many other complex engineering systems. The developed parallel algorithms are: one builder, which assembles the topology matrix of the fault tree and leads the computation of the three new solvers: a bottom-up solver, a cut sets solver, and a Monte Carlo simulation solver. The probability of the top event and the probability of each cut set are computed. The results show that, given the same investment, a GPU implementation can handle larger fault trees than a CPU implementation. The developed solvers are the foundation of the next generation of parallel algorithms for the tree-based analysis of complex systems.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Fault tree analysis (FTA) is one of the most powerful tools to represent the failure probability of a complex engineering system based on the failure probabilities of its basic components. FTA was invented in 1961 at Bell Labs, for the guidance system of the Minuteman project [1]. It was extended to the entire project in 1962 [2]; and, these days, FTA is used in many different engineering fields: aerospace [3], nuclear power plants [4], off-shore platforms.

In the nuclear industry, probabilistic risk assessment (PRA) [5] studies the safety, the reliability, the risk, and the consequences of operating a nuclear power plant (NPP) [6]. Because a nuclear reactor system is supported, operated, and controlled by many subsystems that differ from each other in design, complexity, and scope, the PRA of each subsystem is carried out using different methods. Each method has its own peculiarities and its own specific scope within the PRA framework. This study focuses on the "small event tree/large fault tree" analysis and assesses the risk and consequences of rare events, such as earthquakes, tsunamis, and loss of coolant accidents (LOCA).

The consequences of a rare event on a NPP can be computed using a single large event tree. However, due to its complexity this is never done. Instead, the entire system is divided into its major subsystems, and the probability of failure of each subsystem is calculated using smaller fault trees. The fault trees of the different subsystems are linked together through a global event tree. Inside each fault tree a subsystem is decomposed into its basic events, which are linked together through logic gates. The failure probabilities of the subsystems are used in the event tree to estimate the probability of failure of the entire NPP. This technique is powerful and efficient, because the fault tree of each subsystem is built based on its design blueprints and it is independent from the other subsystems.

A large fault tree includes thousands of gates, basic events, multiple occurring events (MOE), and multiple occurring branches (MOB). Furthermore, different types of gates exist, and each gate has its own logic and number of inputs. The probability of the top event in a simple fault tree that does not include MOE and MOB can be computed using the bottom-up method. However, for more complex fault trees other solvers must be used, such as cut sets analysis. Finally, for the largest fault trees, when the number of cut sets exceeds the available computer memory, raw, brute force Monte Carlo simulation is used as the alternative approximation method. For example, [7] used field programmable gate arrays (FPGAs) to accelerate the solution of a fault tree. The accuracy and complexity of FTA in PRA are a challenge, but evolution in parallel computing methods can overcome these issues.

Parallel computing has been used in civil engineering for a range of different purposes. For example: [8] used it to study hysteretic systems; [9] presented a visualization method for vibrations in concrete elements; [10] used parallel computing for high performance large scale analysis of structures; [10–13] investigated the use of parallel computing to solve design optimization problems, structural analysis problems, and monitoring and control problems.

The General Purpose Graphic Processing Unit (GPGPU) is the latest advance in computational power. The use of GPGPU for engineering problems covers a wide range of research fields. Lopes et al. [14] showed the performance and benefit of GPU computing when it is applied to neural networks. Velez et al. [15] used the GPU to accelerate their shotcrete application modeling. Torbol [16] used it to accelerate the frequency domain decomposition algorithm for real time structural health monitoring. Calado Lopes and Sales Dias [17] used it for visual simulation in location-based mobile computing. Tsai and Huang [18] used it for parallelizing traffic sign inventory of video log images. Carozza et al. [19] used it for marker-less vision-based augmented reality. Sihombing and Torbol [20] used it to estimate analytical fragility curves for tsunami hazard.

Researchers have also attempted to improve PRA using GPGPU. The first attempt was made by Aghassi and Aghassi [21], who developed a GPU based fault tree analysis solver using Monte Carlo simulation and time-to-failure. However, porting existing fault tree solvers to the GPU environment is not simple, because this parallel architecture works in a completely different way compared to a CPU.

This study proposes new parallel fault tree analysis algorithms that build on the knowledge acquired from serial algorithms and on the workflow of a GPU. A new gate expansion concept is developed and implemented to create a new representation of fault tree models that is suitable for parallel computation on a GPU. The GPU is used as the computational tool to overcome the current limitations of serial cut sets analysis and Monte Carlo simulation. The paper is organized as follows. Section 2 refreshes the key concepts of FTA. Section 3 presents the proposed parallel FTA developed for GPGPU computing. Section 4 shows the results, which include accuracy tests and large-scale performance tests. Section 5 summarizes the conclusions of this study.

2. Fault tree analysis

Fault tree analysis is used to compute the probability of failure of a complex system. There are three major steps (Fig. 1): step 1, the analyst builds the fault tree logic based on the system design, assigns the failure probability of the basic components, and decides the logic of every gate; step 2, the complete mathematical model of the fault tree is built; step 3, the failure probability of the top event is computed.

Fig. 1. Fault tree analysis steps and algorithms.

In the second step, the fault tree is converted to a mathematical model. Many models have been developed over the years. The top-to-bottom algorithm is the oldest one: it was invented in the original Minuteman project. The algorithm builds a mathematical representation using a top-to-bottom approach, and computes the probability of the top event using a bottom-to-top, gate-to-gate approach. The cut-sets algorithm is a more complex representation, because it computes not only the probability of the top event, but also the probability of all cut sets that cause the top event. A cut set is a combination of basic events that causes the top event. A minimal cut set is a cut set that causes the top event to occur, but, if a basic event is removed from it, the top event does not occur anymore, i.e. it is not a cut set anymore. The binary decision diagram (BDD) is another model for fault tree analysis. BDD was imported from electrical engineering circuit design: the fault tree is converted to a series of if-then-else (ite) statements and each statement evaluates the occurrence of a basic event. Lastly, the zero suppressed binary decision diagram (ZBDD) is an improved, compact version of BDD.

In the third step, there are several different solvers to find the probability of the top event: bottom-up, gate-to-gate computation; direct analytical calculation of the cut sets; and Monte Carlo simulation.

All these existing algorithms are serial and hard to port to a GPU. However, they can be used as the baseline on which the new parallel algorithms are developed.

3. Parallel fault tree analysis

The proposed parallel method follows the classic steps of fault tree analysis: a builder, a solver, and the post-processing of the results. The advantage of using a GPU parallel algorithm instead of a CPU is the cost/computational power ratio: it is possible to achieve large scale multi-CPU performance in a desktop or workstation. Furthermore, it is much easier to implement using CUDA C or OpenCL, without requiring an OpenMP or MPI implementation.

The topology of the fault tree, i.e. the mathematical model, is built top-to-bottom, and all logic gates and basic events are stored in a single matrix: the topology matrix T. This study introduces a new method for the arrangement of the logic gates: the gate expansion. The gate expansion is performed on the original fault tree and ensures that each gate in the final fault tree has only two inputs.

Once the topology matrix is known, the probability of occurrence of the top event is computed using different algorithms: the parallel bottom-up algorithm, the parallel cut-sets analysis, or the parallel Monte Carlo simulation. Each algorithm is aimed at a different fault tree based on its size and its logic properties, for example the presence of MOB, MOE, or common cause failures (CCF).

3.1. GPU architecture

A GPU is a massively parallel processing unit with thousands of cores based on the single instruction multiple data (SIMD) architecture. It is capable of handling large amounts of data in parallel. For example, vector-vector, vector-matrix, and matrix-matrix operations are well suited for GPU computing. GPUs are efficient at handling for-loops over large amounts of data: the loops are effectively unrolled, and the passes over the loop are executed by multiple threads simultaneously and in parallel. Instead, if a GPU must handle if-statements, issues arise, because an if-statement causes branches in the execution that reduce the parallelism of the entire algorithm. If multiple threads enter an if-statement together and select between two outcomes, every thread must step through both outcomes, which reduces the parallelism. Furthermore, each thread must execute a similar amount of work: if each thread solves a logic gate, the complexity of all gates must be the same. Therefore, the new algorithms that are developed to solve fault trees on a GPU must avoid if-statements and must handle logic gates of similar size and complexity. The importance of the gate expansion proposed in this study lies in balancing the workload on the GPGPU.

Fig. 2. Topology matrix development.

Fig. 3. Example of gate to gate placement.

GPU architecture can be summarized by the following terminology:

– A thread is the basic worker inside a GPU program; it is the equivalent of a process on the CPU.

– A warp is a group of threads executing concurrently; the number is based on the specific architecture. 32 threads/warp is used in almost all existing architectures up to now.

– A block of threads usually contains between 32 and 1024 threads. The threads in a block are divided into multiple warps and execute at different times, but they can all access the same shared memory to exchange data.

– The global memory is the RAM memory of the GPU. Data has to be moved here before any operation is performed. Arrays and variables residing in this memory use the identifier "_dev" in the pseudo code of this study.

– The shared memory has higher speed than the global memory and it is shared by all threads in the same block. Arrays and variables residing in this memory use the identifier "_sha".

– The register memory is on-chip memory running at the speed of the GPU; each thread has its own register memory. Variables in this memory use the identifier "_reg". There are ways to exchange register memory among the threads in the same warp, but this is outside the scope of this study.

New algorithms for the FTA builder and representation are required. These algorithms generate a matrix form suitable for the parallel FTA solvers, which run thousands of concurrent processes.

3.2. Fault tree builder

The builder stores the fault tree inside a single topology matrix T[m,n]. The topology matrix is used to drive all solver algorithms through the logic of the tree. For example, the bottom-up solver fills a probability matrix P[m,n] of the same size as the topology matrix: the last row of P contains the probabilities of failure of the basic events, and P[0,0] contains the probability of the top event. The topology matrix T contains all the information regarding the logic of the fault tree, and the probability matrix P contains the probabilities of failure of every basic event, of every gate, and of the top event. These matrices have the same size in terms of number of rows m and number of columns n, but different memory occupancy: the topology matrix holds integers (int3), while the probability matrix holds single precision floating point numbers (float) or double precision floating point numbers (double). Fig. 2 shows a sample topology matrix.

The number of rows m of the matrices is equal to the depth of the fault tree, from the top event to the most nested basic event. The number of columns n is at least equal to the number of basic events that appear in the fault tree. n is always bigger, because it must take into account the presence of the multiple occurring events (MOE) and multiple occurring branches (MOB), which increase the number of occurrences of a single basic event throughout the tree. Both m and n are initially unknown. The initial n value is double the number of basic events, and the initial m value is equal to 256. When necessary, these values are increased to accommodate a larger fault tree. They are also reduced to their final values once the entire tree has been allocated inside the topology matrix.

The topology matrix is int3, so each position holds 3 integers. T[].x contains the gate unique id; T[].y contains the gate logic type: 1 = AND gate, 2 = OR gate, 3 = NAND gate, 4 = NOR gate, and −1 = basic event. For a gate in row i and column j, its inputs are in row i + 1: the first input is in the same column j, the second input is in the first available column of row i + 1, and its column id is stored in T[].z. The builder algorithm places the top event at T[0,0] (Fig. 3). Then, it runs for 0 < i < m and assembles the topology matrix from top to bottom. At each pass it checks the entire row i and adds the inputs of every gate in the following row i + 1. If a row includes only basic events, the algorithm terminates and computes the exact values of m and n. The topology matrix T is complete.

If i = m and there are still logic gates in row i, m is doubled. If row i + 1 is not large enough to store all the inputs of row i, the size n of the topology matrix is doubled. Once the entire fault tree is placed inside, m is reduced to the minimum number of rows that can contain the tree and n is padded to the smallest multiple of the size of the block of threads used (bx), which in this study is set to 256. This algorithm runs in parallel on a GPU, because all logic gates in row i can be resolved simultaneously. Given n and bx, which are respectively the number of columns and the size of a block of threads, n/bx blocks of threads are launched in parallel to solve each row. The pseudo code for launching the kernel of the builder algorithm is shown below.
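A minimal host-side sketch of this launch sequence is given below; the kernel signatures are assumptions and the kernel bodies are omitted.

// Sketch only: host-side launch of the builder kernels (signatures assumed).
#include <cuda_runtime.h>

__global__ void PlaceTopEvent(int3 *topology_dev, const int3 *gatelogic_dev, int n)
{
    /* places the top event at T[0,0]; body not shown */
}

__global__ void BuildTopBottom(int3 *topology_dev, const int3 *gatelogic_dev, int row, int n)
{
    /* reads row i and places the inputs of its gates in row i+1; body not shown */
}

void BuildTopologyMatrix(int3 *topology_dev, const int3 *gatelogic_dev, int m, int n)
{
    const int blocksize = 256;                              // bx, the block size used in this study
    const int gridsize  = (n + blocksize - 1) / blocksize;  // n/bx blocks of threads per row

    PlaceTopEvent<<<1, 1>>>(topology_dev, gatelogic_dev, n);
    for (int i = 0; i < m - 1; ++i) {                       // top to bottom, one row per kernel call
        BuildTopBottom<<<gridsize, blocksize>>>(topology_dev, gatelogic_dev, i, n);
    }
    cudaDeviceSynchronize();
}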

where PlaceTopEvent is the kernel call that places the top event inside the topology matrix, and BuildTopBottom is the kernel call that reads row i and places all the inputs of its gates in row i + 1. topology_dev is the topology matrix array, and gatelogic_dev is the input array that contains all logic gates, including the expanded gates. gridsize and blocksize contain the number of blocks of threads and the size of a block of threads launched by each kernel call.

Fig. 4. Sample expansion of AND and OR gate with multiple inputs.

The topology matrix was designed for efficient access and reading by multiple threads. Each row i is related only to the previous row i−1 and the following row i+1. Row i can be divided into any number of blocks of threads of any size, based on the specific GPU architecture, without affecting the results. Once the fault tree is stored in the topology matrix, the probability of failure of the top event can be computed using the different parallel solvers presented in this study. The "bottom-up" algorithm is straightforward, but, if MOB, MOE, or M/N gates are present, it underestimates the probability of failure of the top event (Section 3.3.1). The "cut sets analysis" algorithm is accurate and efficient, but it is limited by the global memory available to store all cut sets (Section 3.3.2); storing the cut sets on the hard drive is inefficient. If the cut sets analysis algorithm runs out of memory, the "Monte Carlo simulation" algorithm is the last resort. Monte Carlo simulation takes advantage of the full computational power of the GPGPU (Section 3.3.3).

3.2.1. Gate expansion

GPU architecture achieves its best performance when all threads in a warp execute the same code and each thread has its own inputs and outputs: the SIMD architecture. Good examples of efficient code are the sum of two vectors or the multiplication of two matrices. Bad examples are if-then-else statements and any conditional that is based on the input data and causes branching in the code execution. Classical fault tree solvers, where many if-then-else statements are used and different complex gate types are present with a variable number of inputs, are poor candidates; the BDD and ZBDD algorithms are composed of if-then-else statements. If each thread computes the probability of a single gate, and different threads in the same warp compute different gate types with different numbers of inputs, the workload is highly unbalanced. These cause, respectively, branching in the code and heterogeneous workloads. Gate expansion was created to transform the original fault tree into a larger one with a homogeneous workload among all gates.

Gate expansion is performed before the assembly of the topology matrix. The algorithm converts each gate within the fault tree into a combination of simple AND gates and OR gates with two inputs each. The overall number of gates and the size of the fault tree increase, but the amount of global memory is not a limitation nowadays; instead, the logic of the entire tree is simplified. Gate expansion also increases the time required to build the topology matrix. However, the builder is a fraction of the total computational effort involved, and the benefits to the solvers outweigh this preprocessing cost. The solver is the bottleneck in the computation: the building time is orders of magnitude smaller than the solving time, especially when Monte Carlo simulation is used.

Fig. 4 shows how an AND gate and an OR gate with multiple inputs are converted to multiple gates with only two inputs each.

The probability of the AND gate (Fig. 4a) is:

$P_{G1} = P_A \cdot P_B \cdot P_C$  (1)

The gate is converted to two AND gates and its probability is computed in two successive steps:

$P_{Gg1} = P_B \cdot P_C$  (2)

$P_{G1} = P_A \cdot P_{Gg1}$  (3)

The OR gate in Fig. 4b has 3 inputs and its probability is:

$P_{G1} = P_A + P_B + P_C - (P_A \cdot P_B) - (P_B \cdot P_C) - (P_A \cdot P_C) + (P_A \cdot P_B \cdot P_C)$  (4)

The general formula to solve an OR gate with n inputs is given by [22]:

$P_{G1} = \left(\sum \text{1st order terms}\right) - \left(\sum \text{2nd order terms}\right) + \left(\sum \text{3rd order terms}\right) - \cdots - \left(\sum \text{nth order terms}\right)$, with the last sign negative when n is even.  (5)

This formula is already long for a gate with 3 inputs, it becomes hard to handle for increasing numbers of inputs, and it is not efficient to solve in parallel. Instead, the gate expansion converts an OR gate with n inputs into n−1 OR gates with 2 inputs each. The probability is computed in n−1 successive steps:


Fig. 5. Expansion of other types of gate.


$P_{Gg1} = P_B + P_C - (P_B \cdot P_C)$  (6)

$P_{G1} = P_A + P_{Gg1} - (P_A \cdot P_{Gg1})$  (7)

This formulation is easy to implement on a parallel architecture with concurrent threads, because each thread has the same workload: it reads two inputs, performs a few mathematical operations, and returns one output. Threads in the same warp have only one if-then-else statement, AND gate/OR gate, and need only 2 passes to solve a level of the fault tree. On the first pass, all threads follow the AND gate branch: the AND gate threads solve their gate, and the OR gate threads are idle. On the second pass, the AND gate threads are idle, and the OR gate threads solve their gate. Fig. 5 shows how the other most common logic gates present in a fault tree are converted to the basic gates used in this study: the NAND gate (Fig. 5a), the NOR gate (Fig. 5b), the M/N gate (Fig. 5c), and the exclusive OR gate (Fig. 5d). It can be proven by simple logic operations that the probability computed for the original gate is the same.
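As a quick check for the three-input OR gate, substituting Eq. (6) into Eq. (7) recovers the inclusion-exclusion formula of Eq. (4):

$P_{G1} = P_A + (P_B + P_C - P_B P_C) - P_A (P_B + P_C - P_B P_C) = P_A + P_B + P_C - (P_A P_B) - (P_B P_C) - (P_A P_C) + (P_A P_B P_C)$

so the chained two-input expansion introduces no approximation.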

It must be mentioned that, when NOR and NAND gates are converted, the 1−P value is never computed unless the input is a basic event, because, if the input is a logic gate, the 1−P feature of its parent gate is passed down to its child inputs. This is a must, because, as seen in the next section, the probabilities of failure are stored in log scale to represent very small numbers without underflowing the container, either float or double.

3.2.2. Log scale probability

In reliability analysis, the probabilities of failure of the basic components are small numbers, and, in fault tree analysis, each gate multiplies these probabilities together to generate even smaller numbers. A computer represents a floating point number as either single precision (float) or double precision (double), but these are sometimes not enough to represent such small probabilities. To overcome this problem the probabilities are represented in log scale to enhance the accuracy [23]. If even the log scale is not enough, the number is approximated to zero. For single precision this limit is $2^{-126} \approx 1.17549435 \times 10^{-38}$.

Both a multiplication and a summation formula are necessary to solve the AND and OR gates. The multiplication of two probabilities P1 and P2 in log scale is given by the formula:

$\ln(P_{G1}) = \ln(P_{E1}) + \ln(P_{E2})$  (8)

Eq. (8) is straightforward and does not raise any issue in either linear or log scale. Instead, the derivation of the summation of two probabilities P1 and P2 in natural log scale is a more complicated issue. The derivation is written below:

$\ln(P_{G1}) = \ln(P_{E1} + P_{E2})$

$\ln(P_{G1}) = \ln(P_{E1} + P_{E1} \cdot P_{E2}/P_{E1})$

$\ln(P_{G1}) = \ln(P_{E1} + P_{E1} \cdot \exp(\ln(P_{E2}/P_{E1})))$

$\ln(P_{G1}) = \ln(P_{E1} \cdot (1 + \exp(\ln(P_{E2}) - \ln(P_{E1}))))$

$\ln(P_{G1}) = \ln(P_{E1}) + \ln(1 + \exp(\ln(P_{E2}) - \ln(P_{E1})))$

Therefore the summation of two probabilities P1 and P2 in natural log scale can be written as:

$\ln(P_{G1}) = \max(P_{E1}, P_{E2}) + \ln(1 + \exp[\min(P_{E1}, P_{E2}) - \max(P_{E1}, P_{E2})])$  (9)

where max and min are the logs of the largest and the smallest probability among the two.

Eq. (8) is used to compute the probability of an AND gate with only two inputs.

Eqs. (8) and (9) are used to compute the probability of an OR gate. Eq. (9) is expanded to Eqs. (10)–(12):

$\ln(P_{G1}) = \max(P_{temp1}, P_{temp2}) + \ln(1 - \exp[\min(P_{temp1}, P_{temp2}) - \max(P_{temp1}, P_{temp2})])$  (10)


$\ln(P_{temp1}) = \ln(P_{E1}) + \ln(P_{E2})$  (11)

$\ln(P_{temp2}) = \max(P_{E1}, P_{E2}) + \ln(1 + \exp[\min(P_{E1}, P_{E2}) - \max(P_{E1}, P_{E2})])$  (12)

It must be stated that the second term of Eqs. (10)–(12) can underflow. Because the equations use exp(P), they bring the representation of the probability P back from the log scale to the linear scale; if it is smaller than $1.17549435 \times 10^{-38}$, underflow happens. However, using the max and min functions ensures that the computed probability is at least as large as the larger of the two probabilities. Double precision increases this range considerably, and, nowadays, GPU computation in double precision is available.

When working with a NAND or NOR gate, the value 1−P1 is computed using Eq. (13):

$\ln(1 - P_1) = \ln(1 - \exp[P_1])$  (13)

It must be pointed out that Eq. (13) is used only on the probabilities of the basic events, because every NAND and NOR gate in the tree has its 1−P feature cascaded down to the bottom of the topology matrix, where only basic events are present. This equation can underflow only if the initial probability of failure of the basic events is already given in log scale and it underflows when exp(P) is executed. Such a case never presents itself, because the initial probability of failure of the basic events is given in the linear scale.

Below is shown the source code used to solve the gate logic during the computation of the probabilities of the fault tree for the AND and OR gates.
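A minimal sketch of these gate functions is given below, assuming the names log_and and log_or and ln-probabilities held in register variables (the _reg suffix follows the identifier convention of Section 3.1).

// Sketch only: AND and OR gate evaluation on ln-probabilities (names assumed).
__device__ float log_and(float p1_reg, float p2_reg)
{
    return p1_reg + p2_reg;                                            // Eq. (8): ln(P1*P2)
}

__device__ float log_or(float p1_reg, float p2_reg)
{
    float pmax_reg   = fmaxf(p1_reg, p2_reg);
    float pmin_reg   = fminf(p1_reg, p2_reg);
    float ptemp1_reg = p1_reg + p2_reg;                                // Eq. (11): ln(P1*P2)
    float ptemp2_reg = pmax_reg + log1pf(expf(pmin_reg - pmax_reg));   // Eq. (12): ln(P1+P2)
    float tmax_reg   = fmaxf(ptemp1_reg, ptemp2_reg);
    float tmin_reg   = fminf(ptemp1_reg, ptemp2_reg);
    return tmax_reg + logf(1.0f - expf(tmin_reg - tmax_reg));          // Eq. (10): ln(P1+P2-P1*P2)
}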

The computation of the probability of an OR gate is more expensive than that of an AND gate in terms of the number of operations and the number of temporary variables that must be stored, but all variables are stored in the register memory, which is the fastest on-chip memory available. Furthermore, because gate expansion is used, these lines of code are enough to compute the probability of the entire fault tree without ever using more complicated equations, such as Eq. (1) or Eq. (5).

Fig. 6. Blocks of threads in the bottom-up algorithm.

3.3. Fault tree solver

3.3.1. Bottom-up solver

The bottom-up algorithm is the simplest algorithm that computes the probability of failure of the top event. It starts at the bottom of the tree and computes the probability of failure of each gate based on its logic and the probabilities of its inputs. It moves upward and solves the tree at each level until the top event is reached. The limitation of this method is that it does not take into account MOB and MOE. Furthermore, the same problem arises when M/N gates are expanded: if a basic event appears multiple times in different places in the fault tree, the estimated probability of failure is incorrect.

The parallel bottom-up algorithm follows the topology matrix and creates the probability matrix P[m,n] (float) to store the probabilities of failure of every element in the topology matrix: basic events, logic gates, and the top event. The algorithm starts at i = m, which is the last row of the topology matrix and contains only basic events; there it places the probabilities of the basic events in the last row of the probability matrix P. Then, for m−1 > i ≥ 0, the algorithm reads row i of the topology matrix and computes the probability of each gate based on its inputs in row i + 1; the result is stored in the probability matrix P.

On a GPU this algorithm is efficient. The size of a block of threads and the number of blocks are a function of the properties of the GPU used. A for loop is used to run the kernel row by row from bottom to top, and each kernel is launched with enough threads and blocks to solve the entire row in one pass. Fig. 6 shows how the threads are divided for a large fault tree; in this example i goes from 127 to 0, the size of a block of threads is 256, and the number of blocks is 48.
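A minimal sketch of this row-by-row solver is shown below, reusing the log_and and log_or helpers sketched in Section 3.2.2; the kernel name SolveRow is an assumption, the gate encoding follows Section 3.2, and the probabilities of the basic events are assumed to have already been written into P wherever they occur.

// Sketch only: one thread per column; the second input is read from the column stored in T[].z.
__global__ void SolveRow(const int3 *topology_dev, float *prob_dev, int row, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    int3 gate = topology_dev[row * n + j];
    if (gate.y != 1 && gate.y != 2) return;                  // basic events and padding: nothing to do
    float in1_reg = prob_dev[(row + 1) * n + j];             // first input: same column, row below
    float in2_reg = prob_dev[(row + 1) * n + gate.z];        // second input: column stored in T[].z
    prob_dev[row * n + j] = (gate.y == 1) ? log_and(in1_reg, in2_reg)   // AND gate
                                          : log_or(in1_reg, in2_reg);   // OR gate
}

// Host side: solve the tree one row at a time, from the row above the basic events up to the top event.
for (int i = m - 2; i >= 0; --i)
    SolveRow<<<gridsize, blocksize>>>(topology_dev, prob_dev, i, n);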

3.3.2. Cut-sets solver

When the fault tree includes MOE and/or MOB, the cut-sets solver must be used. The most common method to obtain the cut sets is the MOCUS algorithm (method of obtaining cut sets) [24]. MOCUS uses a top-down approach: it starts with a single cut set that includes only the top event and moves downward through the tree one level at a time, replacing each gate in the cut set with its inputs. When it replaces an AND gate, the size of the cut set increases; instead, when it replaces an OR gate, the number of cut sets increases. When the cut sets contain only basic events, the algorithm terminates. All cut sets are then compared, and the non-minimal cut sets are removed (Fig. 7).

The MOCUS algorithm is simple, but serial. If it is ported to a GPU without any modification, it is not possible to predetermine the number of blocks and the size of each block of threads. Instead, this study presents a bottom-up approach that accommodates the properties and the limitations of GPU computing.

Fig. 7. MOCUS algorithm examples.

Fig. 8. Cut-sets algorithm.

When a bottom-up approach is used, the initial number of threads is equal to the number of basic events in the last row m of the topology matrix, and the number of initial cut sets is also equal to the number of basic events in row m. The bottom-up approach is similar to a reduction, which can be optimized on a parallel architecture. Fig. 8 shows how the same examples of Fig. 7 are solved by the new, proposed parallel algorithm. The topology matrix T is read bottom-up, and a vector v[s] is used to track the gate that currently "owns" each cut set. C[s,r] is the matrix where the basic events that are part of each cut set are stored; r is equal to the number of basic events, s is equal to the number of cut sets.

The rules of this new, parallel algorithm are different from the rules of the serial MOCUS algorithm. All logic gates in the topology matrix have only 2 inputs. An OR gate does not increase the number of cut sets: for example, if an OR gate owns 3 cut sets from the first input and 3 cut sets from the second input, its output is the sum of the inputs, i.e. 6 cut sets. Instead, an AND gate increases the number of cut sets: the number of output cut sets of an AND gate is equal to the product of the number of cut sets from the first input and the number of cut sets from the second input. For example, if an AND gate "owns" 3 cut sets from the first input and 3 cut sets from the second input, its output is 9 cut sets, which are all the possible combinations between the two inputs. The algorithm iterates through the four steps m times, where m is the number of rows in the topology matrix. 3 matrices and 2 vectors are used:

– The topology matrix T[m,n] (int3).
– The cut sets matrix C[si,r]i (bool) that contains the si cut sets at iteration i.
– The cut sets matrix C[si+1,r]i+1 (bool) that contains the si+1 cut sets at iteration i + 1.
– The vector v[si] (int3) that contains the gate solved at iteration i, which is the "owner" of the cut set at i.
– The vector v[si+1] (int3) that contains the gate solved at iteration i + 1, which is the "owner" of the cut set at i + 1.

The steps to compute the cut sets are listed below (a serial sketch of the combination rule is given after the list):

– step 1: substitute the inputs in v[si+1] at row i + 1 of the topology matrix with their gates at row i, and identify whether the output gate is an AND gate or an OR gate.
– step 2a: if the output gate is an OR gate, no other action is required. Copy the cut set ID from v[si+1] to v[si].
– step 2b: if the output gate is an AND gate, all possible combinations between the cut sets of the first input and of the second input must be computed and stored in v[si]. si will be greater than si+1 because of the combinatorial property of the AND gate.


– step 3: read the entire v[si]. If the element is an OR gate, copy the corresponding cut set row from C[si+1,r]i+1 to C[si,r]i. If the element is an AND gate, the two input cut sets of the combination, stored in rows of C[si+1,r]i+1, are superimposed into the corresponding row of C[si,r]i. Because the cut set matrices are bool, the superposition effect can be used: if a basic event is already part of a cut set, its index is already set to 1, and setting the index to 1 again does not change the output. This also takes into account MOB and MOE.

– step 4: this step is performed only when necessary. If the number of cut sets grows too large and the algorithm runs out of global memory, the probability of failure of each cut set, i.e. each row of C[si,r]i, is computed, and, if it is less than the machine precision, the cut set is eliminated. A procedure called "inclusion exclusion approximation" is employed in the original MOCUS algorithm to handle the same memory problem. The further development of these cut sets would only increase the number of basic events in the cut sets and decrease their probabilities of failure. Given the amount of memory available to a modern GPU and the boolean nature of the cut set matrix C, this is required only for the largest fault trees; in that case Monte Carlo simulation is the only way to compute an accurate probability of failure of the top event taking into account MOB, MOE, and M/N gate expansion.
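A serial reference sketch of the combination rule of steps 2a, 2b, and 3 is given below (host-side C++, with the assumed helper name CombineCutSets); the GPU kernels apply the same rule in parallel, one combination per thread.

// Sketch only: an OR gate concatenates the cut set lists of its two inputs,
// an AND gate forms all combinations and superimposes the bool rows (handles MOE/MOB).
#include <vector>
using CutSet     = std::vector<bool>;     // one entry per basic event
using CutSetList = std::vector<CutSet>;

CutSetList CombineCutSets(const CutSetList &in1, const CutSetList &in2, bool isAndGate)
{
    CutSetList out;
    if (!isAndGate) {                     // OR gate: the output is the sum of the inputs
        out = in1;
        out.insert(out.end(), in2.begin(), in2.end());
    } else {                              // AND gate: all combinations of the two inputs
        for (const CutSet &a : in1) {
            for (const CutSet &b : in2) {
                CutSet c(a);              // superimpose: an index already set to 1 stays 1
                for (size_t k = 0; k < c.size(); ++k) c[k] = c[k] || b[k];
                out.push_back(c);
            }
        }
    }
    return out;
}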

When i = 0, the top event has been reached, the vector v[s0] contains only the top event, and C[s0,r]0 contains all the cut sets of the fault tree, minimal and non-minimal. All rows of C[s0,r]0 are compared with each other and the non-minimal cut sets are removed. This process is carried out in parallel and it is efficient due to the simplicity of the matrix C[s0,r]0. At most $s_0^2 - s_0$ comparisons between cut sets are performed, if all cut sets are already minimal.

The probability of the top event is computed by performing an AND gate between all the basic events within each cut set, and by performing an OR gate between the probabilities of all minimal cut sets (Fig. 9). This final probability is also subject to underflow: every cut set with probability less than $10^{-38}$ gives zero contribution to the final probability of occurrence of the top event. However, it is still possible to see the basic events that cause the failure of that specific cut set.

Fig. 9. Cut-sets probability computation.

The probability of occurrence of the top event is computed with four parallel reductions:

– step 1: given the matrix C[smin,r]min, for each cut set 1 + r/bx blocks of threads of size bx are used to perform the AND gate reduction over the basic events within each block of threads.
– step 2: a single block of threads of size 1 + r/bx is used to perform the AND gate reduction between blocks, which computes the probability of occurrence of the cut set.
– step 3: 1 + smin/bx blocks of threads of size bx are used to perform the OR gate reduction within a block of cut sets.
– step 4: a single block of threads of size 1 + smin/bx is used to perform the OR gate reduction between blocks, which computes the probability of failure of the top event.

The probability computed is the sum of the 1st and 2nd order terms, which is the best achievable result without comparing each single basic event within each different minimal cut set. The sum of the 1st and 2nd order terms is a good approximation of the exact solution. The pseudo code of the algorithm used is shown below.
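A minimal sketch of step 1 only (the within-block AND reduction) is given below; the kernel name and launch layout are assumptions. Because the AND gate in log scale is a sum (Eq. (8)), the reduction is a shared-memory sum over the ln-probabilities of the basic events that belong to the cut set. Steps 2 to 4 are analogous reductions, with log_or replacing the addition in the OR gate steps.

// Sketch only: launched with blockDim.x = 256 (bx), gridDim.x = 1 + r/bx blocks per cut set,
// and gridDim.y = number of cut sets. The partial results are consumed by the step-2 reduction.
__global__ void CutSetAndReduce(const bool *cutset_dev, const float *lnp_basic_dev,
                                float *partial_dev, int r)
{
    __shared__ float partial_sha[256];
    int s = blockIdx.y;                                   // cut set index
    int k = blockIdx.x * blockDim.x + threadIdx.x;        // basic event index
    float v = 0.0f;                                       // ln(1): neutral element of the reduction
    if (k < r && cutset_dev[s * r + k]) v = lnp_basic_dev[k];
    partial_sha[threadIdx.x] = v;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial_sha[threadIdx.x] += partial_sha[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial_dev[s * gridDim.x + blockIdx.x] = partial_sha[0];
}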



3.3.3. Monte Carlo method

Monte Carlo simulation is the third parallel solver that computes the probability of the top event. Monte Carlo simulation is the only algorithm that computes the probability of the top event of a fault tree regardless of its complexity. It works even if the fault tree includes MOB, MOE, and expanded M/N gates, which are an issue for the bottom-up algorithm, and it works even if the number of cut sets that must be computed exceeds the available global memory. In this second case the partial cut sets could be stored on the hard disk, but this slows down the calculation considerably. Fig. 10 shows the Monte Carlo solver steps and the matrices involved. The number of Monte Carlo simulations is based on the accuracy that is required on the probability of the top event; it is not a function of the complexity of the fault tree or of the presence of M/N gates, MOB, and MOE.

Fig. 10. Monte Carlo simulation.

The kth Monte Carlo simulation goes through five steps:

– Step 1: it generates a vector of r random numbers between 0 and 1, where r is the number of basic events.
– Step 2: it generates the state vector by comparing the vector of the probabilities of failure of the basic events with the vector of random numbers.
– Step 3: the values of the state vector are inserted at the bottom of a bool matrix with the same size as the topology matrix. Only row i and row i + 1 must be tracked and are allocated in the global memory. The matrix can be reduced to a single vector if one is not interested in computing the probability of failure of a specific logic gate.
– Step 4: a bottom-up approach is used to solve the fault tree. An AND gate returns 1 if both inputs are 1; an OR gate returns 1 if either of its two inputs is 1. This step is similar to the bottom-up algorithm, but the operations are on bool values rather than on floating point numbers in log scale.


– Step 5: the outcome of the top event of each simulation is used to update the overall probability of failure of the top event.

The simulations are independent from each other because the inputs of a Monte Carlo simulation are the vector of the probabilities of failure of the basic events and the topology matrix T. Therefore, it is possible to run the Monte Carlo simulations on multi-GPU systems. If x Monte Carlo simulations take t time to finish on a single GPU, on y GPUs the time is reduced to t/y + o, where o is the cost overhead of coordinating multiple GPUs. The overhead time includes only the cost of copying the topology matrix T and the vector of the probabilities of the basic events to every system and the cost of aggregating the final outcomes in a single system. Because there is no exchange of information, i.e. data passing, between GPUs during the simulations, the overhead is negligible. The pseudo code of the algorithm used is shown below.
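A minimal sketch of one trial is given below under simplifying assumptions: one trial per block, n ≤ 256 columns so a whole row fits in one block, all basic events sitting in the last row of the topology matrix, and cuRAND for the random numbers; the kernel name and the atomic counter are also assumptions. The host launches one block per trial and divides fail_count_dev by the number of trials to estimate the probability of the top event.

// Sketch only: launched with gridDim.x = number of trials, blockDim.x = 256.
#include <curand_kernel.h>

__global__ void MonteCarloSolve(const int3 *topology_dev, const float *pf_basic_dev,
                                int m, int n, unsigned long long seed,
                                unsigned int *fail_count_dev)
{
    __shared__ unsigned char state_sha[2][256];            // only rows i and i+1 are kept (step 3)
    int j   = threadIdx.x;                                 // column owned by this thread
    int sim = blockIdx.x;                                  // one Monte Carlo trial per block

    // Steps 1-2: draw a random number per basic event and compare it with its probability of failure
    if (j < n) {
        curandState rng;
        curand_init(seed, (unsigned long long)sim * n + j, 0, &rng);
        state_sha[(m - 1) & 1][j] = (curand_uniform(&rng) < pf_basic_dev[j]) ? 1 : 0;
    }
    __syncthreads();

    // Steps 3-4: solve the tree bottom-up on boolean states
    for (int i = m - 2; i >= 0; --i) {
        unsigned char out = 0;
        if (j < n) {
            int3 gate = topology_dev[i * n + j];
            unsigned char a = state_sha[(i + 1) & 1][j];
            unsigned char b = state_sha[(i + 1) & 1][gate.z];
            if (gate.y == 1)      out = a & b;             // AND gate
            else if (gate.y == 2) out = a | b;             // OR gate
        }
        __syncthreads();
        if (j < n) state_sha[i & 1][j] = out;
        __syncthreads();
    }

    // Step 5: accumulate the outcome of the top event at T[0,0]
    if (j == 0 && state_sha[0][0]) atomicAdd(fail_count_dev, 1u);
}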

4. Results

Different simulation examples were used to validate the entire method and to check the accuracy of its results and its performance. The validation was carried out using fault tree examples from different sources in the literature.

4.1. NUREG example

This example is taken from the SAPHIRE technical reference manual, Appendix A [4].

The probabilities of failure of the basic events are:

$P_{f,1} = 0.01$, $P_{f,2} = 0.02$, $P_{f,3} = 0.03$, $P_{f,4} = 0.04$, $P_{f,5} = 0.05$

Table 3
Non-minimal (a) and minimal (b) cut set matrix.

(a)
Cut set n.  BE1  BE2  BE3  BE4  BE5
1           1    1    0    0    0
2           1    1    1    0    0
3           1    1    0    1    0
4           1    0    0    1    0
5           1    0    1    1    0
6           1    0    0    1    0
7           1    0    1    0    1
8           1    0    1    0    1
9           1    0    1    1    1
10          1    1    1    0    1
11          0    1    1    0    1
12          0    1    1    1    1
13          1    0    1    1    1
14          0    0    1    1    1
15          0    0    1    1    1

(b)
Cut set n.  BE1  BE2  BE3  BE4  BE5
1           1    1    0    0    0
2           0    0    0    0    0
3           0    0    0    0    0
4           1    0    0    1    0
5           0    0    0    0    0
6           0    0    0    0    0
7           1    0    1    0    1
8           0    0    0    0    0
9           0    0    0    0    0
10          0    0    0    0    0
11          0    1    1    0    1
12          0    0    0    0    0
13          0    0    0    0    0
14          0    0    1    1    1
15          0    0    0    0    0

In this example the M/N gate and the OR gate with 3 inputs are expanded and the number of multiple occurring events increases. Table 1 shows the results of Monte Carlo simulation over different numbers of trials; the time is in ms on a single GK107 GPU, with which the GT650M graphic card is equipped. Table 2 shows the summary of the results of the three parallel algorithms and their comparison with the original NUREG results. Table 3a shows the non-minimal cut set matrix and Table 3b shows the minimal cut set matrix.

The results are equivalent to those of the SAPHIRE code. The computation of the minimal cut sets is the most time efficient method and gives the best results for every fault tree for which all possible cut sets can be computed. If the expanded fault tree exceeds the available global memory, i.e. the GPU RAM, Monte Carlo simulation is the alternative method that takes full advantage of parallel reduction to achieve reasonable results, within 1% of the exact answer. The bottom-to-top algorithm never computes good results if even a single MOE, MOB, or expanded M/N gate is present, even for a small fault tree.

The times shown in Table 1 were recorded using the clock function available in C/C++. However, while the NUREG fault tree is small, with only 5 basic events and 5 input gates, the topology matrix is padded to the smallest multiple of the block size used by the GPU architecture that contains the fault tree; 256 was the block size for both the GK107 GPU and the GK104 GPU.

Table 1
Different trial runs of Monte Carlo simulations. For each number of Monte Carlo simulations per trial, the columns report Pf (log) and t (ms).

            1000                  10,000                100,000                1,000,000
Trial       Pf (log)      t (ms)  Pf (log)      t (ms)  Pf (log)      t (ms)   Pf (log)      t (ms)
1           -6.214608192  442     -6.812445164  4324    -7.222465992  39,828   -7.271598816  480,420
2           -1.#INF       410     -8.111727715  3950    -7.222465992  39,689   -7.329349995  454,196
3           -1.#INF       540     -7.824046135  3920    -7.278819084  40,631   -7.290481091  409,811
4           -6.214608192  520     -7.824046135  3940    -7.338538170  49,050   -7.289015770  410,001
5           -6.907755375  400     -8.111727715  4140    -7.323270798  48,305   -7.264430046  412,410
6           -6.214608192  400     -6.812445164  3920    -7.354042530  39,582   -7.281721592  411,821
7           -6.907755375  385     -7.013115883  3880    -7.236259460  39,640   -7.253066540  411,720
8           -6.214608192  395     -6.502290249  4175    -7.338538170  40,868   -7.268724918  649,455
9           -6.907755375  395     -7.130898952  3925    -7.070274353  42,633   -7.347811699  826,368
10          -1.#INF       380     -7.600902557  3880    -7.369790554  45,035   -7.255895138  1,080,661
μ (mean)    -6.511671270  427     -7.374364567  4005    -7.275446510  42,526   -7.285209561  554,686
σ (std)      0.343018897  54       0.55901269   144      0.086361389  3477      0.029501087  218,648

Table 2
Monte Carlo simulation – accuracy comparison.

Algorithm                                       Pf (log)       Pf (linear)      Difference   Difference (%)
cuFTA
  Bottom-up algorithm                           -9.820150375   5.434541139E-05  6.3880E-04   92.16%
  Cut sets algorithm                            -7.257312775   7.049999549E-04  1.1852E-05   1.71%
  Monte Carlo (1000 sims)                       -6.214608192   1.999999813E-03  1.3069E-03   188.54%
  Monte Carlo (10,000 sims)                     -6.812445164   1.099999929E-03  4.0685E-04   58.70%
  Monte Carlo (20,000 sims)                     -7.264430046   7.000001238E-04  6.8521E-06   0.99%
  Monte Carlo (50,000 sims)                     -7.600902557   4.999999513E-04  1.9315E-04   27.87%
  Monte Carlo (100,000 sims)                    -7.222465992   7.300000232E-04  3.6852E-05   5.32%
  Monte Carlo (1,000,000 sims)                  -7.271598816   6.949999280E-04  1.8519E-06   0.27%
  Monte Carlo (10,000,000 sims)                 -7.284778595   6.859000813E-04  7.2479E-06   1.05%
SAPHIRE (Tab. A-5)
  Min Cut Upper Bound                                          7.04854E-04      1.1706E-05   1.69%
  Rare Event Approximation                                     7.05000E-04      1.1852E-05   1.71%
  Sum of 1st and 2nd order terms                               6.93076E-04      7.2000E-08   0.01%
  Sum of 1st, 2nd, and 3rd order terms                         6.93196E-04      4.8000E-08   0.01%
  Sum of 1st, 2nd, 3rd, and 4th order terms                    6.93136E-04      1.2000E-08   0.00%
  Sum of all terms (exact answer)                              6.93148E-04      0.0000E+00   0.00%

Page 11: Parallel fault tree analysis for accurate reliability of ...static.tongtianta.site/paper_pdf/b2213e58-5497-11e9-a1a3-00163e08bb86.pdf42 F. Sihombing, M. Torbol/Structural Safety 72

Table 4
Performance of the 255 gates/256 basic events fault tree. Time (ms).

Trial ID   GK107    GK104   GK104 ×2  GK104 ×4  GK104 ×8  GK104 ×16
1          489,125  78,500  41,656    19,352    9864      4816
2          504,562  80,978  40,171    19,860    10,396    5238
3          504,061  80,897  41,666    20,753    10,263    4832
4          499,288  80,131  41,027    20,917    9856      4804
5          478,829  76,848  38,553    21,110    9900      5283
6          518,098  83,150  42,284    19,230    10,116    5235
7          492,341  79,016  41,634    20,293    10,541    5138
8          506,834  81,342  39,626    20,918    10,351    5203
9          496,216  79,638  39,172    20,824    10,155    5175
10         488,750  78,440  39,405    20,861    10,427    5254
μ (mean)   497,810  79,894  40,519    20,412    10,187    5098


Therefore, fault trees up to 20 times larger than the NUREG example fit in a topology matrix of the same size, and the solving times are similar, as shown in Table 4.

4.2. Large example

While the NUREG example is good to validate the developed method and to compare the results, in this second part a series of randomly generated fault trees of different sizes is used to show the performance on large scale applications. All fault trees were tested on a GT650M card, which mounts a GK107 GPU, and on 3 workstations with 4× Tesla K10 cards each, which carry 2× GK104 GPUs each. The GK107 (Fig. 11a) and GK104 (Fig. 11b) are the same GPU with the same architecture. The main differences are in the number of streaming multiprocessors (SMX), the GK107 has 2 while the GK104 has 8, in the available GPU global memory, 1 GB vs 4 GB, and in the working frequency, 835 MHz vs 735 MHz. The GT650M is mounted on a laptop and it was used during the development, while the GK104 was used for testing.

Fig. 11. GK107 (a) and GK104 (b) architecture.

From Table 4, because the 255 random gates fault tree fits in the same topology matrix size as the NUREG example, due to the minimum padding size of 256, the solution times are the same for both fault trees. From Table 5, the 511 random gates fault tree requires an n = 512 topology matrix and the solution time doubles, because twice the blocks of threads are required to handle the matrix. From both tables it is possible to see that during Monte Carlo simulation the overhead time when using multiple GPUs is negligible, almost 0.

It must be stated that each GPU executes one MC simulation at a time. However, a GK104 requires 16,384 active threads to reach full capacity, which translates to the possibility of running 64 Monte Carlo simulations in parallel for fault trees with 256 basic events or fewer, 32 MC simulations in parallel for fault trees between 256 and 512 basic events, and down to 1 simulation for fault trees with more than 16,384 basic events. Tables 4 and 5 effectively use only 6% and 12% of the GK107 computational power, and 1.5% and 3% of the GK104 computational power.

Table 5
Performance of the 511 gates/256 basic events fault tree. Time (ms).

Trial ID   GK107    GK104    GK104 ×2  GK104 ×4  GK104 ×8  GK104 ×16
1          828,800  135,991  70,076    32,905    16,515    8641
2          821,200  132,375  68,127    34,036    16,651    8480
3          806,600  133,667  66,148    35,009    17,245    8686
4          807,200  130,010  63,786    34,544    17,172    8285
5          815,200  132,954  66,568    33,867    16,337    8381
6          801,400  128,061  69,185    34,992    16,526    8151
7          810,600  132,763  68,891    32,907    16,398    8041
8          805,000  135,464  66,075    33,738    17,044    8663
9          809,200  129,127  66,812    32,610    16,835    8700
10         839,820  134,259  67,310    35,001    16,658    8656
μ (mean)   814,502  128,105  67,033    33,906    16,981    8289

5. Conclusions

This study presented three new algorithms to solve large fault trees of increasing degrees of complexity using a parallel GPU. The algorithms compute the probability of failure of the top event given the probabilities of failure of the basic events. Each algorithm has its own application, based on the specific properties of the problem. The preprocessing algorithm that transforms the entire tree into an equivalent tree made only of AND and OR gates with two inputs each sacrifices global memory space, but it is the greatest contributor to the efficiency of all the parallel algorithms presented. This is a departure from current implementations, which condense the fault tree as much as possible to decrease its size, saving memory at the expense of the computational complexity of each gate.

Like any other algorithm, this study is limited by the available global memory, i.e. the GPU RAM, the available bandwidth of the PCI bus, and the frequency and number of cores of the GPU. However, the amount of memory and the computational power are nowadays enough to handle any fault tree of practical size. A Tesla K10 or similar card has 4 GB of global memory per GPU and 4.5 Tflops of computational power when single precision is used. For example, a fault tree with n = 12,288 basic events, including MOE, and m = 128 levels can be solved; this tree is large enough to accommodate the most complex systems and it is far from the memory capability. 256 threads per block and 48 blocks is a small number for the available computational power; in digital image processing the number of threads of each kernel launch reaches into the millions.

Given 4 GB of GPU RAM, and assuming a complex and very large fault tree with MOE and MOB, Monte Carlo simulation is the de facto and only available algorithm. Monte Carlo requires only two vectors to store the states of the current row and of the previous row; every other array is negligible in size compared to the topology matrix. 4 GB of GPU memory can store 10^9 integer numbers (4 bytes each). The topology matrix is int3 and approximately lower diagonal; therefore, we can store 0.33 (int3) × 0.5 (lower diagonal) × 10^9 gates and basic events. The current upper limit on a GK104 with 4 GB of RAM is approximately 200,000,000 gates and basic events for a single expanded topology matrix.
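As a rough check of this estimate, using only the numbers quoted above:

$\dfrac{4 \times 10^{9}\ \text{bytes}}{12\ \text{bytes per int3 entry}} \times 0.5 \approx 1.7 \times 10^{8}\ \text{entries},$

which is of the same order as the 200,000,000 gates and basic events quoted above.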

Single precision floating point accuracy (7 significant decimal digits) is the current limitation of our GK104. However, if the log scale representation is used, floats are good enough. Double precision (16 significant decimal digits) would be a better choice; however, double precision computational power is still limited in GPU computing.

Because of its solid and simple logic rules, fault tree analysis is one of the best methods that can be implemented on a parallel architecture. The first advantage is the gate expansion from the top to the bottom, which allows the use of common reduction techniques to build and solve the fault tree block diagram, the cut sets, and the probability of the top event. The second advantage is the amount of memory available, which allows the expansion of complicated gates with multiple inputs into a combination of simpler gates. The simpler and more straightforward the operations involved are, the more uniform the load balance between threads in the same warp is, and the more efficient the computation is. We think that this study presents a good approach to solve fault trees as a reduction problem, which single instruction multiple data (SIMD) processors such as GPUs are good at.

The current algorithms are built for CUDA compute capability 3.0 to run on GK104 GPUs. With newer versions and newer architectures, which are evolving constantly, better performance is expected. Furthermore, this study does not include dynamic fault tree analysis, but the topology matrix approach and Monte Carlo simulation can be used to solve such problems. Last, the topology matrix is a lower diagonal matrix that can be stored in compact form. All of these are the focus of our future studies.

Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant number 2.160173.01).

References

[1] Watson HA. Launch Control Safety Study; 1961.
[2] Mearns AB. Fault tree analysis: the study of unlikely events in complex systems. In: Boeing/UW system safety symposium; 1965.
[3] Vesely B, Stamatelatos M, Caraballo J. The Aerospace Fault Tree Handbook being developed for NASA: overview and viewpoints on fault tree issues. In: Probabilistic Safety Assessment and Management, Vol. I and II, Proceedings; 2002. p. 885–890.
[4] USNRC. Systems Analysis Programs for Hands-on Integrated Reliability Evaluations (SAPHIRE) Vol. 2, Technical Reference, NUREG/CR-6952; 2008.
[5] Kumamoto H. Satisfying safety goals by probabilistic risk assessment. London: Springer; 2007.
[6] Rasmussen NC, et al. WASH-1400-MR; NUREG-75/014-MR, TRN: 77-002146. Nuclear Regulatory Commission; 1975.
[7] Ejlali A, Ghassem Miremadi S. FPGA-based Monte Carlo simulation for fault tree analysis. Microelectron Reliab 2004;44.
[8] Bolourchi A, Masri SF, Aldraihem OJ. Studies into computational intelligence and evolutionary approaches for model-free identification of hysteretic systems. Comput Aided Civil Infrastruct Eng 2015;30:330–46.
[9] Oh T, Popovics JS. Practical visualization of local vibration data collected over large concrete elements. Comput Aided Civil Infrastruct Eng 2015;30:68–81.
[10] Adeli H. High-performance computing for large-scale analysis, optimization, and control. J Aerospace Eng 2000;13:1–10.
[11] Hung SL, Adeli H. Parallel backpropagation learning algorithms on Cray Y-MP 8/864 supercomputer. Neurocomputing 1993;5:287–302.
[12] Saleh A, Adeli H. Parallel algorithms for integrated structural control optimization. J Aerospace Eng 1994;7:297–314.
[13] Saleh A, Adeli H. Parallel eigenvalue algorithms for large-scale control-optimization problems. J Aerospace Eng 1996;9:70–9.
[14] Lopes N, Ribeiro N. An evaluation of multiple feed-forward networks on GPUs. Int J Neural Syst 2011;21:31–47.
[15] Velez G, Matey L, Amundarain A, Suescun A, Andres Marin J, de Dios C. Modeling of shotcrete application for use in a real-time training simulator. Comput Aided Civil Infrastruct Eng 2013;28:465–80.
[16] Torbol M. Real-time frequency-domain decomposition for structural health monitoring using general-purpose graphic processing unit. Comput Aided Civil Infrastruct Eng 2014;29.
[17] Calado Lopes A, Sales Dias JM. Integration of geo-referenced data for visual simulation in location-based mobile computing. Comput Aided Civil Infrastruct Eng 2006;21:514–29.
[18] Tsai Y, Huang Y. A generalized framework for parallelizing traffic sign inventory of video log images using multicore processors. Comput Aided Civil Infrastruct Eng 2012;27:476–93.
[19] Carozza L, Tingdahl D, Bosche F, van Gool L. Markerless vision-based augmented reality for urban planning. Comput Aided Civil Infrastruct Eng 2014;29:2–17.
[20] Sihombing F, Torbol M. Analytical fragility curves of a structure subject to tsunami waves using smooth particle hydrodynamics. Smart Struct Syst 2016;18:1145–67.
[21] Aghassi H, Aghassi F. Fault tree analysis speed-up with GPU parallel computing. Int J Comput Inform Syst Ind Manage Appl 2013;5:106–14.
[22] Ericson CA. Hazard analysis techniques for system safety. Hoboken, NJ: Wiley-Interscience; 2005.
[23] Ang AHS, Tang WH. Probability concepts in engineering. 2nd ed. Wiley; 2007.
[24] Fussell JB, Vesely WE, Clement JD. Elements of fault tree construction – new approach. Trans Am Nucl Soc 1972;15:794.