2002 Annual RELIABILITY and MAINTAINABILITY Symposium

Fault Tree Analysis in Product Reliability Improvement

Milena Krasich, P.E.

Milena Krasich, PE; Bose Corporation; MS 450; The Mountain, Framingham MA 01701-7330 USA

e-mail: [email protected]




SUMMARY & PURPOSE

This tutorial introduces the use of the well-known technique of Fault Tree Analysis as a tool for reliability modeling and analysis of an electronic or mechanical design (including software), identification of potential failure modes that are high contributors to unreliability, and tradeoffs and mitigation of those failure modes. Applied early in the product design phase, this activity allows for relatively inexpensive and easy design and manufacturing process improvements and, in that manner, achieves considerable improvement of product reliability before the design is completed or the product is manufactured. A real example of this analysis as applied to audio products is discussed, along with the achieved reliability improvement.

Milena Krasich

Milena Krasich is the Senior Technical Lead of Reliability Engineering in Design Assurance Engineering of Bose Corporation. Before joining Bose, she was a Member of Technical Staff in the Reliability Engineering Group of General Dynamics Advanced Technology Systems, formerly Lucent Technologies, and prior to that, she worked for the Jet Propulsion Laboratory in Pasadena, California. While in California, she was a part-time professor at the California State University Dominguez Hills, where she taught graduate courses in System Reliability, Advanced Reliability and Maintainability, and Statistical Process Control. At that time, she was also a part-time professor at the California State Polytechnic University, Pomona, teaching undergraduate courses in Engineering Statistics, Reliability, Environmental Testing, Production Systems Design, Measurements, and Materials Procurement. She holds a BS and MS in Electrical Engineering from the University of Belgrade, Yugoslavia, and is a California registered professional electrical engineer. She is also a member of the IEEE and ASQC Reliability Society, a Fellow and the past president of the Institute of Environmental Sciences and Technology, and a member of the College of Fellows of the Institute for Advancement of Engineering. Currently, she is a US Delegate to the International Electrotechnical Commission (IEC), working on dependability/reliability standards, and is a project leader for the revision of international standards for reliability growth.

Table of Contents

1. INTRODUCTION
1.1 Notation and Acronyms
2. Reliability Improvement
2.1 Reliability Definitions Related to This Tutorial
3. Fault Tree Analysis and Its Use
3.1 Fault Tree – Introduction
3.2 System Analysis Methodology
3.3 Building of a Fault Tree
3.4 Contribution of Manufacturing Defects
3.5 Origin of Values for the Basic Events
4. Failure Mode Detection and Mitigation
5. Summary and Conclusions
6. References and Bibliography
7. Attachment - Tutorial Visuals


1. INTRODUCTION

Multiple methods have been used to estimate product reliability during the many decades in which reliability has been applied as a science. Many reasons, such as product criticality (medical devices, defense systems, transportation) or the need for competitiveness in consumer industry, dictate the need for products with remarkably high reliability.

Design alone, regardless of its features and technology, does not guarantee product reliability. A design team, conscious of good and reliable design methods such as proper component derating and ESD and EMI protection, may not be completely aware of all aspects of reliability modeling and potential reliability shortfalls. This is especially the case when a product must be designed to operate in multiple environments, or when the specifics of component reliability (such as the dependency of component reliability on applied stresses) are not well understood. Therefore, the reliability of a completed design may not be as required or as expected.

In the past, attempts to improve product reliability concentrated on various types of Failure Mode and Effects Analysis (FMEA) and/or on dedicated Reliability Growth test programs. Both methods, applied individually or in conjunction, are useful but may not be cost effective or applicable.

The first method, FMEA, is a valuable but very extensive attempt to identify potential failure modes and to assure their mitigation. Starting from the bottom and going up, the analysis addresses each component (electrical or mechanical), the modes in which it might fail, and the effects that those failure modes might have on higher level assemblies and the system. The process is very tedious and is often completed well after the design is finished and production has begun. This might be too late to accomplish any measurable improvement without major expenses for redesign, new PC board layouts, and new tooling. In addition, an FMEA of any type normally does not produce a measure of overall product reliability, so any achieved reliability improvement is also not measurable. One type of FMEA has a Risk Priority Number (RPN) associated with it; however, this number is a product of three numbers (each from 1 to 10) assigned to Severity, Occurrence, and Detection. Regardless of the strict rules applied in estimating these numbers, they are still only estimates and thus might be subjective. Another FMEA type that includes criticality computation (FMECA) requires knowledge of failure rates; therefore, it cannot be applied to systems with components for which failure probabilities, not failure rates, are a far better attribute. Neither type provides reliability estimates.

Test methods for reliability improvement are even more costly, keeping in mind that they are performed on pre-production or production runs, meaning that the design is mature. In addition, the test units might be complex and expensive, so that only a limited number might be available for testing.

Fault Tree Analysis combines many favorable aspects:

• It is timely and therefore low cost
• It is fast and easy to use
• It provides realistic reliability estimates at the same time as the failure mode analysis
• It measures the achieved reliability improvement and the final reliability of a product.

1.1 NOTATION AND ACRONYMS

λ(t) - Component failure rate, instantaneous failure rate
λ - Component failure rate, assumed to be constant
ESD - Electrostatic Discharge
EMI - Electromagnetic Interference
FTA - Fault Tree Analysis
FMEA - Failure Mode and Effects Analysis
FMECA - Failure Mode Effects and Criticality Analysis
RPN - Risk Priority Number
MTTF - Mean Time to Failure
MTBF - Mean Time Between Failures
IEC - International Electrotechnical Commission
Q(t) - Unreliability as a function of time
Q - Unreliability assumed constant or calculated for a predetermined time
Pr - Probability
Pr(c) - Probability of occurrence of a cut set
FET - Field Effect Transistor
IC - Integrated Circuit
R - Reliability
F - Probability of failure (unreliability)
CODEC - Coder/Decoder
PRF - Part Random Failure
PCB - Printed Circuit Board
IEV - International Electrotechnical Vocabulary

2. RELIABILITY IMPROVEMENT

Reliability improvement can be undertaken and achieved in different phases of the product life:

• Design phase
• Product validation phase (test reliability growth)
• During its fielded life

The first option, design phase, offers the most cost effective opportunities for product reliability improvement. Before design is finalized, even considerably involved changes do not pose a great expense, other than design time. If design improvements are not excessively extensive, necessary changes can often be painlessly done. Then the rest of product preparation (such as layout of printed circuit boards, tooling, component procurement) can be done without interruption or modifications. In the design phase, reliability improvements are achieved by identification of potential design deficiencies or potential manufacturing problems/defects that may compromise reliability of a design. Some potential design flaws that are likely to be identified are as follows:

• Electrical or mechanical overstress of components
• Components inadequate to be used in that design (unreliable or improperly used)
• Potential relationship between failures, that is, secondary failures caused by occurrence of another failure or by the presence of an environmental stress
• Parts of inferior quality (reliability) as built by their respective manufacturers.

2.1 RELIABILITY DEFINITIONS RELATED TO THIS TUTORIAL

To assure proper understanding of the terms as they are used in this tutorial, some reliability definitions are included. These are as follows:

Reliability – probability that an item can perform a required function under given conditions for a given time interval (IEV 191-12-01). Here, the required function is defined by expected performance that may vary depending on the use of the item and of the expectations. For a high-fidelity stereo audio/video product, the expectations are, for example, no audible noise or distortion. For a mechanical device, a pipe or an underwater connector housing, the expected performance would be that there is no bending greater than a predefined angle under some expected force. The measures for reliability or its complement, unreliability, would be probability of survival past the end of a predetermined period, or probability of failure before the end of a predetermined period, respectively. The measurement that is best understood by management is the percent of items surviving a time period (life or warranty).

Failure – the termination of the ability of an item to perform a required function (IEV: 191-04-01). A failure can be classified as a failure of the hardware to operate properly due to:

• Design failure – a failure due to the inadequate design of an item to withstand operational and/or environmental stresses, or due to the use of an improper part
• Manufacturing defect causing time-related failures that compromise design reliability
• Software interactions with hardware

A failure of an item can also be attributed to a fault in the software code, that is, a failure of the software design.

Failure Cause – the circumstances during design, manufacture, or use which have led to a failure (IEV:191-04-01)

Failure Mechanism – the physical, chemical, or other process which led to a failure. An example would be crack propagation through the dielectric of a ceramic capacitor causing the capacitor to develop a small resistance and ultimately a short circuit.

Failure Mode – manner or state in which an item or a component might fail. Examples of failure modes are:

• Low or no output from an IC
• Separation of the IC packaging material
• Capacitor fails short due to crack propagation
• Resistor fails open due to the poor welding of the connections
• FET saturates and overheats
• Seal leaks, etc.

One failure mode can have multiple causes. Examples of those are:

IC enclosure fails due to one or more of the following:

• high humidity
• high temperature
• thermal cycling
• IC manufacturing process

Capacitor short:

• electrical overstress
• high temperature, use or soldering
• vehicle vibration

A seal in an underwater cable connector may leak due to:

• water pressure causing dilatation of the material
• cold temperature
• wearout from mating and de-mating of the connector
• defect in manufacturing (undersize)

3. FAULT TREE ANALYSIS AND ITS USE

A fault tree is used as a Boolean representation of a product design: a system, its assemblies and functions, failure modes, and their respective causes. Fault tree analysis serves several purposes in the analysis of a design. One application is modeling the product's architecture and functionality in a top-down manner, searching for potential failure modes and their causes that might produce an unfavorable outcome defined as a product failure. It also quantitatively estimates the reliability of an item and its assemblies. Based on this information, one can identify the failure modes that are the highest contributors to the product's unreliability and follow the investigation down to identify their respective causes. This allows for tradeoff and mitigation of those potential failure modes and, finally, evaluation of the achieved reliability improvement.

3.1 FAULT TREE – INTRODUCTION

A fault tree is a logic diagram that represents functional dependencies of parts of a system. The top gate represents the unfavorable outcome of the system, and all other unfavorable outcomes that contribute to the system failure are represented as gates, logically connected to the top gate.

Components of a fault tree are:

• Gates, which are outcomes of one or a combination of input events or other gates
• Cut sets, which are groups of outcomes or events that, if they occurred, would cause a system failure. A minimal cut set contains the minimum number of events that are required for a failure outcome. The removal of one of them would result in the system surviving.

Types of events and gates along with their definitions and graphical representations are shown in Table 3.1.

Table 3.1. Graphical Representation and Definitions of Gates and Events

| Symbol Name | Description | Reliability Model | Inputs |
|---|---|---|---|
| BASIC EVENT | Basic event for which reliability information is available | Component failure mode, or a failure mode cause | 0 |
| CONDITIONAL EVENT | Event that is a condition of occurrence of another event when both must occur for the output to occur | Occurrence of event that must occur for another event to occur; conditional probability | 0 |
| DORMANT EVENT | A basic event that represents a dormant failure | Dormant component failure mode or dormant failure cause | 0 |
| UNDEVELOPED EVENT | A part of the system that has yet to be developed (defined) | A contributor to the probability of failure. Structure of that system part is not yet defined | 0 |
| TRANSFER GATE | Gate indicating that this part of the system is developed in another part or page of the diagram | A partial reliability block diagram that is shown in another location of the overall system | 0 |
| OR GATE | The output event occurs if any of its input events occur | Failure occurs if any of the parts of that system fails (series system) | ≥ 2 |
| MAJORITY VOTE GATE | The output occurs if m of the inputs occur | Redundancy k out of n, where m = n-k+1 | ≥ 3 |
| EXCLUSIVE OR | The output event takes place if one, but not the other, input occurs | A failure of the system occurring only if one, not both, of the two possible failures happens | 2 |
| AND GATE | The output event takes place if all of the input events occur | Parallel redundancy, one out of n equal or different branches | ≥ 2 |
| PRIORITY AND | The output event (failure) occurs only if the input events occur in sequence from left to right | Good for representation of secondary failures or for enabling sequence of events | ≥ 2 |
| INHIBIT GATE | The output occurs only if both of the input events take place, one of them conditional | Conditional probability of occurrence of the final event | 2 |
| NOT GATE | The outcome is present only if the input event does not occur | Exclusive events, or preventive measure does not take place | 1 |


3.2 SYSTEM ANALYSIS METHODOLOGY

3.2.1 Classical System Reliability Analysis

When a system is “complex” regarding the complexity of its modeling, that is, if it contains many interlocked or common branches, standard modeling can become extremely cumbersome, lengthy, and subject to mathematical (computational) errors. An example of a simple, yet “complex” bridge circuit is shown in Figure 3.2-1.

Figure 3.2-1. Bridge Circuit

In the bridge circuit above, the signal must flow from input A to output B. It can flow through block 3 in both directions. The analytical solution is to model the system under two conditions. Assuming that block 3 is good, the signal flows through blocks 1 or 2 and then 4 or 5, as if they were parallel blocks. Assuming that block 3 is bad (the condition that 3 failed), blocks 1 and 4 are in series, in parallel with blocks 2 and 5, also in series. This is represented by the following equation:

$$R_s = \left[(R_1 + R_2 - R_1 \cdot R_2)\cdot(R_4 + R_5 - R_4 \cdot R_5)\right]\cdot R_3 + \left(R_1 \cdot R_4 + R_2 \cdot R_5 - R_1 \cdot R_4 \cdot R_2 \cdot R_5\right)\cdot(1 - R_3)$$

$$R_s = 0.991$$

When a system contains a multitude of “complex” systems of different kinds, the algebraic representation becomes rapidly too involved and cumbersome to solve. In addition, these complex equations need to contain a multitude of conditional probabilities to account for environmental effects and secondary failures. This only adds to already extensive complexity of the calculations.
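The decomposition on block 3 described above can be checked numerically. The following is a minimal Python sketch, not the author's tool. The five block failure probabilities are assumptions: they are the values used in the rare-event comparison in Section 3.2.2, and they happen to reproduce the quoted R_s = 0.991. The function names are illustrative. A brute-force enumeration of all 32 block states is included as a cross-check of the closed-form expression.

```python
from itertools import product

# Assumed block failure probabilities for blocks 1..5 (taken from the
# numerical example in Section 3.2.2); reliability is R = 1 - F.
F = {1: 2e-2, 2: 5e-2, 3: 8e-2, 4: 2.5e-2, 5: 3e-1}
R = {block: 1.0 - f for block, f in F.items()}


def bridge_by_decomposition(R):
    """Condition on block 3 being good or failed, as in the equation above."""
    good3 = (R[1] + R[2] - R[1] * R[2]) * (R[4] + R[5] - R[4] * R[5])
    bad3 = R[1] * R[4] + R[2] * R[5] - R[1] * R[4] * R[2] * R[5]
    return good3 * R[3] + bad3 * (1.0 - R[3])


def bridge_by_enumeration(R):
    """Cross-check: add up the probability of every state with a path A -> B."""
    paths = [(1, 4), (2, 5), (1, 3, 5), (2, 3, 4)]
    total = 0.0
    for states in product([True, False], repeat=5):
        up = dict(zip(range(1, 6), states))
        if any(all(up[b] for b in path) for path in paths):
            prob = 1.0
            for b in range(1, 6):
                prob *= R[b] if up[b] else 1.0 - R[b]
            total += prob
    return total


r_decomp = bridge_by_decomposition(R)
r_enum = bridge_by_enumeration(R)
print(f"R_s by decomposition: {r_decomp:.4f}")
print(f"R_s by enumeration:   {r_enum:.4f}")
```

Both routes agree for independent blocks. Enumeration scales as 2^n, which illustrates why the algebraic approach becomes impractical for larger systems.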

3.2.2 System Reliability Analysis Using a Fault Tree

The “complex” system shown in Figure 3.2-1 can be easily modeled using Boolean algebra with a fault tree or success tree representation.

Cut sets in this system are made of the following combinations:

• Blocks 1 and 2 (c1 = 1,2)
• Blocks 4 and 5 (c2 = 4,5)
• Blocks 1, 3, and 5 (c3 = 1,3,5)
• Blocks 2, 3, and 4 (c4 = 2,3,4)

Should all blocks in any of the above combinations fail, the signal flow from A to B will be interrupted.

With Boolean algebra, the probability of the system failure is:

$$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4)$$

The probability of cut set 1 is:

$$\Pr(c_1) = F_1 \cdot F_2 = (1 - R_1)\cdot(1 - R_2)$$

The Esary-Proschan calculation, which treats the cut sets as independent, is then:

$$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4) = 1 - [1 - \Pr(c_1)]\cdot[1 - \Pr(c_2)]\cdot[1 - \Pr(c_3)]\cdot[1 - \Pr(c_4)]$$

With the RARE event approximation, this calculation would be:

$$F_S = \Pr(c_1) + \Pr(c_2) + \Pr(c_3) + \Pr(c_4) = F_1 \cdot F_2 + F_4 \cdot F_5 + F_1 \cdot F_3 \cdot F_5 + F_2 \cdot F_3 \cdot F_4$$

While easy to implement, the RARE approximation may introduce sizeable errors into calculations when the failure probabilities are large. Any failure probability larger than a multiple of 10^-2 will produce an unwanted error. This is shown in the example below:

$$F_1 = 2\cdot10^{-2},\quad F_2 = 5\cdot10^{-2},\quad F_3 = 8\cdot10^{-2},\quad F_4 = 2.5\cdot10^{-2},\quad F_5 = 3\cdot10^{-1}$$

Esary-Proschan:

$$F_S = 1 - (1 - F_1 \cdot F_2)\cdot(1 - F_4 \cdot F_5)\cdot(1 - F_1 \cdot F_3 \cdot F_5)\cdot(1 - F_2 \cdot F_3 \cdot F_4) = 9.068\cdot10^{-3}$$

RARE:

$$F_S = F_1 \cdot F_2 + F_4 \cdot F_5 + F_1 \cdot F_3 \cdot F_5 + F_2 \cdot F_3 \cdot F_4 = 9.08\cdot10^{-3}$$
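The two cut-set calculations can be compared in a few lines of Python. This is a sketch under stated assumptions: the block failure probabilities are F1 = 2e-2, F2 = 5e-2, F3 = 8e-2, F4 = 2.5e-2, and F5 = 3e-1, as read from the example above; the cut sets are the four listed earlier in this section; and the function names are illustrative rather than taken from any FTA package.

```python
from math import prod

# Block failure probabilities as read from the example (assumed values).
F = {1: 2e-2, 2: 5e-2, 3: 8e-2, 4: 2.5e-2, 5: 3e-1}

# Minimal cut sets of the bridge circuit: c1..c4.
CUT_SETS = [(1, 2), (4, 5), (1, 3, 5), (2, 3, 4)]


def cut_set_probability(cut_set, F):
    """Pr(c): every block in the cut set has failed (independent blocks)."""
    return prod(F[block] for block in cut_set)


def esary_proschan(cut_sets, F):
    """1 - product of (1 - Pr(c_i)), treating the cut sets as independent."""
    return 1.0 - prod(1.0 - cut_set_probability(c, F) for c in cut_sets)


def rare_event(cut_sets, F):
    """Rare-event approximation: plain sum of the cut set probabilities."""
    return sum(cut_set_probability(c, F) for c in cut_sets)


f_ep = esary_proschan(CUT_SETS, F)
f_rare = rare_event(CUT_SETS, F)
print(f"Esary-Proschan: {f_ep:.3e}")
print(f"RARE:           {f_rare:.3e}")
print(f"Relative difference: {(f_rare - f_ep) / f_ep:.2%}")
```

With these inputs the sketch gives about 9.068e-03 and 9.080e-03, matching the figures in the text. The rare-event sum is always at least as large as the Esary-Proschan value, and the gap widens as the basic event probabilities grow.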

Commercially available software packages for FTA are based on Boolean algebra, and most of them contain the constant failure rate model for unavailability:

$$Q(t) = \frac{\lambda}{\lambda + \mu}\cdot\left(1 - e^{-(\lambda + \mu)\cdot t}\right)$$

If the time to repair (MTTR) is considered infinite (non-repairable items), then µ = 0, and:

$$Q(t) = F(t)$$

Other information that can be obtained with FTA software is:

• Failure frequency (hazard rate) of all gates


• Number of expected failures during the predetermined time
• Unavailability or probability of failure of the system at any gate
• Gate summaries in various forms
• Confidence intervals
• Sensitivity analysis
• Calculations using distributions other than exponential
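The constant failure rate unavailability model quoted before this list can be sketched as follows. This is an illustration only: the numerical values of the failure rate, repair rate, and mission time are hypothetical and do not come from the tutorial, and the function names are not those of any commercial FTA package.

```python
import math


def unavailability(lam, mu, t):
    """Q(t) = lam / (lam + mu) * (1 - exp(-(lam + mu) * t)); requires lam > 0."""
    total = lam + mu
    return lam / total * (1.0 - math.exp(-total * t))


def unreliability(lam, t):
    """F(t) = 1 - exp(-lam * t) for a non-repairable item."""
    return 1.0 - math.exp(-lam * t)


# Hypothetical inputs, for illustration only.
lam = 1.0e-6          # failures per hour
mu = 0.1              # repairs per hour (MTTR of 10 hours)
t = 10 * 8760.0       # ten years expressed in hours

q_repairable = unavailability(lam, mu, t)
q_non_repairable = unavailability(lam, 0.0, t)
f_t = unreliability(lam, t)

print(f"Q(t), repairable:     {q_repairable:.3e}")
print(f"Q(t), non-repairable: {q_non_repairable:.3e}")
print(f"F(t):                 {f_t:.3e}")
```

Setting µ = 0 makes Q(t) collapse to F(t), which is the non-repairable case stated in the text. With repair, Q(t) saturates near λ/(λ + µ).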

The circuit from Figure 3.2-1, represented by FTA, is shown in Figure 3.2-2.

Figure 3.2–2. FTA Diagram of the Bridge Circuit

Different gates of a fault tree are used to represent different circuit models as shown in the following examples:

Example 1: Combination of series and redundant blocks (events)

The reliability block diagram of this combination is shown in Figure 3.2-3.

Figure 3.2-3. Series-Parallel Circuit Configuration [diagram labels: F1, F2; F3, F3, F3 (2 out of 3); Gate 1; Gate 2; Top Gate]

The corresponding equations are as follows:

$$F_1 = 0.002,\quad F_2 = 0.0005,\quad F_3 = 0.0032,\quad n = 3,\quad m = 2$$

$$F_{Gate1} = 1 - (1 - F_1)\cdot(1 - F_2)$$

$$F_{Gate2} = \sum_{i=2}^{3} \frac{n!}{i!\cdot(n-i)!}\cdot F_3^{\,i}\cdot(1 - F_3)^{(n-i)}$$

$$F_{TopGate} = 1 - (1 - F_{Gate1})\cdot(1 - F_{Gate2}) = 2.53\cdot10^{-3}$$
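Example 1 can be reproduced with a short Python sketch. The input values F1, F2, F3, n, and m are those given above; the helper names `or_gate` and `vote_gate` are illustrative assumptions and do not correspond to any particular FTA package.

```python
from math import comb


def or_gate(*probs):
    """OR gate (series system): output occurs if any independent input occurs."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive


def vote_gate(p, n, m):
    """Majority vote gate: at least m of n identical, independent inputs occur."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


F1, F2, F3 = 0.002, 0.0005, 0.0032
n, m = 3, 2

f_gate1 = or_gate(F1, F2)        # series blocks F1 and F2
f_gate2 = vote_gate(F3, n, m)    # three redundant F3 blocks, 2 out of 3
f_top = or_gate(f_gate1, f_gate2)

print(f"Gate 1:   {f_gate1:.6e}")
print(f"Gate 2:   {f_gate2:.6e}")
print(f"Top gate: {f_top:.3e}")
```

The top gate value rounds to 2.53e-03, in agreement with the equations above. The series pair dominates: the redundant group contributes only about 3e-05.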

The FTA representation of the reliability block diagram in Figure 3.2-3 is shown in Figure 3.2-4.

Figure 3.2-4. FTA Representation of a Series-Parallel Reliability Block Diagram

With different redundant blocks (Figures 3.2-3 and 3.2-4), the redundant gates are different (F3, F4, and F5 instead of the repeated F3), and the calculations are done in a similar way (binomial). The three different redundant blocks are shown in Example 2, on conditional probability, where Gate 2 has three different gates representing the three redundant blocks (Figure 3.2-5).

Example 2: Use of a priority gate. The event F2 will occur only if the event F1 has occurred (conditional probability). The equivalent fault tree is shown in Figure 3.2–5.

Figure 3.2– 5 Example of a Priority Gate (Gate 1)

The associated mathematics is as follows:

$$n = 3,\quad m = 2$$

$$F_1 = 0.002,\quad F_2 = 0.0005,\quad F_3 = 0.00045,\quad F_4 = 0.00053,\quad F_5 = 0.0032$$

$$F_{Gate1} = F_1 \cdot F_2$$

$$F_{Gate2} = F_3 \cdot F_4 + F_3 \cdot F_5 + F_4 \cdot F_5$$

$$F_{TopGate} = 1 - (1 - F_{Gate1})\cdot(1 - F_{Gate2}) = 4.374\cdot10^{-6}$$
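The arithmetic of Example 2 can be sketched in Python as below. The event values are those given above. The helper names are illustrative assumptions. Gate 2 is computed exactly as written in the text, as a sum over pairs of different events, which is a rare-event form that neglects the small triple-product correction.

```python
from itertools import combinations
from math import prod


def priority_and(*probs):
    """Priority AND evaluated as the product of the (conditional) probabilities."""
    return prod(probs)


def vote_gate_pairs(probs, m):
    """m-out-of-n gate for different events: sum of products over m-combinations."""
    return sum(prod(combo) for combo in combinations(probs, m))


def or_gate(*probs):
    """OR gate: output occurs if any independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


F1, F2 = 0.002, 0.0005
F3, F4, F5 = 0.00045, 0.00053, 0.0032

f_gate1 = priority_and(F1, F2)                # F2 occurs only after F1
f_gate2 = vote_gate_pairs([F3, F4, F5], m=2)  # F3*F4 + F3*F5 + F4*F5
f_top = or_gate(f_gate1, f_gate2)

print(f"Gate 1:   {f_gate1:.4e}")
print(f"Gate 2:   {f_gate2:.4e}")
print(f"Top gate: {f_top:.4e}")
```

The result is about 4.374e-06, matching the value in the text. Compared with Example 1, replacing the series OR gate with a priority gate lowers the top probability by almost three orders of magnitude.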

Example 3: A real-life example of a priority gate is the analysis of a switching amplifier, where on all four outputs (+1, -1, +2, and -2) there are four switching FETs, followed by noise and EMI filtering. For the FETs to operate properly (in the switching mode), the Logic Ground (LGround) must be maintained at a certain voltage. This voltage is 5V, maintained by a voltage regulator filtered by two ceramic capacitors. Should the LGround voltage decrease below 2V, the FETs will start operating in linear mode and will then saturate. This condition not only constitutes a failure, but could eventually cause the FET to overheat. This voltage would decrease in the event that one of the voltage filtering capacitors developed a small resistance, close to a short. Here, LGround below 2V is the condition for the FETs to saturate and overheat.

In the old design, the voltage-filtering capacitors had a dielectric with Y5V characteristics, which has a higher concentration of voids and could develop and propagate a crack more easily than other ceramics (especially in harsher environments such as the one for which this analysis was performed). This characteristic, along with the less than adequate voltage rating, contributed to a relatively high projected probability of failure for the specified lifetime. With replacement of both voltage-filtering capacitors by capacitors having a dielectric with X7R characteristics and a higher voltage rating, the 10-year probability of occurrence of FET overheat was reduced from 2.0969E-3 (per FET) to 1.0009E-4, an improvement by a factor of about 20.

The original circuit, as modeled with the fault tree, is shown in Figure 3.2-6.

Figure 3.2-6. Practical Example of a Priority Gate

Gates and events shown in the figure, with their descriptions and probabilities:

• FET4 OVERHEAT (Q=2.0969e-3): Overheat of FET due to LGND < 2V
• FET 4 SATURATION (Q=3.5000e-1): FET saturates due to LGND < 2V
• LGROUND (Q=5.9911e-3): LGND shorted to ground causing improper FET bias and overheat
• Short C905 (Q=3.0001e-3): Ceramic capacitor shorts to ground
• Short_C906 (Q=3.0001e-3): Capacitor short brings Lground to the ground
• PRF_SHORT_C6905 (Q=3.0000e-3): Capacitor shorts due to part random failure
• PRF_C6906 (Q=3.0000e-3): Capacitor fails due to part random failure
• MFG_SHORT_C905 (Q=7.0000e-8): Manufacturing defects cause a short
• MFG_Short_C906 (Q=7.0000e-8): Manufacturing defects cause a short
• SOLDER SHORT_C6905 (Q=5.0000e-8): Excessive solder causing a short between the pins or pads
• SOLDER SHORT_C6906 (Q=5.0000e-8): Excessive solder causing a short between the pins or pads
• DEBRIS_C6905 (Q=2.0000e-8): Debris on the PCB causing a short
• DEBRIS_C6906 (Q=2.0000e-8): Debris on the PCB causing a short
• DANDREIC SHORT (Q=4.6875e-13): Electrolyte mixed with debris causing a short
• EL. CAP LEAK (Q=3.1250e-6): Short caused by leaking of the nearby capacitor
• AGEING (Q=1.0000e-6): Capacitor leaking electrolyte due to ageing
• HI-TEMP (Q=1.2500e-7): Electrolyte leak due to high temperature
• HI_HUMIDITY (Q=2.0000e-6): Electrolyte leak due to high humidity
• DEBRIS (Q=1.5000e-7): Presence of debris on the board
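The probabilities shown in Figure 3.2-6 can be recomputed from the basic events. The sketch below is a reconstruction, not the author's model: the gate structure (which events feed which gate, and whether a gate is OR or AND) is an assumption inferred from the figure labels and from the fact that it reproduces the Q values in the figure. The basic event values are those shown in the figure; the variable names are illustrative.

```python
from math import prod


def or_gate(*probs):
    """OR gate: output occurs if any independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(*probs):
    """AND / priority gate: product of the input probabilities."""
    return prod(probs)


# Basic events, values as shown in Figure 3.2-6.
prf_short = 3.0e-3        # capacitor shorts due to part random failure
solder_short = 5.0e-8     # excessive solder bridges the pins or pads
debris_on_pcb = 2.0e-8    # debris on the PCB causes a short
ageing, hi_temp, hi_humidity = 1.0e-6, 1.25e-7, 2.0e-6
debris = 1.5e-7           # presence of debris on the board
fet_saturation = 3.5e-1   # conditional event: FET saturates when LGND < 2V

# Assumed structure, inferred from the figure values.
mfg_short = or_gate(solder_short, debris_on_pcb)
short_c905 = or_gate(prf_short, mfg_short)
short_c906 = or_gate(prf_short, mfg_short)
el_cap_leak = or_gate(ageing, hi_temp, hi_humidity)
dendritic_short = and_gate(el_cap_leak, debris)
lground = or_gate(short_c905, short_c906, dendritic_short)
fet_overheat = and_gate(lground, fet_saturation)

print(f"MFG short:       {mfg_short:.4e}")
print(f"Short C905/C906: {short_c905:.4e}")
print(f"Dendritic short: {dendritic_short:.4e}")
print(f"LGROUND:         {lground:.4e}")
print(f"FET overheat:    {fet_overheat:.4e}")
```

Under these assumptions the capacitor part random failure events dominate the top gate, which is consistent with the design change described in Example 3, where both filtering capacitors were replaced.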

Example 4. Use of an inhibit gate is shown in Figure 3.2-7. With the inhibit gate, for the outcome to constitute a failure, all of the input events (in our case three) must take place. A practical example of this modeling would be the connection of three EMI filtering capacitors. If a failure mode is defined as no filtering, all of the three would have to fail.

Figure 3.2-7. Example of an Inhibit Gate


3.3 BUILDING OF A FAULT TREE

Building a fault tree of a product (a system made of subsystems, assemblies, and components) is a top-down process where, as a first step, one must define what constitutes the failure of that product. For a high quality audio amplifier, anything that the end user might hear and qualify as degraded performance constitutes the system failure.

The next step is to outline the system architecture and the major functions such as:

• Power supply
• Video amplifier
• Audio amplifier

Further analysis going down determines what phenomena preclude proper operability of those parts or functions, i.e.:

• Shorted line voltage or no VCC supplied
• No video processing
• One or more audio channels not operational

More detailed analyses further determine the causes of those phenomena and the contributing factors, down to the causes of failure modes such as:

• Electrical overstress
• High humidity
• High temperature

A detailed example of how a fault tree analysis is done is shown in another real-life example, an analog input to an analog-to-digital converter of an audio amplifier. The partial circuit of this amplifier is shown in Figure 3.3-1. This part of the amplifier is normally known as the CODEC, as analog input signals are converted into digital signals, and then again into linear output. The signals are directed into an IC that is an analog-to-digital converter.

For the amplifier to be operational, all signals have to be processed by the CODEC, meaning that they have to be coded and decoded. The input signal 1+ into the left channel of IC U20 is interrupted if:

• R200, R209, or C171 fail open
• C179 shorts to ground, shorting the signal to ground

The input signal 2+ into the right channel of U20 is interrupted if:

• R201, R205, or C172 fail open
• C177 shorts the signal to ground

The entire circuit will not work if no voltage is supplied to the analog input (pin 8):

• R206 or R208 fail open, interrupting the supply of 2.3 V

Figure 3.3-1 Input into CODEC of an Audio Amplifier


The signal will be too noisy if C183 fails open (low frequency noise), or C181 fails open (high frequency noise). Other contributors to the failure are the lack of data inputs, which will not be considered in this example.
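The failure conditions listed for the CODEC analog inputs translate directly into Boolean logic. The sketch below encodes them in Python using the reference designators from the text. Combining the four conditions with a top-level OR is an assumption based on the statement that all signals have to be processed; the function and key names are illustrative.

```python
def codec_analog_failures(failed_open, failed_short):
    """Evaluate the analog-input failure conditions for a set of failed parts.

    failed_open  -- reference designators of components that failed open
    failed_short -- reference designators of components that failed short
    """
    failed_open = set(failed_open)
    failed_short = set(failed_short)

    result = {
        # Input 1+ (left channel of U20)
        "input_1_interrupted": bool(failed_open & {"R200", "R209", "C171"})
        or "C179" in failed_short,
        # Input 2+ (right channel of U20)
        "input_2_interrupted": bool(failed_open & {"R201", "R205", "C172"})
        or "C177" in failed_short,
        # 2.3 V supply to the analog input (pin 8)
        "no_analog_voltage": bool(failed_open & {"R206", "R208"}),
        # Low frequency (C183) or high frequency (C181) noise
        "noisy_signal": bool(failed_open & {"C183", "C181"}),
    }
    # Assumed top-level OR: any one condition is an analog input failure.
    result["analog_input_failure"] = any(result.values())
    return result


case_open = codec_analog_failures(failed_open={"R209"}, failed_short=set())
case_short = codec_analog_failures(failed_open=set(), failed_short={"C177"})
case_ok = codec_analog_failures(failed_open=set(), failed_short=set())

print(case_open)
print(case_short)
print(case_ok)
```

Each dictionary entry corresponds to an OR gate over component failure modes. Assigning probabilities to those failure modes is the next step, covered in Section 3.5.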

The top level of the FTA representation of this analysis is shown in Figure 3.3–2.

Circled in Figure 3.3-2 is the gate that needs to be developed for the analog inputs 1 and 2 described earlier. Figure 3.3-3 shows further development of that gate.

Figure 3.3-2 Top Level FTA of CODEC

Figure 3.3-3 Development of the FTA for inputs 1 and 2


Inputs 1 and 2 are then separately analyzed, and so are the noisy or no analog voltages. Development of Input 1 is shown in Figure 3.3-4. The circle points out the open components that are to be further developed. The fault tree part in Figure 3.3-4 also contains a gate that points to the possible lack of the 2.3V voltage. Capacitor C179, if failed short, would short the signal to the ground. There are two possible reasons for this capacitor to fail short. One is the so-called “part random failure”. This term takes into consideration the environment that the capacitor will be exposed to (temperature, vibration) as well as the operational stresses that the capacitor will see, such as its operating voltage. Thus, the term “random failure” does not describe just a failure that will occur at random; it describes the likelihood that a part having an intrinsic defect will fail under given environmental and operational stresses.

Figure 3.3-4. Development of the FTA Down to Components and Their Failure Cause

3.4 CONTRIBUTION OF MANUFACTURING DEFECTS

Manufacturing defects causing time dependent failures are a vital contributor to product unreliability.

Some contributors to components failing open are:

• Cold or insufficient solder, which after a period of time, due to relaxation and fatigue, causes connections to open. Vibration of a vehicle will cause the cold soldered joint to open as well.
• Missing components
• Components cracked during insertion
• Broken or bent pins or leads

Contributors of manufacturing flaws to components failing short are:

• Debris (at times un-cleaned flux) left on the board
• Excessive solder
• Bent pins (mostly ICs and connectors) shorting to another pin.

Another reason for the capacitor failure (Figure 3.3-4) would be a failure (a short) caused by manufacturing defects. Normally during production, if a PC board is not properly cleaned, debris left on it will produce so-called dendritic growth, which in turn might cause a short between terminals. A second manufacturing defect causing an electrical short is the result of inadequate soldering technique, where excessive solder develops a bridge between the terminals and causes a short. Further development of the fault tree will point to other components failing open or short, causing failure of the analog power supply or interruption of the second signal.

3.5 ORIGIN OF VALUES FOR THE BASIC EVENTS

To estimate the final (top gate) product reliability, each event must have reliability information assigned to it. This information may be attached in the form of a failure rate, MTTF, or probability of failure. For mixtures of hardware, mechanical and electrical, perhaps the most straightforward way is to represent all the information as a probability of failure calculated for a predetermined time and a predetermined operational profile.
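The conversion of a failure rate or an MTTF into a probability of failure for a predetermined time can be sketched as follows. This is only an illustration under the assumption of a constant failure rate (exponential distribution), which the text does not require; the function names and the numeric inputs are hypothetical.

```python
import math


def failure_probability_from_rate(failure_rate_per_hour: float, hours: float) -> float:
    """F(t) = 1 - exp(-lambda * t); valid only for a constant failure rate (assumption)."""
    # expm1 keeps precision when lambda * t is very small
    return -math.expm1(-failure_rate_per_hour * hours)


def failure_probability_from_mttf(mttf_hours: float, hours: float) -> float:
    """Same conversion starting from MTTF = 1 / lambda (constant-rate assumption)."""
    return failure_probability_from_rate(1.0 / mttf_hours, hours)


# Hypothetical inputs, not taken from the tutorial
rate = 2.0e-7          # failures per hour
mission_hours = 1000.0  # predetermined time

f_from_rate = failure_probability_from_rate(rate, mission_hours)
f_from_mttf = failure_probability_from_mttf(1.0 / rate, mission_hours)
print(f"probability of failure over {mission_hours:.0f} h: {f_from_rate:.4e}")
```

Once every basic event is expressed as a probability for the same time and operational profile, electrical and mechanical events can be combined in the same tree.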

For electrical components, data for event and failure mode probabilities comes from:

• Information from the manufacturers’ life testing, which needs to be recalculated for the proper environmental and electrical stresses

• Software databases (commercially available)

• Field use – field failure data information, which would be the very last resort because of many inconsistencies of data reporting and recording.

For mechanical components, probability of failure needs to be calculated based on:

• Stresses – loads, and their geometry and distribution

• Materials

• Construction (design) of parts, such as shape and size

• Attachment of parts to other structures (adhesives, fasteners)

Based on all the information, the safety margin needs to be calculated, which in turn will produce a reliability value.

For determination of a probability of occurrence of manufacturing defects, the approach may be two-fold. The probability associated with the manufacturing defects can come from factory or service data (field failure data). On the other hand, sometimes it is advisable to enter the required values into the reliability analysis, and then adjust the manufacturing process control to achieve this goal.
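For the mechanical components mentioned above, one common way to turn a safety margin into a reliability value is normal stress-strength interference. The sketch below assumes that both load and strength are independent and normally distributed, which is an assumption of this example and not something the text states; all numbers and names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_margin(mean_strength: float, sd_strength: float,
                  mean_load: float, sd_load: float) -> float:
    """Safety margin in standard deviations of (strength - load)."""
    return (mean_strength - mean_load) / sqrt(sd_strength ** 2 + sd_load ** 2)


def reliability_from_margin(margin: float) -> float:
    """R = Phi(margin) for independent, normally distributed load and strength."""
    return NormalDist().cdf(margin)


# Hypothetical part: strength 500 +/- 40, load 300 +/- 30 (same units)
margin = safety_margin(500.0, 40.0, 300.0, 30.0)
reliability = reliability_from_margin(margin)
failure_probability = 1.0 - reliability  # value that would be assigned to the basic event
print(f"safety margin = {margin:.2f}, R = {reliability:.6f}")
```

The resulting failure probability can then be attached to the corresponding basic event in the same way as the probabilities of the electrical parts.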

4. FAILURE MODE DETECTION AND MITIGATION

In a completed or partially completed fault tree analysis of a system, when the probability of failure of the top level gate has been calculated and it is concluded that reliability improvement is necessary, the next step is to identify the highest contributor to unreliability (a failure mode or a cause) and improve the design. The process then continues with the search for the next highest contributor. An example of such reliability improvement is shown for a complex audio/video amplifier system. The top level of the system (the console) is shown in Figure 4-1. The Tuner is shown as an event because reference designator numbers are repeated between the bill of material of the system and that of the tuner. For that reason, the Tuner was analyzed separately, and its top unreliability is depicted as an event.

Figure 4–1 Top Level Fault Tree of Console and its Major Subsystems


For the given warranty period, the original unreliability value is not acceptable, as 7,365 systems out of 100,000 made would need service before the end of their respective warranty periods. The highest contributor to unreliability is the block marked SPDIF. This gate was developed on page 13, as shown in Figure 4-2.
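The arithmetic behind the service count can be sketched as below. The 7,365 and 100,000 figures come from the text; the comparison with the goal R(1 year) = 0.992 quoted with Figure 4-4 assumes that the goal applies to the same period, which is an assumption of this example. Variable names are hypothetical.

```python
units_built = 100_000
units_needing_service = 7_365  # from the original analysis

# Unreliability and reliability implied by the service count
unreliability = units_needing_service / units_built
reliability = 1.0 - unreliability

# Goal quoted with Figure 4-4; assumed here to cover the same period
goal_reliability = 0.992
allowed_service_count = round((1.0 - goal_reliability) * units_built)

print(f"initial reliability: {reliability:.5f}")
print(f"service events allowed by the goal per {units_built} units: {allowed_service_count}")
```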

Figure 4–2 SPDIF Top Level Fault Tree

The search for the highest contributor to the SPDIF circuit unreliability points to the part of the circuit that is an input to or output from the multiplexer. Further investigation leads to the SPDIF multiplexer, where the highest contributor is the IC U501 (Figure 4-3).

The high failure probability of this IC is related to its construction, specifically its packaging (TSSOP). In another package, SOIC, this IC is a reliable part. There were 3 of these units in the console. It was also apparent that the probability of failure of capacitors C513 and C517 was too high for ceramic capacitors. This is because they used the Y5V dielectric. There were about 116 capacitors of this type in the console.


Figure 4-3 The Components which were the Highest Contributors to the Console Unreliability.

Once the design improvements were made, the console reliability was improved to the point of almost meeting its aggressive goal. The resultant improvement is shown in Figure 4-4.

Figure 4-4 Console Reliability Goal, Planned Growth Curve, and the Actual Reliability

[Plot: console reliability (vertical axis, 0.91 to 1) against duration of the design period in days (horizontal axis, 0 to 300). Curves: planned reliability growth and achieved reliability growth, with the console goal R(1 year) = 0.992. Annotations on the achieved curve: initially calculated; Y5V caps replaced by X7R (116); TSSOPs replaced by SOICs; transistors and FETs from a more reliable vendor.]


5. SUMMARY AND CONCLUSIONS

The Fault Tree Analysis can be successfully used for identification and mitigation of potential failure modes that contribute to unreliability of a product.

The FTA allows pictorial representation of the system, its architecture and functionality, along with using Boolean algebra and the multitude of modeling schemes to best represent the system operation and interdependency of its failure modes. The FTA is here used to evaluate the individual failure mode contributions to the system unreliability and come up with the most viable solution for its reliability improvement. The methodology can be summarized as follows:

• Define what constitutes the system failure

• Start with the top level of the system with an unfavorable outcome that defines the system failure

• Construct the fault tree down, using logic to express reliability modeling techniques

• Follow the analysis down the fault tree to determine what assembly, signal, part, or manufacturing defect will cause a particular failure

• Develop the fault tree all the way down to the causes of pertinent failure modes

• Determine respective probability of occurrence of individual causes. The software, when used for analysis, will roll up all information producing the system, subsystem, and assemblies’ failure probability

• Identify those failure modes that are the highest contributors to unreliability and mitigate.

• Update the analysis, and monitor the resultant reliability improvement
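The roll-up and ranking steps in the list above can be sketched with a very small evaluator. This is not the commercial FTA software used in the tutorial; it assumes independent, non-repeated events and only OR and AND gates. The event values are those listed for capacitor C181 failing short in the tutorial visuals, while the gate structure and all class and function names are assumptions of this example.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Event:
    name: str
    q: float  # probability of failure assigned to the basic event


@dataclass
class Gate:
    name: str
    kind: str  # "OR" or "AND"
    inputs: List[Union["Gate", Event]] = field(default_factory=list)


def probability(node: Union[Gate, Event]) -> float:
    """Roll probabilities up the tree, assuming independent inputs."""
    if isinstance(node, Event):
        return node.q
    qs = [probability(child) for child in node.inputs]
    if node.kind == "OR":
        return 1.0 - prod(1.0 - q for q in qs)
    if node.kind == "AND":
        return prod(qs)
    raise ValueError(f"unsupported gate kind: {node.kind}")


def rank_contributors(gate: Gate) -> List[tuple]:
    """Direct inputs of a gate, highest probability first."""
    return sorted(((probability(c), c.name) for c in gate.inputs), reverse=True)


# Assumed structure: the capacitor short is an OR of the part random failure
# and the manufacturing defects (debris or excessive solder).
mfg_short = Gate("MFG_Short_C181", "OR", [
    Event("Debris_C181", 2e-8),
    Event("Solder_short_C181", 5e-8),
])
short_c181 = Gate("Short_C181", "OR", [
    Event("PRF_Short_C181", 0.000274094),
    mfg_short,
])

print(f"Q(Short_C181) = {probability(short_c181):.4e}")
for q, name in rank_contributors(short_c181):
    print(f"  {name}: {q:.4e}")
```

Ranking the inputs of each gate from the top down is one simple way to locate the highest contributor before deciding on mitigation.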

Failure mode analysis with fault trees can be started with the start of a project, and updated as more detailed information becomes available.

There is no need to come up with failure rates as the reliability measure for all components, electrical, mechanical, and software. Fault tree modeling allows a mixture of information (failure probabilities, different failure distributions) and, unlike classical reliability predictions, does not require that everything be expressed as a failure rate.

Modeling and reliability assessment of a product or system with fault tree analysis allows for timely design improvements while design changes are still possible, feasible and inexpensive. This methodology is also described in the draft IEC standards IEC 60300-3-1, Dependability management, Part 3: Application guide, Section 1: Analysis techniques for dependability; Guide on methodology, and IEC 61014, Reliability growth methods. The first standard is in its last draft for comments and vote. The second is also a draft, in circulation for comments.

6. REFERENCES AND BIBLIOGRAPHY

1. Joanne Bechta Dugan, “Fault-Tree Analysis of Computer-Based Systems”, 1999 Tutorial Notes, Reliability and Maintainability Symposium, Washington, DC.

2. Kiran Kumar Vemuri and Joanne Bechta Dugan, “Reliability Analysis of Complex Hardware-Software Systems”, Proceedings, Annual Reliability and Maintainability Symposium, January 1999, Washington, DC.

3. Géza Szabó and Péter Gáspár, “Practical treatment Methods for Adaptive Components in the Fault-Tree Analysis”, Proceedings, Annual Reliability and Maintainability Symposium, January 1999, Washington, DC.

4. Alfredo H-S. Ang and Wilson H. Tang “Probability Concepts in Engineering Planning and Design, Volume II, Decision Risk and Reliability”, 1990.

5. Milena Krasich, “Use of fault Tree Analysis for Evaluation of System Reliability Improvements in Design Phase.” Proceedings, Annual Reliability and Maintainability Symposium, January 2000, Los Angeles, California


7. ATTACHMENT - TUTORIAL VISUALS

Fault Tree Analysis for Product – Reliability Improvement

Milena Krasich, Bose Corporation, January 23, 2002

1-23-2002 M. Krasich 2

Tutorial Content

• General reliability definitions in accordance with IEC 60050 (IEV 191) (1990), International Electrotechnical Vocabulary, Chapter 191: Dependability and quality of service
• Description of Fault Tree Analysis methodology
• Mathematics (statistics) associated with the Fault Tree Analysis
• Reliability modeling of a complex system using Fault Tree Analysis (FTA), in accordance with IEC 60300-3-1, Dependability Analysis Methods
• Examples of how the FTA is used for reliability improvement of electronics
• Methods for determination of failure probabilities for basic events
• Failure mode mitigation and reliability growth/improvement – a real life example

1-23-2002 M. Krasich 3

Reliability Growth - Improvement

• Reliability improvement of a product can be achieved in various phases of its life:
    • Design phase
    • Test, product validation phase – test reliability growth
    • Fielded life – by upgrades, derivatives, recalls, etc.
• The most cost effective reliability improvement is done during the product design
• Product reliability improvement is achieved by:
    • Identification of potential design flaws:
        • Component electrical overstress
        • Potential mechanical overstress and failure
        • Inadequate components or parts used
        • Failure of one part caused by the failure of another part
        • Use of parts that are of inferior quality/reliability
    • Identification of manufacturing problems

1-23-2002 M. Krasich 4

Reliability Definition and Considerations

• Reliability definition (IEV 191-12-01): Probability that an item can perform a required function under given conditions for a given time interval
• Required function: defined by the expected performance, i.e.
    • No audible noise
    • No distortion
    • No bending past the predetermined angle
• Measures
    • Reliability: Probability of survival after the end of a predetermined period
    • Unreliability: Probability of failure before the end of the period
• Measure as management sees it: Percent of items surviving a predetermined time period – normally warranty period, mission period or other time period requiring proper product operation

1-23-2002 M. Krasich 5

Definition of Failure – IEV 191-04-01

• Failure: The termination of the ability of an item to perform a required function
• Failure of hardware to operate properly due to:
    • Design failure: A failure due to inadequate design of an item (to withstand operational or environmental stresses); improper part or improper use of a part in the design
    • Manufacturing defect causing time-related failures: A fault due to non-conformity during manufacture to the design of an item or to specified manufacturing processes
    • Software failures
• Failure of software
• Failure cause: The circumstances during design, manufacture, or use which have led to failure
• Failure mechanism: The physical, chemical, or other process which led to a failure

1-23-2002 M. Krasich 6

Definition of Failure Mode

• Failure mode: Manner or state in which an item or a component might fail
• Examples:
    • Low output of an IC
    • Separation of the IC packaging material
    • Capacitor fails short due to crack propagation in the dielectric (failure mechanism)
    • Resistor fails open, failure cause – poor lead welding
    • FET saturation and overheat
    • Gain change
    • Seal leakage


1-23-2002 M. Krasich 7

Cause of a Failure Mode

• Failure or failure mode cause – one failure mode can have multiple causes
• Examples:
    • Causes of a capacitor short: electrical overstress, high temperature, vehicle vibration, high soldering temperature
    • Causes of an IC enclosure failure: moisture, high temperature, IC manufacturing process
    • Causes of a component open: poor soldering, manufacturing – breakage in insertion
    • Causes of a seal leak in a communication application (under water – ocean bottom): water pressure causing dilatation, cold temperature, wearout during mating and de-mating, material degradation, manufacturing defect (under-size)

1-23-2002 M. Krasich 8

Use of a Fault Tree

• Fault Tree Analysis (FTA) is a Boolean representation of a system and its assemblies and functions, along with failure modes and their respective causes
• FTA is used for multiple purposes:
    • To model the item/system architecture and functionality with a fault tree logic diagram, top down, to search for potential failure modes (and their respective causes) that might cause an unfavorable outcome defined as a failure of the system
    • To quantitatively estimate the item reliability
    • To identify those failure modes and causes that are the highest contributors to the item probability of failure
    • To evaluate necessary and possible improvements – trade off
    • To assess the item reliability improvement as the potential failure modes are mitigated

1-23-2002 M. Krasich 9

Fault Tree - Introduction

• Fault tree: A logic diagram representing functional dependencies of parts of a system, and the arrangement of events causing unfavorable outcomes – system failure that corresponds to a predetermined failure definition
• Fault tree components:
    • Gates: Outcomes of one or a combination of input events
    • Cut sets: Groups of events that, if all occur, would cause a system failure. Minimal cut set: contains the minimum number of events that are required for failure. Removal of one of them would result in the system not failing.
    • Events – Basic events: Usually a failure cause. Gets an assigned value: failure rate, MTBF, or failure probability

1-23-2002 M. Krasich 10

Event

• Basic event: Basic event for which reliability information is available
    • Reliability model: Component failure mode, or a failure mode cause
• Conditional event: Event that is a condition of occurrence of another event when both must occur for the output to occur
    • Reliability model: Occurrence of event that must occur for another event to occur

1-23-2002 M. Krasich 11

Events – cont.

• Dormant event: A basic event that represents a dormant failure
    • Reliability model: Dormant component failure mode or dormant failure cause
• Undeveloped event: A part of a system not yet developed

1-23-2002 M. Krasich 12

Gates

• OR gate: The output event occurs if any of its input events occurs
    • Reliability model: Failure occurs if any of the parts of that system fails – series system
• AND gate: The output event takes place if all of the input events occur
    • Reliability model: Parallel redundancy, one out of n equal or different branches
• Majority vote gate: The output occurs if m of the inputs occur
    • Reliability model: Redundancy k out of n, where m = n - k + 1
• Priority AND gate: The output event (failure) occurs only if the input events occur in sequence from left to right
    • Reliability model: Secondary failures or enabling events


1-23-2002 M. Krasich 17

FTA Model with Esary-Proschan Calculation

[Fault tree of the RBD between A and B: blocks 1 and 4 in the top branch, blocks 2 and 5 in the bottom branch, block 3 connecting the two branches. Gates and events with their calculated probabilities:]

• Failure (top gate): No signal at the output, Q = 9.068e-3
• Top (blocks 1 and 2): Signal not passing through the top branch, Q = 1.000e-3
• Bottom (blocks 4 and 5): Signal not passing through the bottom branch, Q = 7.500e-3
• Cross 1 (blocks 1, 3 and 5): Signal not going through the top first, Q = 4.800e-4
• Cross 2 (blocks 2, 3 and 4): Signal not passing through the bottom block first, Q = 1.000e-4
• Basic events: Block 1 fails, Q = 2.000e-2; Block 2 failure, Q = 5.000e-2; Block 3 fails, Q = 8.000e-2; Block 4 fails, Q = 2.500e-2; Block 5 fails, Q = 3.000e-1

1-23-2002 M. Krasich 16

Comparison of the FTA Calculation Methods

$F_1 = 2 \cdot 10^{-2}$, $F_2 = 5 \cdot 10^{-2}$, $F_3 = 8 \cdot 10^{-2}$, $F_4 = 2.5 \cdot 10^{-2}$, $F_5 = 3 \cdot 10^{-1}$

Esary-Proschan (correct calculations):

$$F_s = 1 - (1 - F_1 \cdot F_2) \cdot (1 - F_4 \cdot F_5) \cdot (1 - F_1 \cdot F_3 \cdot F_5) \cdot (1 - F_2 \cdot F_3 \cdot F_4)$$

$$F_s = 9.068 \cdot 10^{-3}$$

Rare approximation:

$$F_{sr} = F_1 \cdot F_2 + F_4 \cdot F_5 + F_1 \cdot F_3 \cdot F_5 + F_2 \cdot F_3 \cdot F_4$$
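The two calculations compared above can be reproduced with a few lines of code. The block failure probabilities and cut sets are those of the example; the variable names are hypothetical, and the sketch assumes independent block failures.

```python
from math import prod

# Block failure probabilities from the example
F = {1: 2.0e-2, 2: 5.0e-2, 3: 8.0e-2, 4: 2.5e-2, 5: 3.0e-1}

# Minimal cut sets of the RBD between A and B
cut_sets = [(1, 2), (4, 5), (1, 3, 5), (2, 3, 4)]

# Probability of each cut set: all of its blocks fail
cut_probs = [prod(F[b] for b in cut) for cut in cut_sets]

# Esary-Proschan form: 1 - product of (1 - Pr(c_i))
f_esary_proschan = 1.0 - prod(1.0 - p for p in cut_probs)

# Rare event approximation: plain sum of the cut set probabilities
f_rare = sum(cut_probs)

print(f"Esary-Proschan:     {f_esary_proschan:.3e}")
print(f"Rare approximation: {f_rare:.3e}")
```

The sum is always at least as large as the product form, so the rare approximation is the more pessimistic of the two.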

1-23-2002 M. Krasich 15

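Modeling with a Fault Tree – Boolean Algebra

• Basis for the Fault Tree: Boolean algebra, used to produce minimal cut sets (or path sets)
• Cut sets – the system fails if any one of the cut sets happens: c1 = {1, 2}, c2 = {4, 5}, c3 = {1, 3, 5}, c4 = {2, 3, 4}

$$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4), \qquad R_S = 1 - F_S$$

$$\Pr(c_1) = F_1 \cdot F_2 = (1 - R_1) \cdot (1 - R_2)$$

Correct calculation (Esary-Proschan):

$$\Pr(c_1 \cup c_2 \cup c_3 \cup c_4) = 1 - [1 - \Pr(c_1)] \cdot [1 - \Pr(c_2)] \cdot [1 - \Pr(c_3)] \cdot [1 - \Pr(c_4)]$$

Rare event approximation:

$$F_S = \Pr(c_1) + \Pr(c_2) + \Pr(c_3) + \Pr(c_4) = F_1 \cdot F_2 + F_4 \cdot F_5 + F_1 \cdot F_3 \cdot F_5 + F_2 \cdot F_3 \cdot F_4$$

[RBD between A and B: blocks 1 and 4 in the top branch, blocks 2 and 5 in the bottom branch, block 3 connecting the two branches.]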

1-23-2002 M. Krasich 14

System Analysis Methods

• A “complex” System Reliability Block Diagram (RBD). Example failure: No signal flow from A to B

[RBD between A and B: blocks 1 and 4 in the top branch, blocks 2 and 5 in the bottom branch, block 3 connecting the two branches.]

$$R_S = (R_1 + R_2 - R_1 \cdot R_2) \cdot (R_4 + R_5 - R_4 \cdot R_5) \cdot R_3 + [R_1 \cdot R_4 + R_2 \cdot R_5 - R_1 \cdot R_2 \cdot R_4 \cdot R_5] \cdot (1 - R_3)$$

• Algebraic solution meaning: Reliability of the system provided that R3 is good, plus reliability of the system provided R3 is bad.
• When a system is really complex, with a multitude of interrelationships between the assemblies, the algebraic solutions rapidly become too involved.
• Environmental factors and manufacturing errors are left out.
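The algebraic solution for this RBD can be checked numerically. The sketch below evaluates the decomposition around block 3 and cross-checks it by enumerating all block states. It assumes independent blocks, takes R_i = 1 - F_i from the block failure probabilities used in the fault tree examples, and assumes the path sets {1,4}, {2,5}, {1,3,5}, {2,3,4}; function and variable names are hypothetical.

```python
from itertools import product
from math import prod

# Block reliabilities, taken as 1 - F_i of the fault tree examples
R = {1: 0.98, 2: 0.95, 3: 0.92, 4: 0.975, 5: 0.70}

# Assumed path sets from A to B (block 3 links the two branches)
PATH_SETS = [(1, 4), (2, 5), (1, 3, 5), (2, 3, 4)]


def bridge_reliability(r: dict) -> float:
    """Decomposition around block 3: system given 3 good, plus system given 3 bad."""
    given_3_good = (r[1] + r[2] - r[1] * r[2]) * (r[4] + r[5] - r[4] * r[5])
    given_3_bad = r[1] * r[4] + r[2] * r[5] - r[1] * r[2] * r[4] * r[5]
    return given_3_good * r[3] + given_3_bad * (1.0 - r[3])


def bridge_reliability_by_enumeration(r: dict) -> float:
    """Sum the probability of every block state in which some path is fully up."""
    total = 0.0
    for state in product((True, False), repeat=5):
        up = dict(zip(range(1, 6), state))
        p = prod(r[b] if up[b] else 1.0 - r[b] for b in up)
        if any(all(up[b] for b in path) for path in PATH_SETS):
            total += p
    return total


r_system = bridge_reliability(R)
f_system = 1.0 - r_system
print(f"R_S = {r_system:.6f}, F_S = {f_system:.4e}")
```

With these inputs the algebraic result comes out at about 9.006e-3 for the system failure probability, slightly below the values produced by the cut set formulas, which illustrates why the enumeration effort grows quickly for larger systems.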

1-23-2002 M. Krasich 13

Gates – cont.

• Exclusive OR gate: The output event takes place if one, but not the other, input occurs
    • Reliability model: A failure of the system occurring only if one, not both, of the two possible failures happens
• Inhibit gate: The output occurs only if both (or all) of the input events take place, one of them conditional
    • Reliability model: Conditional probability of the final event
• Transfer gate: Gate indicating that this part of the system is developed in another part or page of the diagram
    • Reliability reference: A partial reliability block diagram that is shown in another location of the overall system block diagram


1-23-2002 M. Krasich 22

Example – Partial Schematic of a Switching Amplifier

1-23-2002 M. Krasich 21

Priority Gate - Example

[Fault tree. Gates and events with their calculated probabilities:]

• TOP1: fails if Gate 1 OR Gate 2 fails, Q = 4.374e-6
• GATE1: fails only if EVENT1 occurs first, Q = 1.000e-6
• GATE2 (2): fails if any two of the events take place, Q = 3.374e-6
• Basic events: EVENT1, F1, Q = 0.002; EVENT2, F2, Q = 0.0005; EVENT3, F3, Q = 0.00045; EVENT4, F4, Q = 0.00053; EVENT5, F5, Q = 0.0032

Gate 1, conditional probability:

• Probability of occurrence of EVENT1 = F1
• Probability of occurrence of event 2 if event 1 occurred = F2

$$F_{Gate1} = F(\mathrm{EVENT1}) \cdot F(\mathrm{EVENT2} \mid \mathrm{EVENT1})$$

$$F_{Gate2} = F_3 \cdot F_4 + F_3 \cdot F_5 + F_4 \cdot F_5$$

$$F_{TopGate} = 1 - (1 - F_{Gate1}) \cdot (1 - F_{Gate2})$$

1-23-2002 M. Krasich 20

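Example: The Redundant Gates are Different

[RBD: F1 and F2 in series (Gate 1), followed by F3, F4 and F5 in a 2 out of 3 arrangement (Gate 2); together they form the Top Gate.]

[Fault tree. Gates and events with their calculated probabilities:]

• TOP1: fails if Gate 1 OR Gate 2 fails, Q = 2.502e-3
• GATE1: fails if event 1 OR event 2 takes place, Q = 2.499e-3
• GATE2 (2): fails if any two of the events take place, Q = 3.374e-6
• Basic events: EVENT1, F1, Q = 0.002; EVENT2, F2, Q = 0.0005; EVENT3, F3, Q = 0.00045; EVENT4, F4, Q = 0.00053; EVENT5, F5, Q = 0.0032

$n = 3$, $m = 2$

$F_1 = 0.002$, $F_2 = 0.0005$, $F_3 = 0.00045$, $F_4 = 0.00053$, $F_5 = 0.0032$

$$F_{Gate1} = 1 - (1 - F_1) \cdot (1 - F_2)$$

$$F_{Gate2} = F_3 \cdot F_4 + F_3 \cdot F_5 + F_4 \cdot F_5$$

$$F_{TopGate} = 1 - (1 - F_{Gate1}) \cdot (1 - F_{Gate2})$$

$$F_{TopGate} = 2.502 \cdot 10^{-3}$$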
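The calculation for redundant events with different probabilities can be sketched as below. The pairwise sum mirrors the Gate 2 expression of the example; the exact value by state enumeration is added only for comparison and is not part of the tutorial. Independence of the events is assumed, and the function names are hypothetical.

```python
from itertools import combinations, product
from math import prod


def at_least_m_pairwise(qs, m):
    """Sum over all groups of m events of the product of their probabilities."""
    return sum(prod(group) for group in combinations(qs, m))


def at_least_m_exact(qs, m):
    """Exact probability that at least m of the independent events occur."""
    total = 0.0
    for state in product((True, False), repeat=len(qs)):
        if sum(state) >= m:
            total += prod(q if failed else 1.0 - q for q, failed in zip(qs, state))
    return total


f1, f2 = 0.002, 0.0005
redundant = [0.00045, 0.00053, 0.0032]  # F3, F4, F5

f_gate1 = 1.0 - (1.0 - f1) * (1.0 - f2)         # series part (OR gate)
f_gate2 = at_least_m_pairwise(redundant, 2)      # 2 out of 3 gate, as in the example
f_gate2_exact = at_least_m_exact(redundant, 2)   # comparison value
f_top = 1.0 - (1.0 - f_gate1) * (1.0 - f_gate2)

print(f"Gate 1: {f_gate1:.4e}")
print(f"Gate 2: {f_gate2:.4e} (exact {f_gate2_exact:.4e})")
print(f"Top:    {f_top:.4e}")
```

For probabilities this small the pairwise sum and the exact value differ only in the third significant digit or later, so the top gate value is unaffected at the precision shown in the example.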

1-23-2002 M. Krasich 19

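Example: Combination of Series and Redundant Events

[RBD: F1 and F2 in series (Gate 1), followed by three identical F3 events in a 2 out of 3 arrangement (Gate 2); together they form the Top Gate.]

$n = 3$, $m = 2$

$F_1 = 0.002$, $F_2 = 0.0005$, $F_3 = 0.0032$

$$F_{Gate1} = 1 - (1 - F_1) \cdot (1 - F_2)$$

$$F_{Gate2} = \sum_{i=0}^{m-1} \frac{n!}{i! \, (n-i)!} \cdot (1 - F_3)^{i} \cdot F_3^{(n-i)}$$

$$F_{TopGate} = 1 - (1 - F_{Gate1}) \cdot (1 - F_{Gate2})$$

$$F_{TopGate} = 2.53 \cdot 10^{-3}$$

[Fault tree. Gates and events with their calculated probabilities:]

• TOP1: fails if Gate 1 OR Gate 2 fails, Q = 2.530e-3
• GATE1: fails if event 1 OR event 2 occurs, Q = 2.499e-3
• GATE2 (2): fails if 2 of the three events take place, Q = 3.072e-5
• Basic events: EVENT1, F1, Q = 0.002; EVENT2, F2, Q = 0.0005; EVENT3, EVENT4 and EVENT5, F3, Q = 0.0032 each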
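The binomial sum used for the gate with identical redundant events can be evaluated as in the sketch below. It follows the summation as reconstructed from the example (i running from 0 to m - 1 over the number of surviving events) and assumes independent, identical events; the function name is hypothetical.

```python
from math import comb


def redundant_gate_failure(n: int, m: int, f: float) -> float:
    """Sum over i = 0 .. m-1 surviving events: C(n, i) * (1 - f)^i * f^(n - i)."""
    return sum(comb(n, i) * (1.0 - f) ** i * f ** (n - i) for i in range(m))


f1, f2, f3 = 0.002, 0.0005, 0.0032
n, m = 3, 2

f_gate1 = 1.0 - (1.0 - f1) * (1.0 - f2)
f_gate2 = redundant_gate_failure(n, m, f3)
f_top = 1.0 - (1.0 - f_gate1) * (1.0 - f_gate2)

print(f"Gate 2: {f_gate2:.4e}")
print(f"Top:    {f_top:.3e}")
```

The fault tree output lists Q = 3.072e-5 for this gate, which equals 3·F3²; the full sum evaluates slightly lower, and the difference does not change the top gate value at the precision shown.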

1-23-2002 M. Krasich 18

FTA Representation of the RBD – RARE Approximation

[Fault tree of the RBD between A and B: blocks 1 and 4 in the top branch, blocks 2 and 5 in the bottom branch, block 3 connecting the two branches. Gates and events with their calculated probabilities:]

• Failure (top gate): No signal at the output, Q = 9.080e-3
• Top (blocks 1 and 2): Signal not passing through the top branch, Q = 1.000e-3
• Bottom (blocks 4 and 5): Signal not passing through the bottom branch, Q = 7.500e-3
• Cross 1 (blocks 1, 3 and 5): Signal not going through the top first, Q = 4.800e-4
• Cross 2 (blocks 2, 3 and 4): Signal not passing through the bottom block first, Q = 1.000e-4
• Basic events: Block 1 fails, Q = 2.000e-2; Block 2 failure, Q = 5.000e-2; Block 3 fails, Q = 8.000e-2; Block 4 fails, Q = 2.500e-2; Block 5 fails, Q = 3.000e-1


1-23-2002 M. Krasich 27

Building a Fault Tree

• Define the system
• Define its major parts or functions, i.e.:
    • Power supply
    • Video
    • Audio channels
• Determine what phenomenon precludes proper operability of those parts or functions, i.e.:
    • Shorted line voltage or no VCC supplied
    • No video
    • One or more audio channels not operational
• Determine the causes of those phenomena
• Determine the contributing factors to the causes, i.e.:
    • High temperature
    • High humidity
    • Electrical overstress

1-23-2002 M. Krasich 26

Other Important Information from an FTA Software

• Failure frequency (hazard rate of all gates)
• Number of expected failures during the preset lifetime
• Unavailability (or availability) of the system or any gate (function or assembly), provided the system is assumed repairable
• Gate summary in various forms
• Confidence intervals on provided information (failure probability or unavailability)
• Sensitivity analysis – the most critical component variation in probability of occurrence
• Results from failure distributions other than exponential (constant failure rate)
• Results calculated with multiple simulations (we normally set the number of simulations to 10,000)

1-23-2002 M. Krasich 25

Inhibit Gate - Example

[Fault tree. Gates and events with their calculated probabilities:]

• TOP1: fails if Gate 1 OR Gate 2 fails, Q = 1.001e-6
• GATE1: fails only if EVENT1 happens before EVENT2, Q = 1.000e-6
• GATE2: fails if all of the events take place, Q = 7.632e-10
• Basic events: EVENT1, F1, Q = 0.002; EVENT2, F2, Q = 0.0005; EVENT3, F3, Q = 0.00045; EVENT4, F4, Q = 0.00053; EVENT5, F5, Q = 0.0032

• Gate 1, conditional probability
• Gate 2, inhibit: Outcome occurs only if all three (or any number) of events – or gates – take place.
• Example: Three EMI protection capacitors in parallel. No filtering if all of the three fail open.

$$F_{Gate2} = F_3 \cdot F_4 \cdot F_5$$

$$F_{Gate2} = 7.632 \cdot 10^{-10}$$
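The priority gate and inhibit gate examples reduce to products of probabilities, which can be checked with the short sketch below. It treats F2 as the probability of event 2 given that event 1 has occurred, as stated in the priority gate example, and assumes the three events under the inhibit gate are independent; variable names are hypothetical.

```python
f1 = 0.002     # probability of EVENT1
f2 = 0.0005    # probability of EVENT2 given that EVENT1 occurred
f3, f4, f5 = 0.00045, 0.00053, 0.0032

# Gate 1: conditional (sequence-dependent) probability
f_gate1 = f1 * f2

# Gate 2: output occurs only if all three events take place
f_gate2 = f3 * f4 * f5

# Top gate: fails if Gate 1 OR Gate 2 fails
f_top = 1.0 - (1.0 - f_gate1) * (1.0 - f_gate2)

print(f"Gate 1: {f_gate1:.3e}")
print(f"Gate 2: {f_gate2:.3e}")
print(f"Top:    {f_top:.3e}")
```

Because Gate 2 is three orders of magnitude below Gate 1, the top gate value is dominated entirely by the conditional sequence of events 1 and 2.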

1-23-2002 M. Krasich 24

After Capacitor Improvement (0.033 µF replaced 0.1 µF)

[Fault tree, Page 1. Gates and events with their calculated probabilities:]

• FET4 OVERHEAT: Overheat of FET due to LGND < 2V, Q = 1.0009e-4
• LGROUND: LGND shorted to ground causing improper FET bias and overheat, Q = 2.8596e-4
• FET 4 SATURATION: FET saturates due to LGND < 2V, Q = 3.5000e-1
• Short C905: Ceramic capacitor shorts to ground, Q = 1.4299e-4
• Short C906: Capacitor short brings Lground to the ground, Q = 1.4299e-4
• MFG_SHORT_C905: Manufacturing defects cause a short, Q = 7.0000e-8
• PRF_SHORT_C6905: Capacitor shorts due to part random failure, Q = 1.4292e-4
• SOLDER SHORT_C6905: Excessive solder causing a short between the pins or pads, Q = 5.0000e-8
• DEBRIS_C6905: Debris on the PCB causing a short, Q = 2.0000e-8
• MFG_Short_C906: Manufacturing defects cause a short, Q = 7.0000e-8
• PRF_C6906: Capacitor fails due to part random failure, Q = 1.4292e-4
• SOLDER SHORT_C6906: Excessive solder causing a short between the pins or pads, Q = 5.0000e-8
• DEBRIS_C6906: Debris on the PCB causing a short, Q = 2.0000e-8
• DANDREIC SHORT: Electrolyte mixed with debris making a short, Q = 4.6875e-13
• EL. CAP LEAK: Short caused by leaking of the nearby capacitor, Q = 3.1250e-6
• AGEING: Capacitor leaking electrolyte due to ageing, Q = 1.0000e-6
• HI-TEMP: Electrolyte leak due to high temperature, Q = 1.2500e-7
• HI_HUMIDITY: Electrolyte leak due to high humidity, Q = 2.0000e-6
• DEBRIS: Presence of debris on the board, Q = 1.5000e-7

1-23-2002 M. Krasich 23

Example of the Priority and AND Gate – Switching Amp Before Improvement

[Fault tree, Page 1. Gates and events with their calculated probabilities:]

• FET4 OVERHEAT: Overheat of FET due to LGND < 2V, Q = 2.0969e-3
• LGROUND: LGND shorted to ground causing improper FET bias and overheat, Q = 5.9911e-3
• FET 4 SATURATION: FET saturates due to LGND < 2V, Q = 3.5000e-1
• Short C905: Ceramic capacitor shorts to ground, Q = 3.0001e-3
• Short C906: Capacitor short brings Lground to the ground, Q = 3.0001e-3
• MFG_SHORT_C905: Manufacturing defects cause a short, Q = 7.0000e-8
• PRF_SHORT_C6905: Capacitor shorts due to part random failure, Q = 3.0000e-3
• SOLDER SHORT_C6905: Excessive solder causing a short between the pins or pads, Q = 5.0000e-8
• DEBRIS_C6905: Debris on the PCB causing a short, Q = 2.0000e-8
• MFG_Short_C906: Manufacturing defects cause a short, Q = 7.0000e-8
• PRF_C6906: Capacitor fails due to part random failure, Q = 3.0000e-3
• SOLDER SHORT_C6906: Excessive solder causing a short between the pins or pads, Q = 5.0000e-8
• DEBRIS_C6906: Debris on the PCB causing a short, Q = 2.0000e-8
• DANDREIC SHORT: Electrolyte mixed with debris making a short, Q = 4.6875e-13
• EL. CAP LEAK: Short caused by leaking of the nearby capacitor, Q = 3.1250e-6
• AGEING: Capacitor leaking electrolyte due to ageing, Q = 1.0000e-6
• HI-TEMP: Electrolyte leak due to high temperature, Q = 1.2500e-7
• HI_HUMIDITY: Electrolyte leak due to high humidity, Q = 2.0000e-6
• DEBRIS: Presence of debris on the board, Q = 1.5000e-7
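The effect of the capacitor change on the FET overheat probability can be reproduced approximately from the listed event values. The sketch below assumes a gate structure inferred from the numbers (each capacitor short is an OR of part random failure and manufacturing defects, LGROUND is an OR of the two capacitor shorts, and the overheat is LGROUND multiplied by the conditional saturation probability). That structure, the omission of the negligible dendritic short branch, and all names are assumptions of this example.

```python
from math import prod


def or_gate(*qs: float) -> float:
    """OR gate for independent inputs."""
    return 1.0 - prod(1.0 - q for q in qs)


FET_SATURATION = 3.5e-1  # conditional event probability from the fault tree
SOLDER_SHORT = 5.0e-8    # excessive solder, per capacitor
DEBRIS_SHORT = 2.0e-8    # debris on the PCB, per capacitor


def fet_overheat(prf_short: float) -> float:
    """Overheat probability for a given capacitor part-random-failure value."""
    mfg_short = or_gate(SOLDER_SHORT, DEBRIS_SHORT)
    cap_short = or_gate(prf_short, mfg_short)
    # C905 OR C906; the dendritic short branch (Q = 4.6875e-13) is omitted here
    lground = or_gate(cap_short, cap_short)
    return lground * FET_SATURATION


before = fet_overheat(3.0000e-3)  # original capacitors
after = fet_overheat(1.4292e-4)   # after the capacitor improvement

print(f"before: {before:.4e}")
print(f"after:  {after:.4e}")
print(f"improvement factor: {before / after:.1f}")
```

Under these assumptions the sketch matches the listed FET4 OVERHEAT values for both cases, which suggests that the part random failure of the two capacitors drives the result.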


1-23-2002 M. Krasich 32

FTA Representation of CODEC Analysis, cont.

• One of the plus inputs (1 or 2) not provided to the converter
• No 5V analog supply voltage provided
• IC U20 not operational

[Fault tree, Page 30. Gates with their calculated probabilities and page references:]

• A to D 1 and 2: Failure of A to D conversion for channel 1 and 2, Q = 1.7487e-2 (Page 5)
• 5 V ANA: 5V Analog not delivered or noisy, Q = 1.0147e-3
• Input 1 into A to D: Input 1 to CODEC A to D not available or too noisy, Q = 3.1481e-3 (Page 68)
• Input 2 into A to D: Input 2 to CODEC A to D not available, Q = 4.2788e-3 (Page 125)
• No 5V Analog: 5V analog not available, Q = 7.7798e-4 (Page 44)
• Noise on 5V ANA: High or low frequency noise introduced to the signal, Q = 2.3692e-4 (Page 201)
• A_IN_1_+: Analog input 1 to CODEC not available, Q = 7.2087e-3 (Page 71)
• A_IN_2_+: Analog input 2 to CODEC not available, Q = 8.1431e-3 (Page 69)
• Fail_U20: U20 failure, Q = 1.4721e-3 (Page 200)
• Analog Inputs 1 & 2: Analog inputs 1 and/or 2 not available, Q = 5.9081e-3

1-23-2002 M. Krasich 31

FTA Representation of CODEC Analysis

Failure: No analog output from CODEC. One of the reasons: no analog inputs into it – 1 or 2. Go to page 30 for the analog inputs.

[Fault tree, Page 5. Gates with their calculated probabilities and page references:]

• Analog Outputs 1 and 2: No analog output from CODEC available, Q = 3.3535e-2 (Page 1)
• A to D 1 and 2: Failure of A to D conversion for channel 1 and 2, Q = 1.7487e-2 (Page 30)
• Digital from U20: One or more digital outputs from U20 not available, Q = 1.4648e-2 (Page 29)
• D to A for A_OUT_1&2+: D to A conversion for analog outputs 1 and 2, Q = 1.5972e-2
• A_OUT_1 and 2: Analog outputs not available, Q = 5.5314e-4
• D input to U21: No digital input provided for the U21, Q = 1.4648e-2 (Page 27)
• DAC_1_DAT: No data available from CAD_1, Q = 0.0000
• Fail_U21: U21 failure, Q = 3.2979e-4 (Page 198)
• 5 V ANA to U21: 5V Analog not delivered or noisy, Q = 1.0147e-3 (Page 43)
• A_OUT_1: Analog output 1 not available, Q = 2.7661e-4 (Page 66)
• A_OUT_2: Analog output 2 not available, Q = 2.7661e-4 (Page 65)

1-23-2002 M. Krasich 30

Rationale for Analysis of A to D Conversion Input Circuit

• The entire circuit will not work if:
    • No voltage supplied to the analog input (pin 8): open R206 or R208 (if open – slight non-audible distortion), or short C174 or C176 (if any of the caps open, no failure)
    • No 5V analog supplied to pin 7: C181 or C183 fail short
    • U20 fails in whichever mode (low, high, or no output)
• There will be no output to the D to A conversion and the rest of the amp if failed open: R214, R215, R218, and R219 (if shorted – not too much harm)
• Not all failure modes need to be considered if not important to the failure definition – realistic prediction

1-23-2002 M. Krasich 29

Rationale for Analysis of A to D Input Circuit

• For the amplifier to be operational, all signals have to be processed by CODEC – coded and decoded
• In CODEC, the analog signal is converted to digital, and then again into analog for the analog output
• The input signal 1+ into the left channel of IC U20 is interrupted if:
    • Components fail open: R200, R209, C171
    • C179 shorts to ground (shorting the signal)
• The input signal 2+ into the right channel of IC U20 is interrupted if:
    • Components fail open: R201, R205, C172
    • C177 shorts to ground (shorting the signal)
• Opening of C117 might cause some noise, that will be filtered later in the circuit

1-23-2002 M. Krasich 28

Example – Input to CODEC of an Amplifier


1-23-2002 M. Krasich 37

Contribution of Manufacturing Defects

• Contribution to components failing open:
    • Cold or insufficient solder: Connection opens over time due to solder fatigue or vibrations
    • Missing components: An amazingly large number of components are not inserted during assembly – detected later when the function is exercised
    • Components cracked during insertion
    • Broken or bent pins or leads
• Contribution to failing short:
    • Debris (un-cleaned flux) left on the board that with dendritic growth causes a short
    • Excessive solder
    • Bent pins (mostly ICs and connectors) shorting with another pin

1-23-2002 M. Krasich 36

High or Low Frequency Noise into the CODEC

[Fault tree. Gates and events with their calculated probabilities:]

• Noise on 5V ANA: High or low frequency noise introduced to the signal, Q = 2.3692e-4 (Page 30)
• Open_El_C183: Open capacitor causes low frequency noise, Q = 6.1851e-5
• MFG_Open_El_C183: Capacitor connections open due to the manufacturing defect, Q = 1.3000e-8
• PRF_Open_El_C183: Capacitor fails open due to the part random failure, Q = 6.18377e-005
• Cold solder_El_C183: Connection opens due to insufficient or improper soldering, Q = 1.2e-008
• Missing_El_C183: Part not inserted during assembly, Q = 1e-009
• Open_C181: Open capacitor causes high frequency noise, Q = 1.7508e-4
• MFG_Open_C181: Capacitor connections open due to the manufacturing defect, Q = 1.3000e-8
• PRF_Open_C181: Capacitor fails open due to the part random failure, Q = 0.000175069
• Cold solder_C181: Connection opens due to insufficient or improper soldering, Q = 1.2e-008
• Missing_C181: Part not inserted during assembly, Q = 1e-009

1-23-2002 M. Krasich 35

No 5V Analog, Q=7.7798e-4: 5V analog not available (Page 30)

  Short_EL_C183, Q=1.1136e-4: The 5V analog shorts to ground, no voltage for pin 7 of U20
    MFG_Short_El_C183, Q=7.0000e-8: Connection short due to the manufacturing defect
      Debris_El_C183, Q=2e-008: Debris on the PCB causing dendritic growth and a short
      Solder_short_El_C183, Q=5e-008: Excessive solder causing a short between the pins or pads
    PRF_Short_El_C183, Q=9.36354e-005: Capacitor fails short due to the part random failure
    PRF_Leak_El_C183, Q=1.76564e-005: Electrolyte leak due to the capacitor random failure

  Short_C181, Q=2.7416e-4: Capacitor fails short, shorting +5V analog to the ground
    MFG_Short_C181, Q=7.0000e-8: Connection short due to the manufacturing defect
      Debris_C181, Q=2e-008: Debris on the PCB causing dendritic growth and a short
      Solder_short_C181, Q=5e-008: Excessive solder causing a short between the pins or pads
    PRF_Short_C181, Q=0.000274094: Capacitor fails short due to the part random failure

  +5V_ANA, Q=3.9264e-4: Voltage not available (Page 67)

Failure Due to No Analog Voltage Supply
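The extracted text does not name the gate type, but the printed values are consistent with an OR gate over independent events. The sketch below, written under that assumption, combines the three input probabilities shown for "No 5V Analog" and reproduces the printed Q=7.7798e-4 to the displayed precision. The function name `or_gate` is illustrative and is not taken from any fault tree tool.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - q for q in probabilities)


# Input probabilities as printed on the slide (already rounded there).
inputs = {
    "Short_EL_C183": 1.1136e-4,
    "Short_C181": 2.7416e-4,
    "+5V_ANA": 3.9264e-4,
}

q_no_5v_analog = or_gate(inputs.values())
print(f"No 5V Analog: Q = {q_no_5v_analog:.4e}")
```

Because every input is small, the result is close to the plain sum of the inputs; the OR-gate formula only removes the small overlap between them.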

1-23-2002 M. Krasich 34

Open Comp, Q=4.1504e-4: Open components interrupting the signal or causing noise (Page 68)

  Open_C179, Q=1.3238e-4: Open capacitor causes high frequency noise on the input
    MFG_Open_C179, Q=1.3000e-8: Capacitor connections open due to the manufacturing defect
      Cold solder_C179, Q=1.2e-008: Connection opens due to insufficient or improper soldering
      Missing_C179, Q=1e-009: Part not inserted during assembly
    PRF_Open_C179, Q=0.000132368: Capacitor fails open due to the part random failure

  Open_R206, Q=5.1649e-5: Resistor fails open, +2.3 V not available for the analog input
    MFG_Open_R206, Q=1.3000e-8: Resistor connections open due to the manufacturing defect
      Cold solder_R206, Q=1.2e-008: Connection opens due to insufficient or improper soldering
      Missing_R206, Q=1e-009: Part not inserted during assembly
    PRF_Open_R206, Q=5.16358e-005: Resistor fails open due to the part random failure

  Open_R209, Q=5.1649e-5: Resistor fails open, signal interrupted
    MFG_Open_R209, Q=1.3000e-8: Resistor connections open due to the manufacturing defect
      Cold solder_R209, Q=1.2e-008: Connection opens due to insufficient or improper soldering
      Missing_R209, Q=1e-009: Part not inserted during assembly
    PRF_Open_R209, Q=5.16358e-005: Resistor fails open due to the part random failure

  Open_El_C171, Q=1.2778e-4: Open capacitor interrupts the signal
    MFG_Open_El_C171, Q=1.3000e-8: Capacitor connections open due to the manufacturing defect
      Cold solder_El_C171, Q=1.2e-008: Connection opens due to insufficient or improper soldering
      Missing_El_C171, Q=1e-009: Part not inserted during assembly
    PRF_Open_El_C171, Q=0.000127767: Capacitor fails open due to the part random failure

  Open_R200, Q=5.1649e-5: Resistor fails open, signal interrupted
    MFG_Open_R200, Q=1.3000e-8: Resistor connections open due to the manufacturing defect
      Cold solder_R200, Q=1.2e-008: Connection opens due to insufficient or improper soldering
      Missing_R200, Q=1e-009: Part not inserted during assembly
    PRF_Open_R200, Q=5.16358e-005: Resistor fails open due to the part random failure

Signal Noisy or Interrupted Due to Open Components

Page 140

1-23-2002 M. Krasich 33

Input 1 into A to D, Q=3.1481e-3: Input 1 to CODEC A to D not available or too noisy (Page 30)

  Short_C179, Q=2.0730e-4: Capacitor fails short, shorting signal 1 to the ground
    MFG_Short_C179, Q=7.0000e-8: Connection short due to the manufacturing defect
      Debris_C179, Q=2e-008: Debris on the PCB causing dendritic growth and a short
      Solder_short_C179, Q=5e-008: Excessive solder causing a short between the pins or pads
    PRF_Short_C179, Q=0.000207226: Capacitor fails short due to the part random failure

  2.3V, Q=2.6595e-3: 2.3 V supply (Page 126)

  Open Comp, Q=4.1504e-4: Open components interrupting the signal or causing noise (Page 140)

Input 1 Not Available

• PRF – Failure of the part – “random”

• Failure probabilities are assigned to the manufacturing process – quality requirement

Page 68


1-23-2002 M. Krasich 42

FTA Top Level – Audio/Video Console example

• Start from the system top level
• Include only those failure modes that affect the system performance
• Represent system architecture – functional, hardware, or mix
• When the work is completed, look for the highest contributor to unreliability

Console

Postman System, Q=7.365e-2: System failure or improper operation

  ANALOG SIGNAL, Q=1.162e-2: Analog signal not available (Page 2)
  Power Supply, Q=3.280e-3: No or improper power delivered to the system (Page 12)
  Video, Q=3.221e-3: No video (Page 8)
  SPDIF, Q=4.946e-2: No SPDIF both zones (Page 13)
  Functions, Q=3.464e-2: Failure of these functions causes noticeable difference (Page 11)
  Tuner, Q=4.423e-3: Tuner failure
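Finding the highest contributor at the top level amounts to ranking the branch probabilities. The short sketch below does that with the six Q values printed under "Postman System"; the dictionary layout and variable names are illustrative only.

```python
# Top-level branch probabilities as printed on the slide.
top_level = {
    "ANALOG SIGNAL": 1.162e-2,
    "Power Supply": 3.280e-3,
    "Video": 3.221e-3,
    "SPDIF": 4.946e-2,
    "Functions": 3.464e-2,
    "Tuner": 4.423e-3,
}

# Sort branches from the largest to the smallest probability of failure.
ranked = sorted(top_level.items(), key=lambda item: item[1], reverse=True)

for name, q in ranked:
    print(f"{name:<14} Q = {q:.3e}")

highest_contributor = ranked[0][0]
```

With these values the SPDIF branch ranks first, followed by Functions, so the SPDIF subtree is the natural place to continue the search one level down.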

1-23-2002 M. Krasich 41

Probability of the Seal Wear

• The wear or spiral fracture of the Parker Fluorocarbon seals is noticed when the squeeze was 0.017 per side – failure definition for a 0.210” cross section
• Abrasion resistance of Fluorocarbon is determined (Parker Handbook) to be good with the properly determined seal compression (squeeze)
• Radius of the above seal is found from: [equation not recoverable from the extracted text; the values 0.21 and 0.2585 appear in it]
• Ratios of the one sided compression and the respective radiuses are: [equation for r1 and r2 not recoverable from the extracted text]
• The probability of the actual seal failure in ten years of life is: F(10 years) = Φ(...) = 1.464 · 10^-6, where the argument of Φ is built from r1 and r2 with the factors 0.3 and 0.1 [exact form not recoverable from the extracted text]

1-23-2002 M. Krasich 40

Example of Failure Probability Calculations

• Automotive amplifier
• Life expectancy: 15 years
• Average active time (ON) daily: 2.7 hours
• Assumptions:
  - Car stereo ON when driving – automotive or Ground Mobile (GM) environment
  - Car stereo OFF while car parked – stationary thermally uncontrolled environment (GF) – dormancy applies
• Component probability of failure can be calculated as:

$$F(15\ \text{years}) = 1 - \exp(-\lambda_{GM} \cdot t_{ON}) \cdot \exp(-\lambda_{GFD} \cdot t_{OFF})$$

$$t_{ON} = 365 \cdot 15 \cdot 2.7$$

$$t_{OFF} = 365 \cdot 15 \cdot (24 - 2.7)$$

$$\lambda_{GFD} = \lambda_{GF} \cdot d, \quad \text{where } d = \text{dormancy factor} \le 0.1$$
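The calculation above can be sketched in a few lines. The two failure rates used below are hypothetical placeholders chosen only to exercise the formula; the slide does not give component rates for the automotive amplifier. The function name and parameter names are likewise illustrative.

```python
import math

HOURS_PER_DAY = 24.0


def failure_probability(lambda_gm, lambda_gf, years=15.0,
                        on_hours_per_day=2.7, dormancy_factor=0.1):
    """F = 1 - exp(-lambda_GM * t_ON) * exp(-lambda_GFD * t_OFF)."""
    t_on = 365.0 * years * on_hours_per_day
    t_off = 365.0 * years * (HOURS_PER_DAY - on_hours_per_day)
    lambda_gfd = lambda_gf * dormancy_factor  # dormant (OFF) failure rate
    # The product of the two exponentials equals exp of the summed exponents;
    # expm1 keeps precision when the exponent is very small.
    return -math.expm1(-(lambda_gm * t_on + lambda_gfd * t_off))


# Hypothetical failure rates in failures per hour (not from the slide).
example_lambda_gm = 2.0e-8
example_lambda_gf = 1.0e-8

f_15_years = failure_probability(example_lambda_gm, example_lambda_gf)
print(f"F(15 years) = {f_15_years:.4e}")
```

Over 15 years the part spends 14,782.5 hours ON and 116,617.5 hours OFF, so even with a dormancy factor of 0.1 the OFF time can contribute a noticeable share of the total failure probability.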

1-23-2002 M. Krasich 39

Part of the Failure Mode Probability Worksheet

pn           desc                     ref   rem               fr      Failure mode ratio  Failure rate  Dormant FR  R(Ta)     F0          F1
191470-332   CAP,0603,X7R,50V,3300PF  C540  PRF_C540          0.0089                      8.937E-09     8.937E-10   0.999922  7.8285E-05  7.8285E-06
                                            PRF_Short_C540            0.75                6.7028E-09    6.7028E-10  0.999941  5.8714E-05  5.8714E-06
                                            PRF_ChValue_C540          0.1                 8.937E-10     8.937E-11   0.999992  7.8288E-06  7.8288E-07
                                            PRF_Open_C540             0.15                1.3406E-09    1.3406E-10  0.999988  1.1743E-05  1.1743E-06
191470-473   CAP,0603,X7R,50V,.047UF  C541  PRF_C541          0.0114                      1.1351E-08    1.1351E-09  0.999901  9.943E-05   9.943E-06
                                            PRF_Short_C541            0.75                8.5133E-09    8.5133E-10  0.999925  7.4573E-05  7.4573E-06
                                            PRF_ChValue_C541          0.1                 1.1351E-09    1.1351E-10  0.99999   9.9434E-06  9.9434E-07
                                            PRF_Open_C541             0.15                1.7027E-09    1.7027E-10  0.999985  1.4915E-05  1.4915E-06
254110       DIODE,SCHOTTKY,40V,3A,S  D803  PRF_D803          0.01                        9.95E-09      9.95E-10    0.999913  8.7158E-05  8.7158E-06
                                            PRF_Short_D803            0.2                 1.99E-09      1.99E-10    0.999983  1.7432E-05  1.7432E-06
                                            PRF_Open_D803             0.45                8.955E-10     8.955E-11   0.999992  7.8445E-06  7.8445E-07
                                            PRF_ParamCh_D803          0.35                6.965E-10     6.965E-11   0.999994  6.1013E-06  6.1013E-07
135247-5232  DIODE,ZEN,5.6V,225MW,5%  D306  PRF_D306          0.003                       3E-09         3E-10       0.999974  2.628E-05   2.628E-06
                                            PRF_Short_D306            0.2                 6E-10         6E-11       0.999995  5.256E-06   5.256E-07
                                            PRF_Open_D306             0.45                2.7E-10       2.7E-11     0.999998  2.3652E-06  2.3652E-07
                                            PRF_ParamCh_D306          0.35                2.1E-10       2.1E-11     0.999998  1.8396E-06  1.8396E-07
147239       DIODE,DUAL,SOT-23,BAW56  D206  PRF_D206          0.0101                      1.0146E-08    1.0146E-09  0.999911  8.8875E-05  8.8875E-06
                                            PRF_Short_D206            0.51                5.1745E-09    5.1745E-10  0.999955  4.5327E-05  4.5327E-06
                                            PRF_Open_D206             0.29                1.5006E-09    1.5006E-10  0.999987  1.3145E-05  1.3145E-06
                                            PRF_ParamCh_D206          0.2                 1.0349E-09    1.0349E-10  0.999991  9.0656E-06  9.0656E-07
147239       DIODE,DUAL,SOT-23,BAW56  D707  PRF_D707          0.0101                      1.0146E-08    1.0146E-09  0.999911  8.8875E-05  8.8875E-06
                                            PRF_Short_D707            0.51                5.1745E-09    5.1745E-10  0.999955  4.5327E-05  4.5327E-06
                                            PRF_Open_D707             0.29                1.5006E-09    1.5006E-10  0.999987  1.3145E-05  1.3145E-06
                                            PRF_ParamCh_D707          0.2                 1.0349E-09    1.0349E-10  0.999991  9.0656E-06  9.0656E-07
147239       DIODE,SWITCHING,75V,200  D702  PRF_D702          0.0172                      1.72E-08      1.72E-09    0.999849  0.00015066  1.5066E-05
                                            PRF_Short_D702            0.92                1.5824E-08    1.5824E-09  0.999861  0.00013861  1.3861E-05
                                            PRF_Open_D702             0.08                1.2659E-09    1.2659E-10  0.999989  1.1089E-05  1.1089E-06
147239       DIODE,SOT-23,BAV 99      D100  PRF_D100          0.0101                      1.0146E-08    1.0146E-09  0.999911  8.8875E-05  8.8875E-06
                                            PRF_Short_D100            0.51                5.1745E-09    5.1745E-10  0.999955  4.5327E-05  4.5327E-06
                                            PRF_Open_D100             0.29                1.5006E-09    1.5006E-10  0.999987  1.3145E-05  1.3145E-06
                                            PRF_ParamCh_D100          0.2                 1.0349E-09    1.0349E-10  0.999991  9.0656E-06  9.0656E-07
147239       DIODE,SOT-23,BAV 99      D101  PRF_D101          0.0101                      1.0146E-08    1.0146E-09  0.999911  8.8875E-05  8.8875E-06
                                            PRF_Short_D101            0.51                5.1745E-09    5.1745E-10  0.999955  4.5327E-05  4.5327E-06
                                            PRF_Open_D101             0.29                1.5006E-09    1.5006E-10  0.999987  1.3145E-05  1.3145E-06
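The worksheet does not spell out how its columns are derived. The sketch below is one reading that reproduces the C540 rows: the part failure rate is apportioned by the failure mode ratio, the dormant rate is one tenth of the active rate, and F0 is the exponential failure probability over 8,760 hours. The 8,760-hour (one year) exposure is an inference from the arithmetic, not a value stated on the slide, and the function and variable names are illustrative.

```python
import math

HOURS = 8760.0          # assumption: one year of continuous operation
DORMANCY_FACTOR = 0.1   # dormant FR is one tenth of the failure rate in the worksheet


def mode_row(part_rate_per_hour, mode_ratio, hours=HOURS):
    """Return (failure rate, dormant rate, reliability, F0) for one failure mode."""
    rate = part_rate_per_hour * mode_ratio
    dormant_rate = rate * DORMANCY_FACTOR
    f0 = -math.expm1(-rate * hours)  # 1 - exp(-rate * hours), stable for small values
    return rate, dormant_rate, 1.0 - f0, f0


C540_RATE = 8.937e-9  # failures per hour, from the PRF_C540 row

part = mode_row(C540_RATE, 1.0)        # whole part, PRF_C540
short = mode_row(C540_RATE, 0.75)      # PRF_Short_C540
ch_value = mode_row(C540_RATE, 0.10)   # PRF_ChValue_C540
open_mode = mode_row(C540_RATE, 0.15)  # PRF_Open_C540

for name, row in [("PRF_C540", part), ("PRF_Short_C540", short),
                  ("PRF_ChValue_C540", ch_value), ("PRF_Open_C540", open_mode)]:
    rate, dormant, reliability, f0 = row
    print(f"{name:<17} {rate:.4e} {dormant:.4e} {reliability:.6f} {f0:.4e}")
```

Since the three mode ratios for this capacitor sum to 1.0, the mode failure rates add back up to the part failure rate.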

1-23-2002 M. Krasich 38

Values for the Basic Events

• Electrical components
  - Information from manufacturers (life test data)
  - Need to be adjusted for the proper environment and stresses
  - Software databases
  - Field use (last resort)

• Mechanical components
  - Determine stresses - loads (mechanical, environmental)
  - Construct stress/strength equation for multiple loads if required
  - Calculate design (safety) margin and reliability (probability of failure) for the required life

• Manufacturing defects
  - Factory data
  - Field failure data
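For the mechanical components, a common way to turn a stress/strength comparison into a probability is the normal stress-strength interference model. The sketch below assumes independent, normally distributed stress and strength, which is a textbook simplification and not necessarily the model used in the tutorial; all numeric values are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stress_strength_reliability(mean_strength, sd_strength, mean_stress, sd_stress):
    """Return (safety margin, reliability) for independent normal stress and strength."""
    margin = (mean_strength - mean_stress) / math.hypot(sd_strength, sd_stress)
    return margin, standard_normal_cdf(margin)


# Hypothetical values in arbitrary but consistent units (not from the slides).
margin, reliability = stress_strength_reliability(
    mean_strength=100.0, sd_strength=8.0, mean_stress=60.0, sd_stress=6.0
)
probability_of_failure = 1.0 - reliability
print(f"margin = {margin:.2f}, R = {reliability:.7f}, F = {probability_of_failure:.3e}")
```

The probability of failure obtained this way can be entered as the Q value of the corresponding basic event in the fault tree.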


1-23-2002 M. Krasich 47

The Benefit of FTA for the Design Reliability Growth

[Figure: Reliability versus Design Time (Days), 0 to 300 days, with curves for System, Subwoofer, and Console]

If 100,000 systems are produced in one year, 9,250 fewer will be returned for repair within the warranty period as a result of the reliability improvement.
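The warranty figure follows from multiplying the production volume by the gain in reliability. The sketch below first derives the gain implied by the two numbers on the slide, then shows the forward calculation; the before and after reliability values are hypothetical, chosen only so that their difference matches the implied gain.

```python
SYSTEMS_PER_YEAR = 100_000
FEWER_RETURNS = 9_250

# Reliability gain implied by the two numbers stated on the slide.
implied_gain = FEWER_RETURNS / SYSTEMS_PER_YEAR


def returns_avoided(systems, reliability_before, reliability_after):
    """Expected reduction in warranty returns for a given reliability gain."""
    return systems * (reliability_after - reliability_before)


# Hypothetical reliabilities whose difference equals the implied gain.
avoided = returns_avoided(SYSTEMS_PER_YEAR, 0.8500, 0.9425)
print(f"implied gain = {implied_gain:.4f}, returns avoided = {avoided:.0f}")
```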

1-23-2002 M. Krasich 46

Fault Tree Analysis for Reliability Growth - Summary

• Define what constitutes a system failure
• Start with the unfavorable outcome that defines the system failure
• Construct the fault tree down, using logic to express reliability modeling techniques
• Follow the analysis: failure of what assembly, signal, or part will cause the particular failure
• Develop down to the causes of the pertinent failure modes
• Determine probabilities of occurrence of individual causes
• Identify the highest unreliability contributor or safety related failure modes and mitigate
• Improve reliability as necessary and possible
• Update the analysis, monitor reliability until the goal is met

1-23-2002 M. Krasich 45

Audio/Video Console Reliability Growth Monitoring

[Figure: Console Reliability versus Duration of the design period (days), 0 to 300 days, with the curves Planned Reliability Growth and Achieved Reliability Growth]

Console goal R(1 year) = 0.992

Annotations on the plot:
• Y5V caps replaced by X7R (116)
• Initially calculated
• TSSOPs replaced by SOICs
• Transistors and FETs from a more reliable vendor

1-23-2002 M. Krasich 44

Detailed Failure Modes and Causes

Cause 1: TSSOPs

Cause 2: Caps with Y5V dielectric

1-23-2002 M. Krasich 43

The Highest Contributor to Unreliability - Example

• Follow the highest hitter down to its subassemblies

• Look for the highest contributor to its unreliability

Page 13