Reliability and risk analysis data base development: an historical perspective

  • ELSEVIER 0951-8320(95)00110-7

    Reliability Engineering and System Safety 51 (1996) 125-136 1996 Elsevier Science Limited

    Printed in Northern Ireland. All rights reserved 0951-8320/96/$15.00

    Reliability and risk analysis data base development: an historical perspective

    Joseph R. Fragola Science Applications International Corporation, 7 West 36th Street New York, NY 10018, USA

    Collection of empirical data and data base development for use in the prediction of the probability of future events has a long history. Dating back at least to the 17th century, safe passage events and mortality events were collected and analyzed to uncover prospective underlying classes and associated class attributes. Tabulations of these developed classes and associated attributes formed the underwriting basis for the fledgling insurance industry.

    Much earlier, master masons and architects used design rules of thumb to capture the experience of the ages and thereby produce structures of incredible longevity and reliability (Antona, E., Fragola, J. & Galvagni, R. Risk based decision analysis in design. Fourth SRA Europe Conference Proceedings, Rome, Italy, 18-20 October 1993). These rules served so well in producing robust designs that it was not until almost the 19th century that the analysis (Charlton, T.M., A History Of Theory Of Structures In The 19th Century, Cambridge University Press, Cambridge, UK, 1982) of masonry voussoir arches, begun by Galileo some two centuries earlier (Galilei, G. Discorsi e dimostrazioni matematiche intorno a due nuove scienze (Discourses and mathematical demonstrations concerning two new sciences), Leiden, The Netherlands, 1638), was placed on a sound scientific basis. Still, with the introduction of new materials (such as wrought iron and steel) and the lack of theoretical knowledge and computational facilities, approximate methods of structural design abounded well into the second half of the 20th century. To this day structural designers account for material variations and gaps in theoretical knowledge by employing factors of safety (Benvenuto, E., An Introduction to the History of Structural Mechanics, Part II: Vaulted Structures and Elastic Systems, Springer-Verlag, NY, 1991) or codes of practice (ASME Boiler and Pressure Vessel Code, ASME, New York) originally developed in the 19th century (Antona, E., Fragola, J. & Galvagni, R. Risk based decision analysis in design. Fourth SRA Europe Conference Proceedings, Rome, Italy, 18-20 October 1993). These factors, although they continue to be heuristically based, attempt to account for uncertainties in the design environment (e.g., the load spectra) and residual materials defects (Fragola, J.R. et al., Investigation of the risk implications of space shuttle solid rocket booster chamber pressure excursions. SAIC Document No. SAIC/NY 95-01-10, New York, NY).

    Although the approaches may appear different, at least at first glance, the intention in both the insurance and design arenas was to establish an 'infrastructure of confidence' to enable rational decision making for future endeavours. Maturity in the design process of conventional structures such as bridges, buildings, boilers, and highways has led to the loss of recognition of the role that robustness plays in these designs to qualify them against their normal failure environment. So routinely do we expect these designs to survive that we tend to think of the individual failures (which do occur on occasion) as isolated 'freak' accidents. Attempts to uncover potential underlying classes and document associated attributes are rare, and even when they are undertaken 'human error' or 'one-of-a-kind accidents' is often cited as the major cause which somehow seems to absolve the analyst from the responsibility of further data collection (Levy, M. & Salvadori, M., Why Buildings Fall Down, W.W. Norton and Co., New York, NY, 1992; Pecht, M., Nash, F.R., & Long, J.H., Understanding and solving the real reliability assurance problems. 1995 Proceedings of Annual RAMS Symposium, IEEE, New York, NY, 1995).

    The confusion has proliferated to the point where legitimate calls for scepticism regarding the scant data resources available (Evans, R.A., Bayes paradox. IEEE Trans. Reliab., R-31 (1982) 321) have given way to cries that some data sources be abandoned altogether (Cushing, M. et al., Comparison of electronics-reliability assessment approaches. IEEE Trans. Reliab., 42 (1993) 542-546; Watson, G.F., MIL Reliability: a new approach. IEEE Spectrum, 29 (1992) 46-49). Authors who have suggested that the concept of generic data collection be abolished in favor of a physics-of-failure approach (Watson, G.F., MIL Reliability: a new approach. IEEE Spectrum, 29 (1992) 46-49) now seem to be suggesting that the concept of 'failure rate' be banished altogether and with it the concept of reliability prediction (Pecht, M. & Nash, F., Predicting the reliability of electronic equipment. Proc. IEEE, 82 (1994) 992-1004).

    There can be no doubt that abuses of generic data exist and that the physics-of-failure approach has merit, especially in design development. However, does the situation really justify the abandonment of the collection, analysis, and classification of empirical failure data and the elimination of reliability or risk prediction? If not, can the concepts of 'failure rate' and 'prediction' be redefined so as to allow for meaningful support to be provided to logical decision making?

    This paper reviews both the logical and historical context within which reliability and risk data bases have been developed so as to generate an understanding of the motivations for and the assumptions underlying their development. Further, an attempt is made to clarify what appears to be fundamental confusion in the field of reliability and risk analysis. With these clarifications in hand, a restructuring of the conceptual basis for reliability data base development and reliability predictions is suggested, and some hopeful recent developments are reported upon.

    1 INTRODUCTION

    Published sources of reliability and risk data have a long history dating back to at least the era of sailing ships. These sources were generated from collected historical data, grouped according to classes, and the associated class attributes were established and used by farsighted individuals to amass considerable fortunes by insuring against loss of life or financial loss. These individuals succeeded, whether the losses insured against were 'acts of God' (such as the weather or accidental death) or 'acts of man' (such as an overreaching ship captain or an individual having chosen coal mining as a profession), to the extent that they were able to establish prototypical classes within the historical data set and to apply those classes to forecast the occurrence frequency of future events.

    Without competitors they could, to some degree at least, make up for the inadequacy of their data set by increased premiums. However, if the premiums began to represent a substantial portion of the cost of the loss they risked loss of market due to self-insurance (or alternatively no insurance). Even though insurance premiums in new enterprises are not driven by class distinctions initially (as remains the case today in the commercial space launch industry), soon the more wily competitors discover that certain sea captains, or certain routes, or certain classes of ships (or launch vehicles in the current era), or certain cargoes are more or less likely to encounter difficulties. They can then take care to adjust their rates accordingly, thereby either expanding their market or reducing their exposure as the case may be.

    One of the first designs for which reliability and risk data bases may have been developed was that of the automobile. The analysis of the reliability and risk of the automobile began when it was converted from a plaything of the rich to an alternative method of transport for a broad spectrum of the population. While frequent breakdowns and adjustments might add to the amusement of a ride in the country to those living a life of leisure, they were decidedly unattractive to an individual whose daily bread depended upon reliable transport. This change in customer base led to the gathering of maintenance statistics and attempts at redesigning or replacing or providing backups for frequently failing items (e.g., some cars included handcranks as backups to electric starters as late as the 1950's, and as late as the 1970's most new cars included a full sized spare tire well after the development of steel belted radial tires).

    For aircraft the situation was similar. No one much cared if an individual pilot risked his or her life (aside from spectator interest), so single engine aircraft abounded with individual engine performance as the technology driver. Even the first successful transatlantic aircraft (Lindbergh's Spirit of St. Louis) was single engine. Despite these endurance successes, the US Civil Aeronautics Board (CAB) (the predecessor to the Federal Aviation Administration, FAA) was so concerned with engine failure in civil transport aircraft that it required designs to have more than two engines before they could be certified and licensed for passenger use. In fact, one of the first uses of reliability data in the public arena may well have been the justification of the certification of the DC-3 for passenger service with only two engines. This justification was provided on the basis of higher engine reliability and not just by increased robustness in engine performance.

    2 EARLY FAILURE RATE DATA SOURCES

    Published sources of failure rates for components have been available since the 1950's. Shooman [13] indicates that large industrial organizations such as the Radio Corporation of America (RCA), General Electric (GE), and Motorola published handbooks of part failure rate data compiled from life-test and field-failure data. Whether any of these original sources are extant could not be determined, even though the data they contained apparently were eventually included in MIL-HDBK-217 [14] when it was first published in 1962. However, it is known that in March of 1953 the Radio-Electronic-Television Manufacturer's Association (RETMA), the predecessor to the Electronic Industries Association (EIA), established a committee on electronic applications. The committee was formed to, among other items, establish methods and procedures for gathering reliability data and for analyzing, tabulating, and publishing results. In the 1950's after its establishment, the committee published a series of bulletins which reported on the results of the reliability data gathering efforts. An example of the information published and the format used, taken from Volume 4, No. 1 of these Electronics Applications Reliability Review bulletins [15], is given in Tables 1 and 2. The exact publication date of the first bulletin could not be determined, but it was probably issued sometime in 1957, so these bulletins may represent the earliest extant data source.

    The tables given indicate clearly the population of components, the number of failed components and the associated population hours as well as the failure rate ratio for each part listed. Also provided is an estimate of the hourly index considered representative for part groupings and the index adjusted to the 90% confidence level assuming the failures and component hours for the group. The tables indicate a trend which unfortunately continues in some generic reliability data bases even to the present day, namely, ignoring the tolerance* uncertainty in favor of the statistical uncertainty associated with establishing the index. The problem is exemplified by the case of subminiature tubes, where the observed failure rate per hour varies from a high of 1/660 for 6112 tubes (excluding the zero failure cases) to a low of 1/7,400 for 5840 tubes. This represents a difference greater than a factor of 10 in the underlying subpopulation while there is only a factor of 1.2 difference between the population hourly index and the 90% confidence level. If the adjusted index were used for 6112 tubes, the 1/2,340 would be considerably optimistic since three failures had been observed in less than 2,000 population hours.

    * Throughout this paper, 'tolerance' refers to the uncertainty arising from the physical and environmental differences among subcomponent samples when failure rate data are aggregated to produce a generic component class data value.
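    The arithmetic behind this comparison can be checked with a short sketch. It uses only the figures quoted in the paragraph above and in Tables 1 and 2; the variable names are illustrative and are not taken from the bulletins.

```python
# Observed failure rates (per hour) for two subminiature tube types, taken
# from Table 2, plus the group indices quoted in the text and in Table 1.
rate_6112 = 1 / 660        # highest observed rate (zero-failure types excluded)
rate_5840 = 1 / 7400       # lowest observed rate
group_index = 1 / 2900     # hourly index for the whole subminiature tube group
adjusted_index = 1 / 2340  # group index adjusted to the 90% confidence level

# Spread between subpopulations (the "tolerance" variation in this paper's sense).
tolerance_factor = rate_6112 / rate_5840

# Spread implied by the statistical adjustment alone.
confidence_factor = adjusted_index / group_index

# How far the adjusted group index understates the observed 6112 rate.
optimism_factor = rate_6112 / adjusted_index

print(f"tolerance factor:           {tolerance_factor:.1f}")
print(f"confidence factor:          {confidence_factor:.2f}")
print(f"6112 rate / adjusted index: {optimism_factor:.1f}")
```

    The subpopulation spread is more than a factor of 10, the confidence adjustment is a factor of about 1.2, and the adjusted index understates the observed 6112 rate by more than a factor of 3.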

    Table 1. Reliability indices at the 90% confidence level

    Component part          Failures / component hours  Adjusted index
    ----------------------  --------------------------  --------------
    Relays                  21/21,700                   1/740
    Crystal diodes          18/57,500                   1/2,340
    Subminiature tubes      30/86,500                   1/2,340
    Miniature tubes         6/36,500                    1/3,420
    Potentiometers          10/76,500                   1/4,950
    Connectors and plugs    9/100,000                   1/6,850
    Transformers            3/47,000                    1/7,050
    Capacitors              13/545,000                  1/28,700
    Inductors and coils     0/90,500                    1/39,400
    Resistors               15/990,000                  1/46,000
    Solder connections      88/6,600,000                1/66,000

    Table 2. Hourly reliability indices for missile electronic parts

    Components                Failures /    Failure rate   Hourly
                              population    per hour       index, p'
    ------------------------  ------------  -------------  ---------
    Relays                    21/1,090      1/990          1/990
    Delay lines               3/168         1/1,070        1/1,070
    Rotating equipment:                                    1/1,080
      Motor                   1/158         1/3,000
      Inverters               2/100         1/950
      Dynamotors              1/58          1/1,000
      Rate gyros              4/142         1/685
    Subminiature tubes:                                    1/2,900
      5639                    3/254         1/1,610
      5643                    0/17          0/325
      5702                    1/147         1/2,800
      5718                    8/771         1/1,820
      5719                    2/236         1/2,240
      5783                    0/149         0/2,830
      5784                    0/135         0/2,570
      5840                    2/780         1/7,400
      5896                    0/37          0/700
      5902                    1/135         1/2,570
      6021                    10/1,685      1/3,200
      6112                    3/104         1/660
    Crystal diodes:                                        1/3,180
      Silicon                 14/878        1/1,200
      Selenium                3/400         1/2,530
      Germanium               1/1,735       1/33,000
    Microswitches             2/400         1/3,800        1/3,800
    Miniature tubes:                                       1/6,100
      5673                    0/21          0/400
      5726                    1/695         1/13,200
      5727                    0/16          0/305
      5751                    1/424         1/8,000
      5814                    2/484         1/4,600
      6005                    2/264         1/2,500
      VC 1258                 0/16          0/305
    Potentiometers:                                        1/7,650
      Linear plastic          2/152         1/1,440
      Wire wound              3/475         1/3,000
      Composition             5/3,395       1/12,900
    Connectors and plugs      9/5,234       1/11,000       1/11,000
    Transformers              3/2,476       1/15,700       1/15,700
    Inductances               0/1,213       0/23,000       1/23,000†
    Capacitors:                                            1/42,000
      Paper                   10/15,677     1/30,000
      Ceramic                 0/6,428       1/122,000
      Mica and glass          1/5,841       1/110,000
      Tantalytic              2/678         1/6,500
    Tube sockets              1/2,944       1/56,000       1/56,000
    Resistors:                                             1/66,000
      Composition             5/33,519      1/127,000
      Deposited carbon        7/16,911      1/46,000
      Wire wound              3/1,787       1/11,300
    R-f coils                 0/3,542       0/67,500       1/67,500†
    Solder joints and wires   88/349,700    1/75,000       1/75,000

    † The occurrence of one failure has been taken arbitrarily until more data becomes available.

    The first broad based published source of reliability information may well have been the Martin Titan Handbook [16]. This widely distributed source, officially designated 'Procedure and Data for Estimating Reliability and Maintainability', was published on 9 July 1959 and contained generic failure rates on a wide range of electrical, electronic, electromechanical, and mechanical 'parts or assemblies'. Although this important data source expanded the component types considerably and kept the distinctions between electronic components for the most part, it only listed a 'Generic Failure Rate' (GFr) for each. Lost was the population of components included in the test sample, the number of failures observed, and the population hours (if this information ever existed). Also, no estimate of confidence variation or adjustment was provided. Tolerance variation can be established across the types given, and for subminiature tubes the variation is about a factor of 6, which appears somewhat consistent with the EIA results. However, correlation between tube names in this source and tube numerical designation in Table 1 and Table 2 was not available so an accurate comparison was not possible. The Titan Handbook was the first known source to standardize the presentation of failure rates in terms of failures per 10^6 hours, eliminating the necessity for conversions, but it also formally introduced the unfortunate use of failure rate adjustment factors or 'K-factors'. These factors were to be simply multiplied by the Generic Failure Rate to adjust for the presence of redundancy (Kr) and to account for operational mode (Kop). It also presented no structure to the definition of the items listed, placing on a co-equal basis piece-parts such as capacitors and major assemblies such as turbines, even though the assembly might well contain any number of piece-parts listed elsewhere in the tables.

    Although the Titan Handbook did not provide failure rates for different counterpart failure modes, it did recognize the need to consider failure modes and provided an appendix of typical failure modes for the components listed. It also standardized, on an ad hoc basis, the use of the exponential distribution in calculations.
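    As a minimal sketch of the calculation style the handbook standardized, the following multiplies a generic failure rate (per 10^6 hours) by K-factors and applies the constant-hazard exponential model R(t) = exp(-λt). The function names and all numerical values are hypothetical, chosen only for illustration; none are taken from the Titan Handbook.

```python
import math


def adjusted_failure_rate(gfr_per_million_hours, k_factors):
    """Multiply a generic failure rate by each K-factor in turn."""
    rate = gfr_per_million_hours
    for k in k_factors:
        rate *= k
    return rate


def reliability(rate_per_million_hours, mission_hours):
    """Constant-hazard (exponential) reliability: R(t) = exp(-lambda * t)."""
    rate_per_hour = rate_per_million_hours / 1e6
    return math.exp(-rate_per_hour * mission_hours)


# Hypothetical values for illustration only (not handbook data).
gfr = 50.0   # generic failure rate, failures per 10^6 hours
k_op = 2.0   # operational-mode factor (Kop)
k_r = 0.5    # redundancy factor (Kr)

rate = adjusted_failure_rate(gfr, [k_op, k_r])
print(f"adjusted rate: {rate} per 10^6 hours")
print(f"R(1000 h) = {reliability(rate, 1000.0):.4f}")
```

    The simplicity of the multiplication is part of the problem the paper describes: nothing in the calculation carries forward the sample size or the spread of the data behind the generic rate.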

    3 RELIABILITY DATA, THE SECOND GENERATION

    The Titan Handbook set the stage for more ambitious programs to collect and organize reliability data. These efforts were spurred on by requirements generated in the various military services in the US for improved reliability for both military aircraft and missiles and launch vehicles² in particular. All of these efforts were initiated in some form in the 1960's and produced published reliability data for various spectra of components. The notable efforts to be discussed here were:

    1. MIL-Handbook-217
    2. Failure Rate Data Bank (FARADA) [17]
    3. RADC Non-Electronic Reliability Notebook [18]

    All of these second generation sources built upon the experience of the EIA Bulletins and the Titan Handbook, expanding upon this first generation work as indicated below and surviving in some form to the present day.

    3.1 MIL-handbook-217

    The effort which led to the publication of 217 in 1962 was ongoing when the first generation handbooks were published. 217 assumed the ad hoc standards of the Titan Handbook, namely listing base failure rates per 10^6 hours, using the constant hazard exponential model, and employing K-factors (which eventually became π factors) to indicate variations from the base generic constant rate. Components were grouped into broad general categories with information for the subcategories derived through the use of correction factors. The amount of information contained in various revisions of 217 is enormous compared to that of the original Titan Handbook, but unfortunately, along with the expansion of the wisdom of this early work, the errors were also compounded. Specifically, tolerance variations became masked within the factors. Also, the failure rates came to be looked upon as fixed measures of specific equipment, not general measures of a spectrum of equipment types. This viewpoint led to the unfortunate use of 217 in government procurement specifications. And worse still, acceptable performance was demonstrated by a prediction using 217 data!

    ² A recently declassified report [44] indicates just how bad the state of affairs was for US missiles and launch vehicles at the time. The report, issued just one month prior to the historic announcement of the decision to go to the moon by US President J.F. Kennedy on May 25, 1961, indicated that US ballistic missiles had been only 70% successful and only 50% of US spacecraft had achieved successful orbits.

    3.2 Failure rate data bank (FARADA)

    The commanders of the US Army Materiel Command, the Air Force Logistics Command, and the Air Force Systems Command jointly sponsored a program which encouraged the exchange of data on Army and Air Force purchased equipment. The program eventually became known as the Government/Industry Data Exchange Program or GIDEP. By the 1970's this program had grown to include over 400 participants, 80% of whom were industrial organizations, the rest being government laboratory or repair facilities. The Data Bank, which eventually became known as the Reliability Maintainability Data Bank, included both failure rate and replacement rate data. The 'Summaries of Failure Rate Data' published on a regular basis included failure rate information, equipment population information, and failure mode information when known. The data included in the Data Bank were collected from field experience, laboratory accelerated life tests, and reliability demonstration tests.

    GIDEP was the first data source to pioneer a supporting data base management software system. This allowed the data bank to be quickly updated and the published data to be provided in a variety of formats. It also allowed the data to be analyzed statistically according to a generic data structure. Unfortunately, the easy statistical analysis led to the development of χ² confidence bounds³ for aggregated populations which were obviously inhomogeneous. They thereby ignored the tolerance uncertainty which often drives the data uncertainties (as has been discussed earlier). On a positive note, this data base clearly indicated the problem of using statistical confidence alone to establish uncertainty bounds because the more inhomogeneous the collection became in the published tables, the tighter the uncertainty bounds due to the associated increase in the population of the aggregate. Thus the estimated mean 'failure rate' of the mixed population was better 'known' in the statistical sense (because of the larger population of failures) but was less representative of the subpopulations constituting the mixture. This error, initially made with the EIA data discussed above, was proliferated throughout the US industrial base by GIDEP.

    ³ In 1953, Epstein [19,20] showed that confidence bounds could be established on estimates of a mean value of the failure rate using the χ² distribution with 2K degrees of freedom and twice the accumulated test time, provided that it could be assumed that the underlying distribution for the life of the devices was exponential and the population could be considered homogeneous.

    3.3 RADC non-electronic reliability notebook

    The US Air Force Rome Air Development Center or RADC (located in Rome, New York) was responsible for the development of this notebook. By the 1970's the notebook contained failure rate data on over 300 types of components and parts. These data were derived from military field operating experience and test experience. Some of the failure rates were derived through synthesis of similar generic part types, with failure rate groupings made for those of the type which had been subjected to similar environments. In this way, environmental application information (K-factors) was derived from specific experience, not generically across a broad range of experience as with 217. The same problem of the use of χ² bounds to represent the complete uncertainty in the estimates was present here as well. However, the problem was minimized for this data because the aggregates tended to be formed across very specific populations which were much more homogeneous than those contained in GIDEP.
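    The effect of applying χ² bounds to a pooled, inhomogeneous population can be illustrated with the subminiature tube data of Tables 1 and 2. The sketch below is an assumption-laden illustration rather than the procedure used by any of the data banks: it takes K to be the number of observed failures, uses two-sided 90% bounds, infers the 6112 tube hours from the tabulated 1/660 rate, and replaces exact χ² quantiles with the Wilson-Hilferty approximation so that only the Python standard library is needed.

```python
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def rate_bounds(failures, hours, confidence=0.90):
    """Two-sided bounds on a constant failure rate per hour.

    Uses 2K degrees of freedom and twice the accumulated time, and assumes
    exponential lifetimes and a homogeneous population.
    """
    tail = (1.0 - confidence) / 2.0
    dof = 2 * failures
    lower = chi2_quantile(tail, dof) / (2.0 * hours)
    upper = chi2_quantile(1.0 - tail, dof) / (2.0 * hours)
    return lower, upper


# Pooled subminiature tubes (Table 1): 30 failures in 86,500 component hours.
pooled_low, pooled_high = rate_bounds(30, 86_500)

# 6112 tubes alone: 3 failures; hours inferred as 3 * 660 from the 1/660 rate.
tube_low, tube_high = rate_bounds(3, 3 * 660)

print(f"pooled: 1/{1 / pooled_high:.0f} to 1/{1 / pooled_low:.0f} per hour")
print(f"6112:   1/{1 / tube_high:.0f} to 1/{1 / tube_low:.0f} per hour")
print(f"width ratio, pooled: {pooled_high / pooled_low:.2f}")
print(f"width ratio, 6112:   {tube_high / tube_low:.2f}")
```

    With these inputs the pooled interval spans less than a factor of two, yet the observed rates of both the 6112 tubes (1/660) and the 5840 tubes (1/7,400) fall outside it. The interval describes how well the mean of the mixture is known, not the range of rates a user might encounter for a particular tube type.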

    This data source was later joined by another RADC sponsored project conducted by Illinois Institute of Technology (IIT) related to microcircuits, which resulted in the publication of the IIT 'Reliability Notebook for Microcircuits'. In this notebook, failure rate information on digital monolithic microcircuits was provided as point estimates with 60% bounds. The data source was the first known to formally recognize the problem of merging inhomogeneous data sets and used the Fisher F test to determine homogeneity prior to merging. The data were sorted by gross application environment (e.g., spacecraft fixed, ground benign), by 217 quality level, complexity interval (number of gates), and junction temperature. The data tables were separated according to the source of information. Thus separate field, test, check-out, Reliability Demonstration, and Vendor data tables were presented.

    These two sources represented the start of a series of reliability data handbooks which continue to be published by the Reliability Analysis Center at RADC. The RAC handbooks, although they lack failure modes, which are important distinctions for many applications, represented then, as they still do today, a significant generic failure rate data resource.

    4 THIRD GENERATION DATA SOURCES: THE ERA OF THE 70's AND 80's

    By the beginning of the 1970's the patterns for good and for ill had been set in reliability data bases. The use of the constant hazard assumption was universal. Data were presented as point estimate 'failure rates' according to broad generic types. Little or no attention was paid to uncertainty and when it was, the confidence bounds were derived and applied assuming that the populations were homogeneous even when they obviously were not. In some instances, published sources indicated absurdly tight bounds surrounding point estimates which had been obtained by blindly grouping together data from devices which were obviously different. Some data sources, in particular 217, chose to ignore these significant problems and continued to expand their data sets to include even more specific categories of device types. These failure rates were established by combining ever more complicated π factors with a listed generic failure rate. By the 1980's, 217 had gone through B, C, and D editions and was reincarnated as 217E. Now it was a formidable data source containing several hundred pages of tables and charts. So formidable was it that the military began to believe in it as almost a 'sacred text', which was even being specified for use in procurement activities.

    Despite this state of affairs, analysts began to see the usefulness of data sources which were carefully developed and adequately applied. They also began to seek improvements in the design of new data bases that addressed problems which had been identified [21]. Range estimates were introduced about estimated mean values to address the problem of heterogeneous subpopulations. These estimates attempted to gauge the actual uncertainty (both tolerance and confidence) of the underlying mixture distributions through various percentile grouping approaches, thereby preserving the dispersion in the data and reducing the possibilities for misuse. Failure rate estimates were separated into time related (hourly) and demand related (cyclic) categories. Efforts ongoing in this era [22,23] attempted to expand upon the pioneering work of Yurkowsky (of Hughes Aircraft for RADC) [24] to categorize component rates according to predominant historical modes of failure separated according to whether the modes were time or demand related. This work also extended the 'loss of all function' or catastrophic failure category for failure rates to include degraded and incipient failures. The former category described cases where some loss of functional capability occurred but the device was still performing at above the minimally acceptable level, and the latter addressed cases where no loss of function had occurred but where there were 'indications' that a loss of function would occur without maintenance or repair. This separation avoided the confusion which abounded concerning the distinction between 'repair rates' (i.e., rates for any condition which resulted in either repair or replacement of a device) and 'failure rates' (i.e., rates of catastrophic loss of function of a device occurring in service or when demanded). While 217 and others remained dedicated to the military and space industry and delved ever more deeply into microcircuitry, these newer efforts moved into the commercial nuclear power industry and, later, to the offshore oil and chemical industries [25,26].
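    One way a range estimate might be formed is sketched below using the subminiature tube rates from Table 2. The paper does not specify which percentile grouping approach the third generation data bases used, so the choice of 10th and 90th percentiles, the exclusion of zero-failure tube types, and the interpolation method are all assumptions made for illustration.

```python
from statistics import median, quantiles

# Observed hourly failure rates for the subminiature tube types in Table 2
# that recorded at least one failure (zero-failure types are left out here).
observed_rates = {
    "5639": 1 / 1610,
    "5702": 1 / 2800,
    "5718": 1 / 1820,
    "5719": 1 / 2240,
    "5840": 1 / 7400,
    "5902": 1 / 2570,
    "6021": 1 / 3200,
    "6112": 1 / 660,
}

rates = sorted(observed_rates.values())

# Nine cut points dividing the data into ten groups; the first and last
# are the 10th and 90th percentiles.
deciles = quantiles(rates, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]
mid = median(rates)

print(f"10th percentile: 1/{1 / p10:.0f} per hour")
print(f"median:          1/{1 / mid:.0f} per hour")
print(f"90th percentile: 1/{1 / p90:.0f} per hour")
print(f"range factor:    {p90 / p10:.1f}")
```

    Reporting the range alongside the central value keeps the dispersion among tube types visible to the user, which a single point estimate with a statistical confidence bound does not.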

    One pioneering data base report [22] had the luxury of being designed from the ground up. Because of this, the authors were able to build on experience and eliminate the mistakes of the past. They also could ask themselves some fundamental questions concerning the need the data base was to be designed to fulfill, its corresponding objectives, and the technical approach to be taken to meet these objectives. Contrary to some earlier attempts, this effort saw the process of collecting and encoding reliability data as one of 'assignment'. That is, the objective was to identify attributes which could be meaningfully established to distinguish devices in their failure context and to develop correspondences between these attributes and others of the same or different classes. Although the assignment process can be performed in many ways, the alternatives can be grouped according to either variable assignment approaches or specific assignment or selection approaches. At some level, specific assignments must be made or the process of data collection cannot proceed. Specific assignment is made, for example, by the statement, 'we want to collect reliability data on valves'. This statement implies two individual statements: 1. valves are of concern to us, 2. therefore, we want to collect data on them. The first statement identifies 'valves' as the component of interest, thus assigning that class as the one of interest out of the general class of components. The statement presupposes that the questioner understands not only the attributes of the class 'valves' but also which attributes should be grouped to which others to allow the general class to be constructed. But if the questioner really knew this then the question would have to have been answered before it was asked, so clearly there must be some misinterpretation. What the questioner meant to imply is that because he considered that his understanding of the relationship between valve attributes and valve reliability was deficient he wished to gather information so as to lessen this deficiency. In this case, the simple statement becomes, 'Because the reliability of valves is of concern to us, and because we realize our understanding of the relationship between valve attributes and valve reliability is unclear, we want to collect data which will clarify our understanding.'

    This latter statement clearly indicates that the objective of a data collection project is to reduce the uncertainty that the user has in the application of the data by developing classes of attributes which allow ranges of expected reliability measures to be associated with those classes. This discussion might appear somewhat esoteric, but it is believed that it is precisely this misunderstanding of the purpose of data collection and what the collected data represents which has caused so many problems and has led some to call for the abandonment of these projects altogether.

    To understand better why this is so, a further discussion of the assignment process is required. Application of the assignment process to valves requires first a statement of valve attributes. There are many such sets and many ways they could be constructed, but for the sake of brevity assume that the following attributes are important for clarifying our understanding of valve reliability.

    1. Type of Application (Functional use or service)
       1. Standby Emergency (infrequent service)
       2. Operational Mode Use (on-off service)
       3. Throttling Use (continuous service)

    2. Type of Application (Internal Environment)
       1. Gas (Steam or Air)
       2. Liquid (Water or Hydraulic Fluid)

    3. Type of Environment (External)
       1. Heat Trace
       2. High Radiation

    4. Size

    5. Generic type
       1. Globe
       2. Gate
       3. Check
       4. Diaphragm
       5. Butterfly

    6. Operator Type
       1. Manual
       2. Electric Motor
       3. Air
       4. Hydraulic
       5. Solenoid

    In establishing these attributes the process may not appear to differ from other approaches such as that used in 217. However, the essential difference occurs at the second step, that is, with the development of the correspondences between the attributes. Second generation data bases leave the specification of the correspondences to the user, who is often ignorant of the process of data collection and is unlikely to know which correspondences are likely to be fruitful and which are not. There is no prior preference set for any size, type, function, fluid or environment combination. For this reason, the specification of the variables in these efforts is left outside of the encoding process, as is the sequence which should be followed in making the variable assignment (i.e., whether size is more important than fluid or vice versa).
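    The size of the problem handed to the user of a second generation data base can be illustrated with a short sketch. It is only an illustration under stated assumptions: the enumeration names are invented here, the attribute values are those of the valve list above, and the continuous 'Size' attribute is left out. With no preference ordering supplied, every combination is an equally plausible grouping.

```python
from enum import Enum
from itertools import product

# Attribute values taken from the valve list above; the class and
# member names are illustrative, not part of any published encoding.
class Service(Enum):
    STANDBY_EMERGENCY = 1
    OPERATIONAL_MODE = 2
    THROTTLING = 3

class InternalFluid(Enum):
    GAS = 1
    LIQUID = 2

class ExternalEnvironment(Enum):
    HEAT_TRACE = 1
    HIGH_RADIATION = 2

class GenericType(Enum):
    GLOBE = 1
    GATE = 2
    CHECK = 3
    DIAPHRAGM = 4
    BUTTERFLY = 5

class OperatorType(Enum):
    MANUAL = 1
    ELECTRIC_MOTOR = 2
    AIR = 3
    HYDRAULIC = 4
    SOLENOID = 5

# Size is omitted because it is a continuous attribute.
ATTRIBUTES = (Service, InternalFluid, ExternalEnvironment,
              GenericType, OperatorType)

def all_combinations():
    """Return every unordered attribute combination a user would face."""
    return list(product(*ATTRIBUTES))

if __name__ == "__main__":
    print(len(all_combinations()), "candidate groupings before size is considered")
```

    Even this small attribute set yields several hundred candidate groupings, which is why leaving the choice and ordering of variables entirely to the user is a heavy burden.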

    In contrast to this traditional approach the Third Generation Data Bases take a hierarchical approach. The objective of this approach is to establish preferences so that they will facilitate the development of a clearer understanding of the relationship between valve attributes and valve reliability. The approach taken is consistent with:

    'Our facility for making instances out of classes and classes out of instances [which] lies at the basis of our intelligence, and is one of the great differences between human thought and the thought processes of other animals.' 27

    In the case of Third Generation Data Bases every attempt is made to imbue the encoding scheme with the greatest level of intelligence which can be provided by the data base developer before the fact. Data bases of this type attempt not to leave the decision for preference and variable specification to the user in any instance where he feels he has insufficient knowledge of these attributes. However, they also allow the user to restructure the scheme when and where he feels his knowledge base provides justification for so doing. If the data base meets its objective it refers the 'fuzzy' idea of a particular valve to the set of attributes which the user has in mind and thereby constructs a scheme whereby the user can take advantage of the fact that his particular valve concept will often inherit many of the properties of the class to which it belongs.

    To understand the value of this scheme and how it relates to the value of a generic data base constructed in this way, consider the following. Suppose you are told that a problem has just occurred with a valve in a particular plant, or on a particular aircraft. In this case you will begin immediately to construct a fresh new mental image or model for the particular valve. But if you are presented with no further information the new image will be heavily dependent upon your preconceived notion of the general class for 'valve'. Unconsciously, you will be forced to rely on a myriad of presuppositions which are derived from your own experience base concerning valves. If your experience is with commercial plants you might imagine that they are 72 inches or smaller, that they are made of metal, that they operate between -20°C and 300°C, etc.

    On the other hand, if your experience is completely in the aircraft or aerospace industry the concept of a 72 inch valve may stretch the bounds of credibility (as does the idea of a 17 inch quick disconnect valve on the space shuttle external tank). Whatever these ideas are they are built into the mental image of the class as 'expected' or 'predisposed' 28 links to other images, and unless they are overridden they remain prespecified or 'default' options.

    The data encoding scheme presented in these data bases specifies as many default options as possible for the user. Any or all of them may be overridden, but if they are not they will remain in the 'instance symbol' as inherited from its 'class symbol'. Additionally, and most importantly, until these options are overridden they provide a preliminary basis for the user to think about the specific instance of concern to him by making use of the 'reasonable' guesses which are supplied by the 'stereotype' or class symbol. So in the case of the EIA data we can say that a reasonable range of failure rates for subminiature tubes would be between 1/600 and 1/7000 per hour if we know only that the specific instance is some member of the general class of subminiature tubes.

    A further benefit of this approach is that it is self correcting: if we find the range of uncertainty provided for the general class to be unacceptable, we know how to be more specific to narrow the range. So for example, if we specified the 5700 series of subminiature tubes we might be able to narrow the range to between 1/1800 and 1/3000 failures per hour, and so on. Note that when we invoke an instance symbol it need not even be one that is in existence, and it may be of a specific type unknown to us or to the data base (it may be a new, as yet undesigned, 5700 series tube). This does not matter. If its attributes place it in the general 5700 series class, we are still led to expect (barring any further information) that its failure rate would fall within the range.
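    The inheritance of default options from class symbol to instance symbol can be sketched as a simple parent lookup. This is a minimal illustration, not the encoding scheme of any particular data base: the `DeviceClass` type, its `get` method and the attribute key are assumptions made for the example, while the two failure rate ranges are the ones quoted above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DeviceClass:
    """A class or instance symbol; unset attributes fall back to the parent."""
    name: str
    parent: Optional["DeviceClass"] = None
    overrides: dict = field(default_factory=dict)

    def get(self, key):
        # Walk up the hierarchy until some level specifies the attribute.
        node = self
        while node is not None:
            if key in node.overrides:
                return node.overrides[key]
            node = node.parent
        raise KeyError(key)

RANGE = "failure_rate_range_per_hour"  # hypothetical attribute key

# General class: range quoted in the text for subminiature tubes.
subminiature = DeviceClass("subminiature tube",
                           overrides={RANGE: (1 / 7000, 1 / 600)})
# More specific class narrows the default range.
series_5700 = DeviceClass("5700 series", parent=subminiature,
                          overrides={RANGE: (1 / 3000, 1 / 1800)})
# A new, as yet undesigned tube: no data of its own, so it inherits.
new_tube = DeviceClass("new 5700 series tube", parent=series_5700)
unknown_tube = DeviceClass("unspecified subminiature tube", parent=subminiature)

if __name__ == "__main__":
    print(new_tube.get(RANGE))
    print(unknown_tube.get(RANGE))
```

    An instance that sets its own value in `overrides` would replace the inherited default, which corresponds to the user overriding a prespecified option.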

    5 NOW JUST WHAT IS A FAILURE RATE?

    This new concept of a reliability data base challenges classical ideas in probability and statistics. In classical statistics reliability parameters are generally considered as fixed unknown quantities and estimates of these fixed quantities (i.e., the failure rate point estimate in a particular data base) as well as their ranges are considered as random variables. In the non-classical sense (e.g. Bayesian statistics) reliability parameters are considered as random variables but even here many would consider the 'true' value of any measure as fixed and unknown. Well then, just what is the random variable? It is not the parameter itself (e.g., the failure rate); it is our belief in that parameter or our 'certainty' in the parameter. The problem is that we use the same words, namely 'failure rate', for both (this was pointed out very clearly by R.A. Evans in Ref. 9 from which the above has been paraphrased). Evans has suggested that in this latter instance we should change the name to 'failure rate belief' and not use the term failure rate. Then we could say that we had a 90% degree of belief that a 5700 series subminiature tube will have a 'failure rate belief' that lies between 1/1800 and 1/3000. This would certainly clarify the issue but it appears on the surface too counterintuitive and therefore has not taken hold, and probably never will. Further, it has been suggested 12 that in a very real sense, at least in some instances, the classical parameter failure rate is devoid of any useful meaning and should be abandoned. Understanding why this has occurred, and much of the furore that is ongoing within the reliability and risk assessment technical community concerning the usefulness of generic data bases, requires some further discussion.

    6 PHYSICS-OF-FAILURE YES! GENERIC DATA NO!

    In the decade of the 90's it has become fashionable to fault generic sources of reliability in general, and MIL-HDBK-217 in particular. These criticisms began with the airing of legitimate concerns that 217 was being used by the US Department of Defense to estimate the reliability of procured products and to indicate compliance with specified reliability goals. It was argued, quite reasonably, that such an approach forced manufacturers to lower junction temperatures and to use expensive hermetically sealed packages (rather than much cheaper plastic encapsulated packages) just to meet the handbook rules. Critics became more and more numerous and outspoken, ever-widening in their criticisms. One technical magazine dedicated an entire feature article to air the criticisms. 11 What had started as a reasoned call to use the physics-of-failure as part of a stress-margin approach toward the development of new robust designs degraded into an instant demand that generic data sources be abandoned. One critic from the UK in particular stated: 'I think that the only balanced view is to say that MIL-HDBK-217 and anything like it is the biggest load of garbage ever to be foisted by engineers on other engineers and should immediately be done away with'. 1~

    Those who were suggesting the physics-of-failure approach as an alternative also legitimately pointed out that generic data was of limited use in troubleshooting an existing design, and in the determination of the root cause of failure and the appropriate corrective action. While this is certainly true, the problem remains that it is difficult to show that the physics-of-failure approach is any better than 217 in making reliability predictions. When one author attempted to do so, 1 he ended up showing that 217 predictions (without considering the tolerance uncertainty) were inaccurate compared with MTBFs recorded in tests. However, if the data in the paper from 217 are considered measures of the mean of a failure rate belief and a reasonable uncertainty range is applied to that mean considering a typical distribution (e.g., an error factor of 5 applied to a lognormal distribution) then the MIL-HDBK-217 data turn out to be an excellent predictor in 3 out of the 9 test cases and a fair predictor in two out of the remaining 6 cases. No indication was provided that this reasonable performance could be bettered by the alternate approach. Further, applying 217 in this case identifies the Vendor A, B, and E cases as clear non-performing outliers. Additionally, the 217 predictions indicated that Vendor A's design was likely to be a weak one and would have provided Vendor A at least with a recognition of the potential need for a redesign.
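    The kind of check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the handbook value is treated as the mean of a lognormal 'failure rate belief', the error factor is taken in its common sense as the ratio of the 95th percentile to the median, and the function names and the numerical inputs in the usage lines are hypothetical rather than taken from the cited comparison.

```python
import math

Z95 = 1.645  # rounded 95th percentile of the standard normal distribution

def lognormal_bounds_from_mean(mean, error_factor):
    """Return (5th percentile, median, 95th percentile) of a lognormal
    distribution with the given mean and error factor."""
    sigma = math.log(error_factor) / Z95
    # For a lognormal, mean = median * exp(sigma**2 / 2).
    median = mean * math.exp(-0.5 * sigma ** 2)
    return median / error_factor, median, median * error_factor

def within_range(observed, mean, error_factor=5.0):
    """True if an observed rate lies inside the 5th-95th percentile band."""
    low, _, high = lognormal_bounds_from_mean(mean, error_factor)
    return low <= observed <= high

if __name__ == "__main__":
    predicted_mean = 1.0e-4  # hypothetical handbook prediction, per hour
    low, median, high = lognormal_bounds_from_mean(predicted_mean, 5.0)
    print(f"credible band: {low:.2e} to {high:.2e} per hour")
    print(within_range(2.0e-4, predicted_mean))  # hypothetical test result
```

    With an error factor of 5 the band spans a factor of 25 from bottom to top, so an observed rate can differ from the point value by a sizeable margin and still be consistent with the forecast.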

    Another problem associated with abandoning generic data is perhaps greater. That is, where would the abandonment of generic data sources leave us? What would replace them? If the physics-of-failure is the chosen replacement then a recent paper by even one of the biggest critics of 217 12 indicates additional dangers. Experience has shown that the physics-of-failure, if it is carefully applied to devices which are already quite reliable, all but eliminates any and all further failures that would be detectable using the approach. In this way, what the physics-of-failure approach does is to rapidly mature the design. If microcircuit engineers are then dealing predominantly with mature designs (much in the same way as architects and civil engineers who construct roadways, tunnels, bridges, and buildings) 15 then they no longer have to worry about 'normal' failures (which are the only ones the physics-of-failure can work for). They are also faced with characterizing all the residual failures as being unique or 'freak' failures. In this sense, the authors are correct when they suggest that in such an environment the classical concept of 'failure rate' loses its meaning. But does that mean that there is no value in gathering together what information is available on these freak failures to provide us with a degree of belief that a given design will not succumb to one? It appears possible that even in this case reliability prediction (a better term is forecasting) is useful and a set of generic data would be helpful. But in this case perhaps the data will provide the equivalent of a 'safety factor' for these devices to indicate that the residual failure rate, although perhaps very uncertain, is certainly very low.

    7 WHERE DO WE GO FROM HERE?

    From the above discussion certain observations can be made concerning reliability and risk data bases:

    1. It should be the responsibility of designers to attempt to produce robust mature designs which will operate failure free throughout their life.

    2. The physics-of-failure approach as part of an integrated design development process can provide the stress margin necessary for a design to be considered robust and therefore can be an invaluable tool to designers.

    3. The physics-of-failure approach can also be an invaluable diagnostic tool in trouble shooting an existing design and discovering the proper corrective action.

    4. Classical failure rate data sources are of very limited value in forecasting reliability of devices when the data is volatile 29 (such as for microelectronics). However, even in this application they may be useful for providing a credible expectation range of reliability performance provided the uncertainties associated with their application are properly taken into account.

    5. For some components such as electromechanical, mechanical, and electrical components the reliability data may not be so volatile, but even here data must be applied by properly considering and accounting for the uncertainties associated with manufacturing differences, operational use, and operating environment.

    6. In all cases, whether it be in estimating the residual 'freak' failure rate in mature designs, determining the potential failure rate that might be expected for a new design, or even applying a failure rate gathered from significant empirical evidence on the operation of an existing design, the failure rate represents a central measure of a distribution which spans a space of credibility. It is never just a point value and the range is never determined by the statistical confidence estimated from the data underlying the population alone. Tolerance, that reflects the range of expected subpopulations which are created by design, manufacture, or use attributes, must also be considered and may often be the driver in the establishment of the credible range of forecasted reliability performance.

    7. While reliability forecasting (if done properly) has been, and might be expected to continue to be, a useful tool to support design decision making, the analyst must be forever vigilant to prevent it from becoming 'an accounting exercise'. 8 Also, the analyst must always remember that a forecast is just that, a statement of the credible range of expected performance. It is not a 'prediction'. It is never a statement of what particular performance will occur for a particular device. Prediction in this sense is the province of prophets, not reliability analysts.
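    Observation 6 above, that tolerance may dominate statistical confidence in setting the credible range, can be illustrated numerically. The sketch below is one possible treatment and not a method given in the text: it assumes that both sources of uncertainty act as independent multiplicative lognormal factors, each summarized by an error factor, so that their log-variances add. The function name and the example values are hypothetical.

```python
import math

Z95 = 1.645  # rounded 95th percentile of the standard normal distribution

def combined_error_factor(ef_statistical, ef_tolerance):
    """Combine two error factors, assuming independent lognormal
    uncertainties whose logarithmic standard deviations add in quadrature."""
    sigma_stat = math.log(ef_statistical) / Z95
    sigma_tol = math.log(ef_tolerance) / Z95
    sigma_total = math.hypot(sigma_stat, sigma_tol)
    return math.exp(Z95 * sigma_total)

if __name__ == "__main__":
    # Hypothetical case: plentiful data (tight statistical bound) but a
    # wide spread between subpopulations (large tolerance).
    print(round(combined_error_factor(1.3, 5.0), 2))
```

    Under these assumptions a tight statistical bound barely changes the combined error factor when the tolerance term is large, which is the sense in which tolerance 'may often be the driver'.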

    8 SUMMARY AND CONCLUSIONS

    The preceding sections have attempted to provide an historical viewpoint of the development of published reliability and risk analysis data bases. They have attempted to review, from the author's perspective, the contributions made and errors introduced by these early efforts, and how these have been amplified through the intervening decades. A discussion has also been presented of legitimate criticisms developed concerning the structure and use (or abuse) of published data sets. Indication has also been provided of how, in the opinion of the author, the critics may have already gone too far and, because of what is believed to be a fundamental misunderstanding on their part, may be leading reliability and risk disciples on a path toward extinction. Finally, some of the attributes of a rational reliability and risk forecasting technology are presented along with the characteristics to be expected of improved reliability and risk data bases.

    It is the belief of the author that if efforts are undertaken along the correct path, reliability and risk technologists can expect ever more useful data bases to be developed. Some indications are already extant. In the area of improved insights one report 31 indicated early on how the demand related and time related failure modes are affected by the type of service and the in situ service conditions. This insight has been borne out by a soon to be published report 32 which indicates almost an order of magnitude difference in failure rate for identical devices (MOVs) in frequent use high pressure control service vs frequent lower pressure on/off service. Service type distinctions were pioneered in 1984 in the first edition of the OREDA handbook 25 along with distinctions made for internal operating fluids (distinctions in internal environments have been expanded by another recent work). 26 This handbook, now in its 1992 edition, 33 also pioneered the use of component boundary drawings to reduce the misuse of the contained data. For the case of emergency diesel generators (EDGs), so important for standby electric service, several comprehensive data bases have been made available. 34,35 The latest one, published in 1994, provides US nuclear industry-wide data from 1988 to 1991 on 195 EDGs at 63 plant sites. Failure rates and uncertainty bounds are provided for both the start (demand or cyclic) failure mode and load-run (time related) failure mode across this population.
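    The distinction between the start (demand) failure mode and the load-run (time related) failure mode can be made concrete with a short calculation. The sketch below assumes that the two modes are independent and that run failures occur at a constant rate; the function name and the input values are hypothetical and are not drawn from the cited EDG data bases.

```python
import math

def mission_failure_probability(p_start, run_rate_per_hour, run_hours):
    """Probability that a standby unit fails to start or fails while
    running, assuming independent modes and a constant run failure rate."""
    p_run_success = math.exp(-run_rate_per_hour * run_hours)
    return 1.0 - (1.0 - p_start) * p_run_success

if __name__ == "__main__":
    # Hypothetical values: 1% chance of failing on demand, one run
    # failure per 1000 hours, 24 hour mission.
    print(round(mission_failure_probability(0.01, 1.0e-3, 24.0), 4))
```

    Keeping the two parameters separate in the data base allows the same source data to be applied to short and long missions, where the relative weight of the two modes differs.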

    Advances have also been made in the aerospace industry in the area of launch vehicle performance data base development. One noteworthy effort is the AIAA International Reference Guide, 36 which contains detailed information on vehicles launched from the 1960s to 1990 in the United States, Europe, China, India, Israel, Japan and the former Soviet Union. While significant and useful details are given on the schematics and design characteristics of each vehicle, perhaps the most important feature from the failure data perspective is the inclusion of complete vehicle-specific launch histories, where launches which ended in failure are noted with a brief failure cause description (e.g., stage 2 attitude control).

    This work has been expanded for US space vehicles launched at Cape Canaveral through a detailed investigation of launch performance records obtained from the 45th Space Wing range safety office at Patrick Air Force Base. 37,38 Among the vehicles represented in these records are Delta, Atlas, Redstone, Apollo/Saturn, and Jupiter/Juno. Each record cites the launch date, site, vehicle configuration, trajectory and flight plan, and test/flight objectives and results. As a result of the review and categorization of data from these records, liquid and solid fueled launch vehicle flight histories were characterized in terms of the number of catastrophic, degraded, and incipient failures out of total launches and were used to construct data distributions to reflect the statistical uncertainty.

    The information resulting from these efforts has been computer coded and included in a data base workstation currently in use at the NASA Headquarters Safety and Mission Assurance offices. 39 The hierarchical organization of the data permits the user to select from among launch vehicle data categorized by country, then by vehicle name, then configuration (e.g., Titan III). User-selected flight failure data can then be statistically combined through an aggregation module and the resulting lognormal distribution can be plotted.
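    The select-then-aggregate workflow described above might be sketched as follows. This is not the workstation's actual design: the record layout, the function names and the sample records are all placeholders invented for illustration (the records are not real flight histories), and the sketch stops at pooled counts rather than fitting the lognormal distribution mentioned in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class LaunchRecord:
    country: str
    vehicle: str
    configuration: str
    launches: int
    failures: int

def select(records: Iterable[LaunchRecord],
           country: Optional[str] = None,
           vehicle: Optional[str] = None,
           configuration: Optional[str] = None) -> List[LaunchRecord]:
    """Filter records level by level; a level left as None is not constrained."""
    chosen = []
    for record in records:
        if country is not None and record.country != country:
            continue
        if vehicle is not None and record.vehicle != vehicle:
            continue
        if configuration is not None and record.configuration != configuration:
            continue
        chosen.append(record)
    return chosen

def aggregate(records: Iterable[LaunchRecord]) -> Tuple[int, int, float]:
    """Pool the selected records into (failures, launches, failure fraction)."""
    records = list(records)
    launches = sum(r.launches for r in records)
    failures = sum(r.failures for r in records)
    if launches == 0:
        raise ValueError("no launches in the selected records")
    return failures, launches, failures / launches

# Placeholder records for illustration only, not real flight histories.
RECORDS = [
    LaunchRecord("Country A", "Vehicle X", "X-1", launches=40, failures=4),
    LaunchRecord("Country A", "Vehicle X", "X-2", launches=60, failures=3),
    LaunchRecord("Country A", "Vehicle Y", "Y-1", launches=20, failures=5),
    LaunchRecord("Country B", "Vehicle Z", "Z-1", launches=10, failures=1),
]

if __name__ == "__main__":
    print(aggregate(select(RECORDS, country="Country A", vehicle="Vehicle X")))
```

    Selecting at a higher level of the hierarchy (country only, for example) pools more launches and gives a broader class estimate, mirroring the class and instance relationship discussed earlier.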

    For more volatile data sets, such as those addressing microelectronic devices, one report 40 has indicated that despite the data volatility, the use of reasonable assumptions concerning the failure rate credibility range across the technology base can lead to predictions that match (within a reasonable range of certainty) the actual performance recorded in a variety of spacecraft histories. Further, trends in these data sets have been shown to be useful to decision makers in providing them with the information necessary for developing reasonable expectations of the improvement in spacecraft reliability which might ensue over the next decade due to currently available improvements in the technology base.

    These efforts and other inquiries 41 or developments 42,43 have convinced the author that the future role of reliability forecasting as a useful tool in decision making is assured provided that support for these pioneering efforts is continued and that some of the more classic data bases (such as MIL-HDBK-217) take heed of the lessons learned in their development. If these pioneering efforts die out and if classic data bases persist in their impossible quest to document strictly classical reliability measures for specific devices, then reliability and risk data bases will become useless artifacts and the critics will be totally justified in calling for their abolishment and the abandonment of reliability forecasting based upon the information they contain.

    ACKNOWLEDGEMENT

    The author would like to thank E.P. Collins of the SAIC New York Office for her many suggestions on improvements to this manuscript and D. Walton, also of SAIC New York, for his invaluable assistance in the production of the document.

    REFERENCES

    1. Antona, E., Fragola, J. & Galvagni, R. Risk based decision analysis in design. Fourth SRA Europe Conference Proceedings, Rome, Italy, 18-20 October 1993.

    2. Charlton, T.M., A History Of Theory Of Structures In The 19th Century, Cambridge University Press, Cambridge, UK, 1982.

    3. Galilei, G. Discorsi e dimostrazioni mathematiche intorno a due nuove science, (Discourses and mathe- matical demonstrations concerning two new sciences), Leiden, The Netherlands, 1638.

    4. Benvenuto, E., An Introduction to the History of Structural Mechanics, Part H: Vaulted Structures and Elastic Systems, Springer-Verlag, NY, USA, 1991.

    5. ASME Boiler and Pressure Vessel Code, ASME, New York.

    6. Fragola, J.R., Frank, M.V., Karns, J.J., Maggio, G. & McFadden, R., Investigation of the risk implications of space shuttle solid rocket booster chamber pressure excursions. SAIC Document No. SAIC/NY 95-01-10, New York, NY.

    7. Levy, M. & Salvadori, M., Why Buildings Fall Down, W.W. Norton and Co., New York, NY, 1992.

    8. Pecht, M., Nash, F.R., & Long, J.H., Understanding and solving the real reliability assurance problems. 1995 Proceedings of Annual RAMS Symposium, IEEE, New York, NY, 1995.

    9. Evans, R.A., Bayes paradox. IEEE Trans. Reliab., R-31 (1982) 321.

    10. Cushing, M., Martin, D., Stadterman, A. & Malhotra, A., Comparison of electronics-reliability assessment approaches. IEEE Trans. Reliab., 42 (1993) 542-546.

    11. Watson, G.F., MIL Reliability: a new approach. IEEE Spectrum, 29 (1992) 46-49.

    12. Pecht, M. & Nash, F., Predicting the reliability of electronic equipment. Proc. IEEE, 82 (1994) 992-1004.

    13. Shooman, M.L., Probabilistic Reliability: An Engineering Approach, 2nd edition, Krieger, Malabar, FL, 1990.

    14. Reliability prediction of electronic equipment. MIL-HDBK-217E, Department of Defense, Washington DC, 1982.

    15. Calabro, S.R., Reliability Principles and Practices, McGraw-Hill, New York, NY, 1962.

    16. Procedure and data for estimating reliability and maintainability. Report No. M-M-P-59-21, Martin Co., Denver, 1959.

    17. Summaries of failure rate data. GIDEP Operations Center, Corona, CA.

    18. Cottrell, D.F. et al., RADC nonelectronic reliability notebook. RADC-TR-69-458, Rome, NY, 1969.

    19. Epstein, B., cited in Miller, I. & Freund, J., Probability and Statistics for Engineers, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1965.

    20. Epstein, B., Tests for the validity of the assumption that the underlying distribution of life is exponential. Technometrics, (1960).

    21. Fragola, J.R. & Hecht, L.O., Reliability data bases: a review. Proceedings of the Product Liability Conference, PLP-77E, IEEE, New York, 1977.

    22. IEEE Std. 500-1977, IEEE Guide to the Collection and Presentation of Electrical, Electronic, and Sensing Reliability Data for Nuclear Power Generating Stations, IEEE, New York, NY, 1977.

    23. Nuclear Plant Reliability Data System, Institute of Nuclear Power Operations, II Circle Parkway, Atlanta, GA, 1976.

    24. Yurkowsky, et al., Data collection for the nonelectronic reliability handbook. RADC-TR-68-114, Rome, NY, 1968.

    25. Offshore Reliability Data Handbook, OREDA-84, DNV, Hovik, Norway, 1984.

    26. Guidelines for Process Equipment Reliability Data with Data Tables, Center for Chemical Process Safety, American Institute of Chemical Engineers, New York, NY, 1989.

    27. Hofstadter, D.R., Gödel, Escher, Bach: An Eternal Golden Braid, Vintage, New York, NY, 1979.

    28. Damasio, A.R., Descartes' Error, Grosset/Putnam, New York, NY, 1994.

    29. Fragola, J.R., Comment on: O data, data! Wherefore art thou data. IEEE Trans. Reliab., R-32 (1983) 2.

    30. DeFinetti, B., The Theory of Probability, Vol. 1, John Wiley and Sons, New York, NY, 1974.

    31. Lofgren, E.V., Lofgren, E.V. & Thaggard, M., Analysis of standby stress and demand stress failure modes: methodology and applications to EDGs and MOVs. NUREG CR-5823, USNRC, Washington, DC, 1987.

    32. Grant, G.M., Roesener, W.S., Hall, D.G., Atwood, C.L., Gentillon, C.D. & Wolf, T.R., High pressure coolant injection (HPCI) system performance, 1987-1993. INEL-94/0158, Idaho Falls, ID, February 1995.

    33. OREDA Participation Offshore Reliability Data, OREDA-92. P.O. Box DNV Technica, N-1322, Hovik, Norway, 1992.

    34. Vesely, W.E., DeMoss, G., Lofgren, E.V., Ginzburg, T. & Samanta, P., Evaluation of diesel unavailability and risk effective surveillance test intervals. BNL Tech. Report A-3230, Brookhaven National Laboratory, Upton, NY, 1986.

    35. Samanta, P., Kim, I., Uryasev, S., Penoyar, J. & Vesely, W., Emergency diesel generator: maintenance and failure unavailability, and their risk impacts. NUREG CR-5994, USNRC, Washington, DC, 1994.

    36. Isakowitz, S.J., International Reference Guide to Space Launch Systems, American Institute of Aeronautics and Astronautics (AIAA), Washington DC, 1991 Edition.

    37. Dimensions International, Inc. and SAIC, NASA Space Risk Data Collection/Analysis Project Review, Briefing for NASA Headquarters, 1993.

    38. Thaggard, M., Databases for reliability and probabilistic risk assessment. 1995 Proceedings of Annual RAMS Symposium, IEEE, New York, NY, 1995 pp.327-335.

    39. Preliminary User's Guide: NASA Space Systems Risk-Reliability-Availability-Maintainability (RRAM) Workstation. Prepared for Vitro Corporation and NASA Code QS, SAIC, New York, 1993.

    40. Fragola, J.R., McFadden, R.H., DeMoss, G.M., Karns, J.J., Gygignani, P.L., Janicik, T.J. & Collins, E.P., Final Report: Reliability Analysis for Space Station Freedom, Volumes 1 and 2, SAIC, New York, NY, 1990.

    41. Cooke, R., Donegaal, J., Bedford, T., Design of reliability data bases for aerospace applications. Report to the European Space Agency ESTEC, Draft, TU Delft, Delft, The Netherlands, 1993.

    42. Carlson, L. et al., T-Book: Reliability Data of Components in Nordic Nuclear Power Plants, ATV Office, Vattenfall AB, S-162 87 Vällingby, Sweden, 1987.

    43. Bento, J.-P., Björe, S., Ericsson, G., Hasler, A., Lydén, C.-O., Wallin, L., Pörn, K. & Åkerlund, O., Reliability data book for components in Swedish nuclear power plants. RKS/SKI, RKS85-25, Stockholm, Sweden, 1985.

    44. Moody, J.W., Reliability of ballistic missiles and space vehicles. Working Paper, Reliability Office, George C. Marshall Space Flight Center, Huntsville, AL, 1961.

    APPENDIX

    Definitions:

    1. Reliability/Risk Data Bank--A library of raw and processed sources of reliability and risk relevant data usually, but not necessarily, converted into an electronically accessible form. The data sources included in the data bank may be processed to a standard norm to facilitate user selection and aggregation but should always retain a traceable pedigree back to individual data sources.

    2. Reliability/Risk Relevant Data--Either processed or raw parameter data which allow for the establishment of information sets useful in the quantification of reliability/risk models. The parameters include the number of demand and time related failures and the associated exposure across the considered population, and the known unavailable times for preventative and corrective maintenance, and test.

    3. Raw Data--Reliability and Risk Relevant data gleaned from primary sources such as maintenance work requests, failure reports, and station equipment clearance permits which record the in-situ observed failure conditions, the corrective actions taken and the equipment out-of-service times.

    4. Reliability/Risk Data Base--A pre-selected processed output produced and usually available in hard copy form designed to be useful for a variety of analyses whose general type and depths have been anticipated by the data base designer in the data base design specification.

    5. Generic Data--Data gleaned from a variety of data base and data bank sources which record the performance of general equipment or device types under more or less standard conditions. Generic data therefore provides information concerning Reliability/Risk Relevant data that is useful to establish the range of parameter values to be expected across a broad class of equipment or device design and operational attributes.

    6. Specific Data--Data obtained from the historical performance of the specific device or set of device types with the 'identical' or 'nearly identical' design and operational class attributes. The terms identical or nearly identical are determined to mean in the reliability/risk analysis context: the attribute sets for all members of the class are such that each of the members within the class (including the member undergoing analysis) can be assumed to be replaceable one with another and that the class uncertainty bounds are considered to be acceptable.

    7. Physics-of-failure--An approach taken to establish a forecasted set of risk or reliability measures to be assigned to an equipment or device type which is obtained by applying fundamental physical laws to the performance of the device within its operating environment. Physics-of-failure includes a variety of techniques all ultimately directed at assessing the damage induced upon a device by physical stresses, pressure, loads, voltage, heat, etc. resulting from its internal and external operating environment and its associated structural strength in probabilistic terms. The probability of failure is then forecasted as the probability of exceedance of stress over strength, or alternatively the probability of unconstrained growth of structural defects (such as cracks) throughout the expected installed life of the device.

    8. Root Cause Analysis--An approach taken to determine ex post facto the ultimate (as opposed to proximate) cause or causes of an equipment or device failure. A root cause analysis is conducted in a systematically structured fashion following the path from observed conditions at failure to the causal factors which established those conditions. A root cause analysis utilizes a spectrum of logical tools (such as fault trees) and physical tools (such as experiments and simulations) along with such device or equipment specific data or generic data as may exist. This information is used in order to allow all potential hypotheses to be formulated and tested, and the most likely hypothesis to be established as the most probable root cause of the failure.
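    The 'probability of exceedance of stress over strength' in definition 7 above has a closed form in the simplest textbook case. The sketch below assumes that stress and strength are independent and normally distributed, which is an illustrative assumption and not a claim about any particular device; the function name and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """Probability that stress exceeds strength, assuming both are
    independent normal random variables (stress-strength interference)."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(stress_sd, strength_sd)
    # Failure occurs when the margin (strength minus stress) is negative.
    return NormalDist().cdf(-margin_mean / margin_sd)

if __name__ == "__main__":
    # Hypothetical values in consistent units (e.g., MPa).
    p_fail = interference_probability(stress_mean=300.0, stress_sd=40.0,
                                      strength_mean=500.0, strength_sd=30.0)
    print(f"{p_fail:.3e}")
```

    Widening the gap between the two means, or narrowing either spread, lowers the interference probability, which is the stress-margin idea behind a robust design.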
