improvement of system reliability and failure avoidance

Upload: sumit-jha

Post on 04-Apr-2018


  • 7/30/2019 Improvement of System Reliability and Failure Avoidance


    A project work report submitted

    to

    For partial fulfillment of the requirement for the

    Award of the degree

    of

    In

    By

    Under the guidance of


    ACKNOWLEDGEMENT

    We would like to take this opportunity to extend our sincere

    thanks and gratitude to Dr. S. B. Prasad, our project

    supervisor, for his assistance and coordination

    during the development of this report.

    We are thankful to Dr. A. M. Tigga, Head of the Department of

    Production and Industrial Engg., for his constant

    encouragement and valuable suggestions throughout this

    project.

    Finally, we are grateful to all the faculty of the Department

    of Production and Industrial Engg., N.I.T. Jamshedpur, for

    their encouragement and inspiration.

    By

    SUMIT KUMAR JHA (308/06)

    SYED SARIM HUSSAIN (322/06)

    ROHIT SURIN (251/06)


    CONTENTS

    CERTIFICATE

    ACKNOWLEDGEMENT

    ABSTRACT

    CHAPTER 1 INTRODUCTION

    1.1 MOTIVATION: RELIABILITY AND SYSTEMS ENGG.

    CHAPTER 2 LITERATURE REVIEW

    2.1 RELIABILITY THEORY

    CHAPTER 3 REVIEW OF RELATED WORK

    CHAPTER 4 FOUR STRATEGIES FOR IMPROVED ROBUSTNESS

    CHAPTER 5 SUMMARY

    REFERENCES


    ABSTRACT

    To be reliable, a system must be robust: it must avoid failure modes even in the

    presence of a broad range of conditions including harsh environments, changing

    operational demands, and internal deterioration. This project discusses and

    codifies techniques for robust system design that operate by expanding the range

    of conditions under which the system functions.

    A distinction is introduced between one-sided and two-sided failure modes, and

    four strategies are presented for creating larger windows between sets of

    one-sided failure modes. Each strategy is illustrated through two examples from

    industrial practice. For each strategy, one example is from paper handling and

    another is from jet engines. By showing that every strategy has been successfully

    applied to each system, we seek to illustrate that the strategies are widely

    applicable and highly effective.

    Key words: reliability; robust design; operating window; system architecture


    1. INTRODUCTION

    Reliability may be defined in several ways:

    * The idea that something is fit for a purpose with respect to time;

    * The capacity of a device or system to perform as designed;

    * The resistance to failure of a device or system;

    * The ability of a device or system to perform a required function under

    stated conditions for a specified period of time;

    * The probability that a functional unit will perform its required function

    for a specified interval under stated conditions;

    * The ability of something to "fail well" (fail without catastrophic

    consequences).

    MOTIVATION: RELIABILITY AND SYSTEMS ENGINEERING

    Reliability engineers rely heavily on statistics, probability theory, and reliability

    theory. Many engineering techniques are used in reliability engineering, such

    as reliability prediction, Weibull analysis, thermal management, reliability

    testing and accelerated life testing. Because of the large number of reliability

    techniques, their expense, and the varying degrees of reliability required for

    different situations, most projects develop a reliability program plan to specify

    the reliability tasks that will be performed for that specific system.

    The function of reliability engineering is to develop the reliability requirements

    for the product, establish an adequate reliability program, and perform

    appropriate analyses and tasks to ensure the product will meet its

    requirements. These tasks are managed by a reliability engineer, who usually

    holds an accredited engineering degree and has additional reliability-specific


    education and training. Reliability engineering is closely associated with

    maintainability engineering and logistics engineering. Many problems from

    other fields, such as security engineering, can also be approached using

    reliability engineering techniques. This chapter provides an overview of some of

    the most common reliability engineering tasks; the references give a

    more comprehensive treatment.

    Reliability is among the most important topics in systems engineering.

    Reliability is the proper functioning of the system under the full range of

    conditions experienced in the field. Reliability requires two critical conditions:

    * Mistake avoidance

    * Robustness

    By mistake we refer to the plethora of design decisions and

    manufacturing operations that may be grossly in error. Examples of mistakes

    are installing a switch backwards, or interpreting a software command as being

    expressed in inches when it represents centimeters. Reliability can be

    improved by reducing the incidence of such mistakes by a combination of

    knowledge-based engineering and the problem-solving process.

    By robustness we refer to the ability of a system to function (i.e., to avoid

    failure) under the full range of conditions that may be experienced in the field. It

    is one sort of challenge to develop a system that functions for a demonstration

    under tightly controlled conditions such as in a laboratory. It is an entirely

    different challenge to make a system that functions reliably throughout its

    lifecycle as it experiences a broad set of real world environmental and operating

    conditions. Effective systems engineering addresses the second challenge, not

    the first.

    Many types of engineering employ reliability engineers and use the tools and

    methodology of reliability engineering. For example:

    * System engineers design complex systems having a specified reliability


    * Mechanical engineers may have to design a machine or system with a specified

    reliability

    * Automotive engineers have reliability requirements for the automobiles (and

    components) which they design

    * Electronics engineers must design and test their products for reliability

    requirements.

    * In software engineering and systems engineering, reliability engineering is

    the subdiscipline of ensuring that a system (or a device in general) will perform its

    intended function(s) when operated in a specified manner for a specified length

    of time. Reliability engineering is performed throughout the entire life cycle of a

    system, including development, test, production and operation.

    An alternative conception of reliability engineering is based on what we call

    Failure-mode avoidance. Many changes in system design that improve reliability

    do so by moving the physical failure modes. In fact, we argue that the most

    significant improvements in reliability come about by this means. Although this

    approach can be integrated with probability theory, it is not necessary to use

    probability theory to understand how these design changes bring about their

    effects.

    We claim that, especially in the early development of systems, the Failure-mode

    avoidance approach will lead to many improvements being made with a minimum

    amount of data required: just enough to guide the next improvement. The

    Failure-mode avoidance approach is deeply rooted in the physics of the system

    and is therefore tangible to the engineers, which facilitates the needed creative

    insights for concept design. This advantage is supported by recent results from

    cognitive psychology.

    A further advantage of the Failure-mode avoidance approach is that it reduces

    the salience of so-called specified operating conditions. At an early stage of

    system development, one cannot reasonably define a complete set of conditions

    that a system is likely to experience in its lifecycle.


    Although an approximate set of conditions can be defined, it will surely miss some

    important combinations of conditions. Later on, these unanticipated operating

    conditions may arise and the system may cease to function. When this happens, it

    is tempting to say that, since the condition was not specified, the system did not

    actually fail, but that the system was misused. It is essential for systems engineers to

    recognize that nature does not care what systems engineers think the specified

    operating conditions are. When the system fails to function under the conditions

    the system actually experiences, that constitutes a failure. This point is well

    understood by some reliability engineers. For example, Thomas, Ayers, and Pecht

    [2002] discuss "trouble not identified" warranty returns in the auto industry and

    conclude: "It must not be assumed that a returned module that passes tests

    associated with an engineering specification is good." Because of uncertainty

    regarding specified operating conditions, we argue that an effective approach is

    to increase the set of conditions under which the system operates and do this as

    quickly and economically as one can manage within the time available. This

    implies that systems engineers should not spend much energy on predicting field

    reliability but instead use that same energy to increase field reliability [Clausing,

    1994]. It seems that the creative design work that leads to reliability improvement

    is a very natural activity and is consistent with our failure-mode avoidance


    conception of reliability. We propose that thinking of reliability as failure-mode

    avoidance can have real advantages, especially in the early stages of system

    design or in a long-term scenario such as technology development. In early stages

    of system design, probability theory may be too quantitative for the task at hand.

    Probability density functions imply a level of precision in modeling the scenario

    that is often unwarranted, especially during early development. As a project

    advances through its development stages the probabilistic view of reliability

    becomes increasingly useful. Analysis of reliability using probability theory is

    useful for component selection, system validation, and the management of

    field-service operations. The value of the failure-mode avoidance conception of

    reliability is greatest for technology strategy, systems architecting, concept

    design, and for some robust parameter design activities, all done early during the

    development of the system.


    2. RELIABILITY THEORY

    Reliability theory is the foundation of reliability engineering. For engineering

    purposes, reliability is defined as:

    the probability that a device will perform its intended function during a

    specified period of time under stated conditions.

    Mathematically, this may be expressed as

    R(t) = \int_t^{\infty} f(x)\,dx,

    where f(x) is the failure probability density function and t is the length of
    the period of time (which is assumed to start from time zero).
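    As a minimal sketch of the definition above, the following Python fragment
    evaluates the reliability for one particular choice of failure density, the
    exponential density with a constant failure rate. The exponential form, the
    failure rate of 0.001 per hour, the 500-hour mission time, and all function
    names are assumptions made for illustration; none of them come from this
    report. The closed-form value is compared with a numerical integration of the
    density from t up to a large finite limit.

```python
import math


def exponential_pdf(x: float, failure_rate: float) -> float:
    """Failure density f(x) = lambda * exp(-lambda * x) (assumed model)."""
    return failure_rate * math.exp(-failure_rate * x)


def reliability_closed_form(t: float, failure_rate: float) -> float:
    """For the exponential density, the integral from t to infinity is exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def reliability_numeric(t: float, failure_rate: float,
                        upper: float = 50_000.0, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of f(x) from t to `upper`.

    `upper` stands in for infinity; it must be large enough that the
    remaining tail of the density is negligible.
    """
    h = (upper - t) / steps
    total = 0.5 * (exponential_pdf(t, failure_rate) + exponential_pdf(upper, failure_rate))
    for i in range(1, steps):
        total += exponential_pdf(t + i * h, failure_rate)
    return total * h


FAILURE_RATE = 1e-3   # hypothetical: failures per hour
MISSION_TIME = 500.0  # hypothetical: hours

r_closed = reliability_closed_form(MISSION_TIME, FAILURE_RATE)
r_numeric = reliability_numeric(MISSION_TIME, FAILURE_RATE)
print(f"closed form: {r_closed:.6f}, numeric: {r_numeric:.6f}")
```

    Under these assumptions both values come out near 0.61, meaning roughly six
    units in ten would be expected to survive the mission time.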

    Reliability engineering is concerned with four key elements of this definition:

    First, reliability is a probability. This means that failure is regarded as
    a random phenomenon: it is a recurring event, and we do not
    express any information on individual failures, the causes of failures,
    or relationships between failures, except that the likelihood for
    failures to occur varies over time according to the given probability
    function. Reliability engineering is concerned with meeting the
    specified probability of success, at a specified statistical confidence
    level.

    Second, reliability is predicated on "intended function". Generally,
    this is taken to mean operation without failure. However, even if no
    individual part of the system fails, but the system as a whole does
    not do what was intended, then it is still charged against the system
    reliability. The system requirements specification is the criterion
    against which reliability is measured.

    Third, reliability applies to a specified period of time. In practical
    terms, this means that a system has a specified chance that it will
    operate without failure before time t. Reliability engineering ensures
    that components and materials will meet the requirements during
    the specified time. Units other than time may sometimes be used.


    The automotive industry might specify reliability in terms of miles;
    the military might specify reliability of a gun for a certain number of
    rounds fired. A piece of mechanical equipment may have a reliability
    rating value in terms of cycles of use.

    Fourth, reliability is restricted to operation under stated conditions.
    This constraint is necessary because it is impossible to design a
    system for unlimited conditions. A Mars Rover will have different
    specified conditions than the family car. The operating environment
    must be addressed during design and testing. Also, that same rover
    may be required to operate in varying conditions requiring additional
    scrutiny.

    Reliability program plan

    Many tasks, methods, and tools can be used to achieve reliability. Every system

    requires a different level of reliability. A commercial airliner must operate under a

    wide range of conditions. The consequences of failure are grave, but there is a

    correspondingly higher budget. A pencil sharpener may be more reliable than an

    airliner, but has a much different set of operational conditions, insignificant

    consequences of failure, and a much lower budget.

    A reliability program plan is used to document exactly what tasks, methods, tools,

    analyses, and tests are required for a particular system. For complex systems, the

    reliability program plan is a separate document. For simple systems, it may be

    combined with the systems engineering management plan or Integrated Logistics

    Support management plan. The reliability program plan is essential for a

    successful reliability program and is developed early during system development.

    It specifies not only what the reliability engineer does, but also the tasks

    performed by others. The reliability program plan is approved by top program

    management.

    Reliability requirements

    For any system, one of the first tasks of reliability engineering is to adequately

    specify the reliability requirements. Reliability requirements address the system

    itself, test and assessment requirements, and associated tasks and

    documentation. Reliability requirements are included in the appropriate


    system/subsystem requirements specifications, test plans, and contract

    statements.

    Design for reliability

    Design for Reliability (DFR) is an emerging discipline that refers to the process of

    designing reliability into products. This process encompasses several tools and

    practices and describes the order of their deployment that an organization needs

    to have in place to drive reliability into its products. Typically, the first step in

    the DFR process is to set the system's reliability requirements. Reliability must be

    "designed in" to the system. During system design, the top-level reliability

    requirements are then allocated to subsystems by design engineers and reliability

    engineers working together.

    Reliability design begins with the development of a model. Reliability models use

    block diagrams and fault trees to provide a graphical means of evaluating the

    relationships between different parts of the system. These models incorporate

    predictions based on parts-count failure rates taken from historical data. While

    the predictions are often not accurate in an absolute sense, they are valuable to

    assess relative differences in design alternatives.
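    The use of parts-count predictions to compare design alternatives can be
    sketched as follows. This is only an illustration under stated assumptions:
    every part is taken to have a constant failure rate, any single part failure
    is taken to fail the system (a series model), and the part quantities and
    failure rates are hypothetical numbers rather than historical data. The
    function name is likewise made up for this example.

```python
def parts_count_failure_rate(parts):
    """Sum quantity * failure rate over all part types.

    `parts` is a list of (quantity, failures per million hours) pairs.
    Assumes constant failure rates and a series system.
    """
    return sum(quantity * rate for quantity, rate in parts)


# Hypothetical bills of material: (quantity, failures per million hours)
design_a = [(10, 0.05), (4, 0.20), (1, 1.50)]
design_b = [(10, 0.05), (2, 0.20), (1, 1.00)]

lambda_a = parts_count_failure_rate(design_a)
lambda_b = parts_count_failure_rate(design_b)

# Mean time between failures in hours under the constant-rate assumption
mtbf_a = 1e6 / lambda_a
mtbf_b = 1e6 / lambda_b

print(f"design A: {lambda_a:.2f} failures per million hours, MTBF {mtbf_a:,.0f} h")
print(f"design B: {lambda_b:.2f} failures per million hours, MTBF {mtbf_b:,.0f} h")
print(f"relative failure rate A/B: {lambda_a / lambda_b:.2f}")
```

    Even if the absolute rates are off, the ratio between the two designs is the
    quantity that supports a comparison of alternatives.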

    [Figure: a fault tree diagram]

    One of the most important design techniques is redundancy. This means that if

    one part of the system fails, there is an alternate success path, such as a backup

    system. An automobile brake light might use two light bulbs. If one bulb fails, the

    brake light still operates using the other bulb. Redundancy significantly increases


    system reliability, and is often the only viable means of doing so. However,

    redundancy is difficult and expensive, and is therefore limited to critical parts of

    the system. Another design technique, physics of failure, relies on understanding

    the physical processes of stress, strength and failure at a very detailed level. Then

    the material or component can be redesigned to reduce the probability of

    failure. Another common design technique is component derating: selecting

    components whose tolerance significantly exceeds the expected stress, such as using a

    heavier gauge wire that exceeds the normal specification for the expected

    electrical current.
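    The effect of redundancy in the brake-light example can be illustrated with
    the usual series and parallel combination rules. The sketch below assumes
    that component failures are statistically independent and uses a
    hypothetical bulb reliability of 0.90; neither the number nor the function
    names come from this report.

```python
import math


def series_reliability(reliabilities):
    """All components must work: product of the individual reliabilities."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work: 1 minus the product of the unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


BULB = 0.90  # hypothetical probability that one bulb survives the period

single_bulb = series_reliability([BULB])
two_bulbs = parallel_reliability([BULB, BULB])

print(f"one bulb:  {single_bulb:.4f}")
print(f"two bulbs: {two_bulbs:.4f}")
```

    With these assumed values the redundant pair fails only when both bulbs
    fail, so the chance of losing the brake light drops from about 0.10 to about
    0.01. If the two bulbs share a common cause of failure, such as one fuse, the
    independence assumption no longer holds and the benefit is smaller.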

    Many tasks, techniques and analyses are specific to particular industries and

    applications. Commonly these include:

    * Built-in test (BIT)

    * Failure mode and effects analysis (FMEA)

    * Reliability simulation modeling

    * Thermal analysis

    * Reliability block diagram analysis

    * Fault tree analysis

    * Root cause analysis

    * Sneak circuit analysis

    * Accelerated testing

    * Reliability growth analysis

    * Weibull analysis

    * Electromagnetic analysis

    * Statistical interference

    * Avoiding single points of failure

    Results are presented during the system design reviews and logistics reviews.

    Reliability is just one requirement among many system requirements. Engineering

    trade studies are used to determine the optimum balance between reliability and

    other requirements and constraints.

    Reliability testing


    The purpose of reliability testing is to discover potential problems with the design

    as early as possible and, ultimately, provide confidence that the system meets its

    reliability requirements.

    Reliability testing may be performed at several levels. Complex systems may be

    tested at component, circuit board, unit, assembly, subsystem and system levels.

    (The test level nomenclature varies among applications.) For example, performing

    environmental stress screening tests at lower levels, such as piece parts or small

    assemblies, catches problems before they cause failures at higher levels. Testing

    proceeds during each level of integration through full-up system testing,

    developmental testing, and operational testing, thereby reducing program risk.

    System reliability is calculated at each test level. Reliability growth techniques and

    failure reporting, analysis and corrective action systems (FRACAS) are often

    employed to improve reliability as testing progresses. The drawbacks to such

    extensive testing are time and expense. Customers may choose to accept more

    risk by eliminating some or all lower levels of testing.

    It is not always feasible to test all system requirements. Some systems are

    prohibitively expensive to test; some failure modes may take years to observe;

    some complex interactions result in a huge number of possible test cases; and

    some tests require the use of limited test ranges or other resources. In such cases,

    different approaches to testing can be used, such as accelerated life testing,

    design of experiments, and simulations.

    The desired level of statistical confidence also plays an important role in reliability

    testing. Statistical confidence is increased by increasing either the test time or the

    number of items tested. Reliability test plans are designed to achieve the


    specified reliability at the specified confidence level with the minimum number of

    test units and test time. Different test plans result in different levels of risk to the

    producer and consumer. The desired reliability, statistical confidence, and risk

    levels for each side influence the ultimate test plan. Good test requirements

    ensure that the customer and developer agree in advance on how reliability

    requirements will be tested.
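    The link between the number of items tested and statistical confidence can
    be sketched with the zero-failure (success-run) relation often used for
    pass/fail demonstration tests. This relation is not taken from this report;
    it is a common textbook result that assumes independent test units, a
    pass/fail outcome for each unit, and a test that is passed only if no unit
    fails. The function names are hypothetical.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n with 1 - reliability**n >= confidence (no failures allowed)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def demonstrated_confidence(reliability: float, units_tested: int) -> float:
    """Confidence that the true reliability is at least `reliability`
    after `units_tested` units all pass."""
    return 1.0 - reliability ** units_tested


for target_r, target_c in [(0.90, 0.90), (0.99, 0.95)]:
    n = zero_failure_sample_size(target_r, target_c)
    print(f"R >= {target_r}, confidence {target_c}: test {n} units with zero failures")
```

    Under these assumptions, raising either the required reliability or the
    required confidence increases the number of units, which is the time and
    cost pressure that test planning has to balance against risk.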

    A key aspect of reliability testing is to define "failure". Although this may seem

    obvious, there are many situations where it is not clear whether a failure is really

    the fault of the system. Variations in test conditions, operator differences,

    weather, and unexpected situations create differences between the customer and

    the system developer. One strategy to address this issue is to use a scoring

    conference process. A scoring conference includes representatives from the

    customer, the developer, the test organization, the reliability organization, and

    sometimes independent observers. The scoring conference process is defined in

    the statement of work. Each test case is considered by the group and "scored" as

    a success or failure. This scoring is the official result used by the reliability

    engineer.

    As part of the requirements phase, the reliability engineer develops a test

    strategy with the customer. The test strategy makes trade-offs between the

    needs of the reliability organization, which wants as much data as possible, and

    constraints such as cost, schedule, and available resources. Test plans and

    procedures are developed for each reliability test, and results are documented in

    official reports.


    3. REVIEW OF RELATED WORK

    This project is intended to help engineers with the early-stage, conceptual phase

    of design. Therefore, an important related development is the Theory of Inventive

    Problem Solving (sometimes described by the acronyms TRIZ or TIPS). The theory

    was first described by Altschuller [1984] and was recently placed in a broader

    context of innovation by Clausing and Fey [2004]. The theory is based on a study

    of thousands of patents that revealed patterns among inventive solutions. An

    important underlying hypothesis is that inventive problems can be viewed as

    conflicts which the inventive solutions resolve. This enabled large numbers of

    patents to be organized in a useful taxonomy. It has also given rise to commercial

    software products that facilitate the use of the theory by professional

    practitioners. However, we note that many patents claim robustness as their

    primary advantage: they do not deliver new functions, but deliver existing

    functions over a broader range of conditions. While TRIZ is helpful in

    development of new functions and elimination of harmful side effects, it does not

    seem to support reliability innovations to the extent we desire. Therefore, this

    paper analyzes patents and seeks new patterns of inventive engineering work.

    A development in reliability engineering closely related to this project is the

    physics-of-failure (PoF) approach developed at the Computer Aided Life Cycle

    Engineering (CALCE) Electronic Products and Systems Center at the University of

    Maryland. The first instance in archival literature of the term "physics of failure" is

    Pecht et al. [1990], which emphasizes use of a physics-based model for reliability

    prediction and design for reliability. This approach has been extended to product

    development by Pecht and Dasgupta [1995] and to accelerated life testing by

    Kimseng et al. [1999]. This paper builds upon the conception of physics-of-failure

    and seeks to extend this conception to the earliest, creative phases of system

    design.


    An important development in reliability engineering is robust parameter design

    pioneered by Genichi Taguchi [Taguchi, 1993]. For any design concept, there is a

    potentially large space of control factor settings that will nominally place the

    function at the desired target value. In robust parameter design, the engineer

    explores the design space seeking changes that will make the system more robust

    while still keeping the performance on target. Taguchi's method employs

    orthogonal arrays to explore the design space. At the same time, outer arrays or

    compounded noises are used to explore the range of possible operating

    conditions. Signal to noise ratios are used as measures of the robustness of the

    system and guide the engineer to preferable levels of the control factors.
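    How a signal-to-noise ratio can rank control factor settings is sketched
    below. The formulas are the commonly quoted "nominal-the-best" and
    "smaller-the-better" forms; this report does not state which forms were
    used, so treat them, the function names, and the response data as
    assumptions. Each list holds the response of one hypothetical control
    factor setting measured under four noise conditions.

```python
import math
import statistics


def sn_nominal_the_best(values):
    """S/N = 10 * log10(mean^2 / sample variance); a larger value is more robust."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)
    return 10.0 * math.log10(mean ** 2 / variance)


def sn_smaller_the_better(values):
    """S/N = -10 * log10(mean of y^2); used when the ideal response is zero."""
    return -10.0 * math.log10(statistics.mean([v ** 2 for v in values]))


# Hypothetical responses under four noise conditions; both settings have
# the same mean, so they are equally "on target" on average.
setting_a = [9.8, 10.1, 10.3, 9.9]
setting_b = [9.0, 11.2, 10.5, 9.4]

sn_a = sn_nominal_the_best(setting_a)
sn_b = sn_nominal_the_best(setting_b)

print(f"setting A: {sn_a:.1f} dB, setting B: {sn_b:.1f} dB")
print("preferred setting:", "A" if sn_a > sn_b else "B")
```

    Both hypothetical settings hit the same average response, yet setting A
    varies far less across the noise conditions and therefore receives the
    higher ratio.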

    Taguchi's philosophy of robust design is consistent with the approach to reliability

    engineering discussed here. Taguchi rejected the "goal post" mentality inherent in

    tolerance limits and specifications. His notion of a quality-loss function replaced

    consideration of defect rates and process yields with an emphasis on reducing

    variance followed by adjustment to target. Taguchi encouraged engineers to

    deliberately expose designs to harsh conditions in experiments. To do this

    requires a transformation in the culture of an engineering organization. The

    emphasis must shift from demonstrating adequate performance with high

    statistical confidence to aggressive improvement followed by adequate

    confirmation.

    Robust parameter design is among the most important developments in systems

    engineering in the 20th century. These methods seem to have accounted for a

    significant part of the quality differential that made Japanese manufacturing so

    dominant during the 1970s. The methods were subsequently adopted outside of

    Japan. The timing of that adoption in the West corresponded closely with

    improvement in quality that improved competitiveness of North American and

    European manufacturers. Robust design methods were surely a significant part of

    both the rise of Japanese industry and the response to that competitive

    challenge. Robust design methods have continued to be refined and are still an

    active area of systems engineering innovation.

    Another approach relevant to this paper, known as operating window methods,

    was developed and practiced at Xerox Corporation in the 1970s. The operating

    window is the set of conditions under which the system operates without failure.

    In operating window methods, reliability is improved by making the operating

    window larger. Clausing [2004] described the approach in detail in a recent issue

    of Technometrics, but the essence of the approach is simple enough to present

    here:

    1. Increase the value of the noise factors so that the failure rate is high.

    2. Change the value of the control factors to seek a broader operating window at

    a fixed failure rate.

    This approach was used, for example, to improve the reliability of paper handling

    machines. At Xerox, paper stacks were designed and constructed to deliberately

    produce a large magnitude of variation. The papers varied in their weight, surface

    condition, geometry, and so on. These paper stacks were similar to the worst

    stacks one would encounter in field use, and, in conjunction with operation near

    the limit of the operating window, they brought about higher failure rates than

    would normally be encountered, on the order of 1 in 10 rather than 1 in 10,000.

    These high failure rates enabled the engineers to more quickly discern the effect

    of changes in failure rate with changes in the control factors such as stack forces,

    feed belt angles, and so on. This approach worried managers since they observed

    the machines jamming with high frequency, but they eventually came to

    understand why this was needed. As a consequence the engineers were able to

    quickly converge to more reliable machine configurations.
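
    A rough calculation shows why the stressed tests speed up learning. The sketch
    below assumes independent feed trials (a binomial model) and asks how many trials
    are needed to estimate a failure rate to a given relative standard error. The
    function name and the 25% target are illustrative assumptions; the two failure
    rates are the ones quoted above.

```python
import math


def trials_for_relative_error(p, rel_se):
    """Trials needed so a binomial failure-rate estimate has relative standard error rel_se.

    For n independent trials, the relative standard error of the estimated
    failure rate is sqrt((1 - p) / (n * p)); this solves that expression for n.
    """
    return math.ceil((1.0 - p) / (p * rel_se ** 2))


# Failure rates quoted in the text: about 1 in 10 under stress, 1 in 10,000 nominally.
stressed = trials_for_relative_error(0.1, 0.25)
nominal = trials_for_relative_error(1e-4, 0.25)
```

    Under these assumptions the stressed test needs on the order of a hundred or so
    sheets per configuration, while a test at the nominal failure rate needs on the
    order of a hundred thousand.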

    Despite the use of failure rates as a measure of performance, the operating

    window method is, upon closer examination, consistent with Taguchi's quality

    philosophy. Because failure rates were greatly increased by applying aggressive

    noises, improvements could be made rapidly, even though they sacrificed the

    ability to accurately predict field reliability. The term "operating window" may

    seem to imply an emphasis on goal posts, but in fact the customer-specified

    limits are viewed as irrelevant and the expansion of actual physical limits is valued

    instead.

    Operating window methods continue to be an active area of research in quality

    engineering. Joseph and Wu [2004] showed that under certain conditions a failure

    rate of 50% maximizes the information gained from robust design using an

    operating window. As an example, they carried out a case study wherein line

    width in a lithography process was set at a much finer pitch than actually needed in

    practice. The control factor settings that improved the robustness at the finer

    pitch also improved the robustness at the pitch needed in operation. The basic

    concept of operating windows was therefore further corroborated.
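
    One simple way to see why a failure rate near 50% can be the most informative is
    sketched below. It is an illustrative assumption, not Joseph and Wu's derivation:
    the failure probability is taken to follow a logistic curve in a stress level x
    with unknown location theta, and I(theta) denotes the Fisher information about
    theta carried by a single pass/fail observation.

```latex
% Assumed model (illustrative): logistic failure probability with unknown location \theta.
p(x) = \frac{1}{1 + e^{-(x - \theta)}}, \qquad
I(\theta) = p(x)\,\bigl(1 - p(x)\bigr), \qquad
\max_{p \in [0,1]} \; p\,(1 - p) = \tfrac{1}{4} \quad \text{at} \quad p = \tfrac{1}{2}.
```

    Under this assumed model, a test run where half the trials fail tells the engineer
    the most about where the failure-mode boundary lies.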

    While retaining the benefits of Taguchi's quality philosophy, operating window

    methods may have a further advantage. In operating window methods, the

    progress in reliability is measured in physical terms by the size of the operating

    window. This may be preferable to measuring results with a more abstract

    measure such as signal to noise ratios. For example, operating window methods

    encouraged engineers at Xerox to devise ways to double the range of paper

    weights the machine could feed rather than contemplate how to increase signal

    to noise ratios by 6 decibels. As previously discussed, cognitive psychology

    suggests there is an advantage in maintaining a connection to physical quantities

    rather than probabilistic measures. We propose that a mental connection to the

    physics and logic of the system is even more critical for early stage system design

    than it is for later stage parameter design.

    As discussed in this section, the basic concept of operating windows is to seek a

    larger set of conditions under which the system functions. While the idea is very

    simple, implementation is challenging, requiring deep knowledge of the system

    and the creativity to develop the needed design innovations. This paper seeks to

    help engineers implement early stage robustness work via operating window

    methods. The next section covers some theoretical developments. The

    subsequent sections present specific strategies for implementation.

    4. FOUR STRATEGIES FOR IMPROVED ROBUSTNESS

    Up to this point, this paper has focused on the interrelated concepts of reliability,

    robustness, and one-sided failure modes. From this point forward, the paper

    concentrates on strategies to avoid one-sided failure modes. All of these strategies

    involve concept design rather than parameter design. The design changes

    considered here are not only changes in the values of design parameters but also

    additions of new features or components, changes in the configuration of the

    system, or even new inventions. We present four strategies along these lines:

    1. Relax a constraint limit on an uncoupled control factor.

    2. Use physics of incipient failure to avoid failure.

    3. Create two distinct operating modes for two different demand conditions.

    4. Exploit interdependence between two operating window system variables.

    To illustrate these strategies and demonstrate their versatility, we present two

    different example applications of each strategy, a primary example that is

    described in considerable detail and a supplementary example that is described in

    less detail. Two engineering domains are used throughout: paper feeders and jet

    engines. The next four subsections present these strategies.

    4.1. Relax a Constraint Limit on an Uncoupled Control Factor

    A control factor that affects only one of the one-sided failure modes in a system is

    said to be uncoupled as defined in Section 3. Such control factors should be

    maximized or minimized to create the greatest possible distance from the

    affected one-sided failure mode consistent with any constraints on the control

    factor. As the system is placed under greater demands over time due to system

    evolution and competition, the operating window afforded under the current

    system constraints may become insufficient. Under these circumstances, the

    constraint can often be relaxed by making changes in the system architecture or

    by changes in technology. The relaxed constraint enables further changes to the

    uncoupled control factor, which opens the operating window.

    Primary Case Study: Paper Feeder. As an industrial example, we present the

    Xerox paper feeder that first went into production in 1981, and has appeared in

    many different Xerox copiers and printers. This paper feeder is known as a

    friction-retard feeder (Fig. 5).

    The feedbelt rests on the paper stack, and drags the top sheet forward. The

    friction of the retard roll holds back (retards) the second sheet if it tries to come

    through. Thus, the retard roll prevents multifeeds (feeding of more than one

    sheet). Therefore, the wrap angle between the feedbelt and the retard roll only

    affects the failure mode of multifeeds. The other primary failure mode is misfeeds

    (no sheet is fed). This failure mode is not affected by the wrap angle between the

    feedbelt and the retard roll. Because multifeeds are reduced by a large wrap

    angle and misfeeds are unaffected, it is clear that the wrap angle should be as big

    as possible.

    Despite the desirability of having a large wrap angle, the previous-generation

    feeder (ca. 1975) had a wrap angle of only 13°, which was constrained by the

    system architecture. In the new design that first went into production in 1981 the

    wrap angle was increased to 45°. This large improvement in wrap angle was

    enabled by a change in the total system architecture. In large copiers and printers

    the next subsystem after the paper feeder is the registration subsystem, which

    aligns the sheet with the image. In the new design the architecture was changed

    so that the paper came out of the feeder and turned down to reach the

    registration subsystem (Fig. 6), which was underneath the feeder. This enabled

    the wrap angle to be greatly increased. This architecture also reduced the width of

    the copier/printer, which is desirable. This paper feeder with the large wrap angle

    has been very successful in many generations of Xerox copiers and printers.

    Supplementary Case Study: Jet Engines. A similar approach was used to improve

    the reliability of axial-flow fans in jet engines. A fan is a component of modern

    high-bypass commercial jet engines that provides a significant increase in the

    total mass flow, and therefore improvement in propulsive efficiency. A critical

    failure mode of such fans is flutter vibration due to the length of the blades and

    their exposure to inlet flow distortions. It had long been known that increasing

    the chord of a fan blade stiffened the blade and thereby reduced the incidence of

    the failure mode of flutter, but the chord of the blade was limited by constraints

    on weight [Koff, 2004]. Eventually, new technologies for manufacturing hollow

    blades enabled engine manufacturers to increase chords significantly without

    added weight. For example, both Patent #4,345,877 [Monroe, 1980] and Patent

    #4,720,244 [Kluppel and Monroe, 1987] contributed to these advances. Wide-

    chord fans provided much greater resistance to flutter and have thereby greatly

    improved engine reliability. As in the case of wrap angles in paper feeders,

    innovation enabled a critical parameter to be pushed past its previous constraints

    to move a one-sided failure-mode boundary and increase the operating window.

    Summary of the Strategy. When a system variable only affects one of the one-sided

    failure modes, take its value to its constraint limit. If the operating window

    is still not large enough, seek new architectures or technologies that relax the

    constraint.
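
    The rule in this summary can be written as a small bookkeeping sketch. It is an
    illustration under stated assumptions rather than a design tool: the sign table
    restates the qualitative effects described for the paper feeder (wrap angle here,
    stack force as discussed in Section 4.2), the function name is hypothetical, and
    the stack-force limits are made-up values in arbitrary units. The wrap-angle limits
    of 13 and 45 are the two values quoted above.

```python
# Effect of increasing each control factor on each one-sided failure mode:
# -1 reduces the failure rate, +1 increases it, 0 means no effect.
EFFECTS = {
    "wrap_angle": {"multifeed": -1, "misfeed": 0},
    "stack_force": {"multifeed": +1, "misfeed": -1},
}


def uncoupled_recommendations(effects, limits):
    """Return {factor: value}, pushing each uncoupled factor to its helpful limit."""
    recommendations = {}
    for factor, by_mode in effects.items():
        active = [sign for sign in by_mode.values() if sign != 0]
        if len(active) == 1:  # affects exactly one failure mode, so it is uncoupled
            low, high = limits[factor]
            # If increasing the factor reduces failures, go to the upper limit.
            recommendations[factor] = high if active[0] < 0 else low
    return recommendations


# Wrap-angle limits under the old and new architectures; stack-force limits are hypothetical.
old_architecture = uncoupled_recommendations(
    EFFECTS, {"wrap_angle": (0, 13), "stack_force": (1, 5)}
)
new_architecture = uncoupled_recommendations(
    EFFECTS, {"wrap_angle": (0, 45), "stack_force": (1, 5)}
)
```

    Stack force is left out of the recommendation because it is coupled to both failure
    modes; relaxing the architectural constraint changes only the limit to which the
    uncoupled wrap angle is pushed.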

    4.2. Use Physics of Incipient Failure To Avoid Failure

    In some systems the physics of the incipient failure can be used to prevent or

    delay the failure mode. All one-sided failure modes are associated with underlying

    physical phenomena. In many cases the failure mode exhibits distinct physical

    mechanisms that become active as the onset of the failure mode is approached.

    In some systems there exists an opportunity to exploit the physics of incipient

    failure to increase the size of the operating window.

    Primary Case Study: Jet Engines. An example is afforded by the use of shaped

    grooves in compressor casings in modern jet engines. An axial flow compressor is

    comprised of multiple alternating stages of rotor assemblies and stators. To limit

    engine complexity and weight, a large pressure rise per stage is desired so that

    the desired pressure rise in the compressor can be accomplished with a small

    number of stages. However, the pressure increase of each stage is limited by a

    failure mode of aerodynamic stall and surge. A stall involves separation of airflow

    from a blade, which at any given time may affect only one stage or even a group

    of stages.

    A compressor surge generally refers to a complete flow breakdown throughout

    the compressor. The value of airflow and pressure ratio at which a surge occurs is

    termed the "surge point," and "surge margin" is a term for the difference between

    the airflow and compression ratio at which the compressor will normally be operated

    and the airflow and compression ratio at which a surge will occur. Thus, we can readily

    interpret surge margin as the distance from the one-sided failure mode of

    compressor surge.

    In the late 1970s new technologies known as "casing treatments" were

    developed. In one casing treatment technology assigned to Rolls-Royce, Patent

    #4,086,022 [Freeman and Moritz, 1978], a series of angled channels are placed in

    the casing of the compressor extending from the leading edge of the rotors and

    extending just aft of the trailing edge (see Fig. 7). If a surge begins to occur, then

    a rotating annulus of pressurized gas will begin to build up about the tips of the

    blades. Because of the geometry of the slots, the annulus of air will be directed

    into the slots thus reducing or eliminating the surge [Freeman and Moritz,

    1978, p. 5].

    To understand how the casing treatments are related to the operating window, it

    is useful to consider Figure 8 adapted from Cumpsty [1997]. The abscissa in the

    figure is mass flow of air into the engine. The mass flow in an engine may vary due

    to changes in inlet conditions caused by atmospheric conditions or aircraft

    maneuvers; therefore, mass flow is a noise factor as defined in Section 3.

    The ordinate in Figure 8 is pressure rise across a stage of the compressor. When

    conditions are at their nominal state, the engine will generally remain on the

    operating line with mass flow and pressure rise both changing as a function of the

    throttle position set by the pilot. At a fixed throttle position, when mass flow is

    reduced due to maneuvers or environmental conditions, the state of the engine

    moves toward the surge line as indicated in step 1 of Figure 8. This pushes the

    engine off the operating line and toward the failure-mode boundary. The amount

    of mass-flow drop that can be tolerated before failure (step 3a or step 3b) is

    sometimes called the "surge margin," which we interpret as an indication of the

    operating window size. The technology described in Patent #4,086,022 can be

    viewed as a means to exploit the incipient failure-mode physics (the rotating

    annulus of air, step 2) to increase the surge margin. The treatments are designed

    so that the incipient physics will lead to a pressure relief across the stage (step

    3b). The advanced casing treatment increased fan stall margin by a staggering

    20% under distorted inlet flow and with little loss in efficiency [Koff, 2004, p.

    582].

    Supplementary Case Study: Paper Feeder. A similar approach was used to

    improve the reliability of paper feeders. For friction-retard paper feeders, the

    stack force between the feedbelt and the paper stack is a critical system variable.

    If it is too large the multifeed rate will be excessive. If the stack force is too small,

    the misfeed rate will be excessive. Therefore, there is an operating window

    between these two one-sided failure modes (Fig. 9).

    When the range of papers is moderate, it is easy to develop a sufficient operating

    window so that both the multifeed rate and the misfeed rate are very small.

    However, for the large range of papers that are typically used in large production

    copiers and printers, it is very difficult, or impossible, to develop a sufficient

    operating window, as shown on the left of Figure 9. On the left hand side of

    Figure 9, it is evident that no single value of stack force will simultaneously avoid

    both multifeeds and misfeeds over the full range of paper weights. This was still

    true after robust parameter design had been completed, so there was little hope

    to improve it further beyond the great improvement that had already been

    achieved.
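
    The vanishing window can be pictured as an interval intersection. In the sketch
    below each paper weight has its own acceptable stack-force interval, bounded below
    by misfeeds and above by multifeeds, and a single setting works only if all the
    intervals overlap. The function name and every numeric value are hypothetical and
    chosen only to reproduce the qualitative situation described for Figure 9.

```python
def common_window(windows):
    """Intersect per-paper stack-force windows; return (low, high) or None if empty."""
    low = max(lo for lo, hi in windows)
    high = min(hi for lo, hi in windows)
    return (low, high) if low < high else None


# Hypothetical (minimum to avoid misfeeds, maximum to avoid multifeeds) stack
# forces for light, medium, and heavy paper, in arbitrary units.
light, medium, heavy = (1.0, 3.0), (2.0, 4.0), (3.5, 6.0)

moderate_range = common_window([light, medium])      # a single setting exists
full_range = common_window([light, medium, heavy])   # no single setting exists
```

    With these illustrative numbers, the moderate paper range leaves a usable window,
    while adding heavy paper makes the intersection empty, so no amount of tuning a
    single stack force can help.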

    The problem was resolved through the development of a stack force

    relief/enhancement technology, U.S. Patent #4,561,644 [Clausing, 1985]. This

    technology uses two different values of the stack force, a small value for most

    papers, and a larger value for heavy papers (as depicted on the right side of Fig.

    9). Under normal conditions, the stack force is set to the small value. For most

    common paper weights this works very reliably. If a larger paper weight is used, a

    misfeed condition may begin to emerge. A sensor near the retard roll is designed

    to sense the arrival of the lead edge of the sheet. If an incipient misfeed occurs,

    the paper will not arrive within the desired time period. Under this condition, the

    stack force is increased to the large value. This was done by energizing the

    solenoid 90 in Figure 5, which pushed the feeder around the pivot 11, thus

    increasing the stack force. Thus, the machine was able to reliably feed the full

    range of paper weights.
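
    The control logic described above can be sketched as follows. This is a simplified
    illustration, not the patented mechanism: the class and function names, the
    arrival-time callable, and the timing values are hypothetical, and the real feeder
    raises the force during the same feed attempt by energizing a solenoid.

```python
from dataclasses import dataclass


@dataclass
class FeedAttempt:
    stack_force: str  # "small" or "large"
    fed: bool


def feed_sheet(arrival_time_at, timeout):
    """Feed one sheet, starting at the small stack force.

    arrival_time_at(force) is a hypothetical callable returning the time at which
    the sensor near the retard roll sees the lead edge, or None if it never arrives.
    """
    arrival = arrival_time_at("small")
    if arrival is not None and arrival <= timeout:
        return FeedAttempt("small", True)
    # Incipient misfeed: the lead edge is late, so switch to the large stack force.
    arrival = arrival_time_at("large")
    return FeedAttempt("large", arrival is not None)


# Hypothetical arrival times in seconds for two kinds of paper.
heavy_paper = {"small": None, "large": 0.08}
common_paper = {"small": 0.05, "large": 0.04}

heavy_result = feed_sheet(heavy_paper.get, timeout=0.1)
common_result = feed_sheet(common_paper.get, timeout=0.1)
```

    The late arrival of the lead edge is the incipient-failure signal: it is observed
    before a misfeed is complete and is used to trigger the corrective action.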

    Summary of the Strategy. Exploit the physical mechanisms associated with an

    incipient failure to offset the failure mode, thereby increasing the size of the

    operating window.

    4.3. Employ Two Different Operating Modes

    In some cases, the development process reaches a state in which the system has a

    limited operating window between multiple one-sided failure modes and

    therefore cannot operate reliably. In such cases, it is often advisable to change

    from a single operating mode to two operating modes. Separately designing two

    distinct operating modes enables significant design freedom to seek better

    resistance to the failure modes. This strategy is often similar to the strategy "use

    physics of incipient failure to avoid failure," and in fact the two strategies can

    overlap. However, two key distinctions should be made: (1) Incipient failure-mode

    physics do not always lead to clearly distinct operating modes, and (2) the switch

    between two modes need not be cued by incipient failure physics and can instead

    be cued by operator inputs or state variables of the system.

    Primary Case Study: Paper Feeder. A failure mode of friction-retard paper

    feeders (Fig. 5) is excessive wear of the retard roll. In previous designs the roll had

    been rotated approximately once per hour to distribute the wear over the entire

    roll. Nevertheless, the wear was excessive, and was a considerable expense in

    service cost and lost production of the copier/printer. The critical variable that

    determines the wear of the retard roll is the force between the feedbelt and the

    retard roll, F, multiplied by the contact distance D between the feedbelt and the

    retard roll. The product, FD, is the work that the retard roll can do to remove

    energy from the second sheet, and thus stop the second sheet. However, this is

    also the work that causes wear of the retard roll.

    The result is as shown in Figure 10. With the previous design, one system variable

    FD has control of both of the one-sided failure modes, excessive multifeeds and

    excessive wear of the retard roll. Maurice Holmes at Xerox recognized that this

    problem could be resolved through a redesign of the retard mechanism by adding

    a second operating mode. The innovation was included in the advanced paper

    feeder that first went into production in the Xerox 1075 copier in 1981, Patent

    #4,475,732 [Clausing et al., 1984].

    The inventive process that led to this invention is well described in terms of the

    theory of inventive problem solving (TRIZ). The TRIZ process generally begins by

    framing the current problem as a conflict. In this case, there was an engineering

    conflict between avoiding multifeeds and avoiding excessive wear. In TRIZ, one

    effective way to seek a conflict resolution is through Sufield or substance-field

    analysis [Clausing and Fey, 2004]. Simple Sufield diagrams are in the form of a

    triad. The relevant triad diagram for the retard-roll problem is shown in the left

    hand side of Figure 11. Here substances are (1) the paper and (2) the roll/shaft.

    The field is the contact force. TRIZ includes many standards for the creative

    revision of the Sufield. One of the standards is: "To enhance the effectiveness of

    the Sufield, transform one substance into an independently controlled Sufield,

    thus generating a chain Sufield," p. 112. This can be implemented by introducing a

    field between the retard roll and its shaft (as shown in right hand side of Fig. 11).

    This is as far as Sufield analysis will take us. Now we have to use science and art to

    identify a field and a component for creating the field that will open an operating

    window. One such approach is to insert a friction brake with a brake torque T into

    the design to produce a field between the retard roll and its shaft (U.S. Patent

    4,475,732). This field creates the possibility of two distinct operating modes: (1)

    When the torque that is applied to the roll is less than T, the roll remains

    stationary, and (2) when the torque that is applied to the roll is greater than T, the

    roll rotates.

    The torque that is applied to the retard roll is produced by the friction from the

    belt or the paper, whichever is contacting the roll. When one sheet of paper is

    between the roll and the feedbelt, the friction coefficient has a value of 2, which

    overcomes the brake torque. Therefore, the roll rotates, and there is not any

    wear. When two sheets of paper are between the roll and the feedbelt, the

    friction coefficient is 0.6, and the brake torque prevents rotation of the retard

    roll. Thus the second sheet is stopped.
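
    The two operating modes can be checked with a short calculation. The sketch below
    assumes simple Coulomb friction, so the torque applied to the roll is the friction
    coefficient times the normal force times the roll radius. The friction coefficients
    are the ones quoted above; the force, the radius, and the function names are
    hypothetical.

```python
def roll_rotates(friction_coefficient, normal_force, roll_radius, brake_torque):
    """True if the friction torque on the retard roll exceeds the brake torque T."""
    applied_torque = friction_coefficient * normal_force * roll_radius
    return applied_torque > brake_torque


def brake_torque_window(mu_single, mu_double, normal_force, roll_radius):
    """Range of brake torques that lets the roll turn for one sheet but hold for two."""
    return (
        mu_double * normal_force * roll_radius,
        mu_single * normal_force * roll_radius,
    )


# Friction coefficients from the text; force (N) and radius (m) are hypothetical.
MU_SINGLE, MU_DOUBLE = 2.0, 0.6
FORCE, RADIUS = 5.0, 0.01

low, high = brake_torque_window(MU_SINGLE, MU_DOUBLE, FORCE, RADIUS)
brake_torque = (low + high) / 2.0  # any value strictly inside the window would do

single_sheet_mode = roll_rotates(MU_SINGLE, FORCE, RADIUS, brake_torque)  # roll turns
double_sheet_mode = roll_rotates(MU_DOUBLE, FORCE, RADIUS, brake_torque)  # roll holds
```

    The wide gap between the two friction coefficients is what leaves room to choose
    the brake torque: any value between the two applied torques separates the modes.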

    The addition of the new operating mode created an additional design parameter,

    brake torque, which sets the condition for the switch between the two modes.

    Thus, the design space expands from a 1-D operating window to a 2-D operating

    window (Fig. 12). If the brake torque is set to an appropriate value, the retard roll

    will only rub against the paper when the incipient multifeed condition actually

    occurs. In this case, the excessive-wear failure-mode boundary is never active and

    a new failure mode (paper damage) becomes the limiting factor on parameter FD,

    leaving a greatly increased operating window.

    Supplementary Case Study: Jet Engines. A similar approach was used to

    simultaneously avoid two one-sided failure modes associated with combustion in

    jet engines. A combustor is a part of a jet engine in which fuel is injected into the

    air stream, mixed with air, and burned. Two key failure modes of a combustor are

    concerned with the composition of the exhaust gas, which is tightly regulated to

    protect the environment. One failure mode is excessive production of carbon

    monoxide (CO), which occurs with an overly lean mixture and low temperature in

    the combustion zone. Another failure mode is excessive production of oxides of

    nitrogen (NOx), which is associated with overly high temperature in the

    combustion zone. Given the changes in the thrust demands (and many other

    parameters that vary), it is a challenge to maintain the combustion conditions in

    the small operating window between the failure modes. In the 1970s a new

    technology called "two-zone" or "staged" combustion substantially increased the

    operating window by affording multiple operating modes [Markowski, Lohmann,

    and Reilly, 1976; Lefebvre, 1999]. When the demand for thrust is low, all the

    combustion takes place in a single primary zone. When thrust demands are

    highest, the engine automatically switches to a mode in which combustion occurs

    in two different zones each of which is functioning within the operating window

    between the CO- and NOx-related failure modes. This technology has been

    developed through many inventions including Patent #4,052,844 [Caruel,

    Quillevere, and Gastebois, 1977] and has become popular especially in gas

    turbine engines for ground based power [Washam, 1983]. As in the case of the

    paper feeders with a friction brake, the system automatically switches between

    two modes of operation in order to increase the operating window between two

    coupled one-sided failure-mode boundaries.

    Summary of Strategy. When it is not possible to simultaneously avoid two one-sided

    failure modes due to a wide range of noise values, consider defining two

    distinct operating modes so that at least one of the failure modes will be moved

    to increase the size of the operating window.

    4.4. Identify and Exploit Dependencies among Failure Modes

    In the operating-window approach, the parameter space is sketched out and the

    failure mode boundaries are identified. In the sketch, it is often the case that the

    parameters associated with the axes are not independent. A small change

    induced in one parameter will have an associated effect on the other one. It

    seems clear that such dependencies can influence system reliability. What is

    sometimes overlooked is that they often provide an opportunity to use the

    dependence to stay within the operating window.

    Primary Case Study: Jet Engines. An example is afforded by turbine blade cooling

    systems [Sidwell, 2004]. The physical layout of the system is described in Figure

    13. Air from the compressor is routed to the first-stage turbine blades. The

    cooling flow path includes a Tangential On-Board Injector, which brings the flow

    from a supply at Ps into the rotating parts of the engine. The area between the

    rotating seal and the blades acts as a plenum storing compressed gas at a

    pressure Pp. The gas then flows through each of the many first stage blades. The

    purpose of this flow is to cool the surface of the blades and thereby avoid the

    failure mode of early blade oxidation.

    To apply operating-window methods to this scenario, one may first sketch the

    parameter space and the failure-mode boundaries. Figure 14 depicts a highly

    simplified window with just two failure modes, oxidation of blade #1 and

    oxidation of blade #2. Manufacturing variation may excite failure mode #1

    (oxidation of blade #1) if its flow passages are constricted causing m1 to drop.

    However, the schematic diagram of Figure 14 suggests that there is a dependency

    among the failure modes. Any small drop in m1 tends to cause a rise in plenum

    pressure and a resulting rise in m2. The reverse is also true: any small drop in m2

    tends to cause a rise in plenum pressure and a resulting rise in m1. This

    interdependency of the failure modes creates an opportunity to create larger

    distance from both failure modes. Turbine blades are routinely tested for their

    flow characteristics. Sidwell proposed that this test could be used to sort the

    blades into low flow, medium flow, and high-flow classes. In this way, a second

    interdependency is added to the system. The low m1 due to the sorting process

    brings about a low m2. The nature of the interdependency caused by the plenum

    causes the two effects to cancel (or very nearly cancel) as depicted in Figure 14.

    Sidwell [2004] estimated that binning turbine blades would increase the life of the

    high-flow and medium-flow blades by 50% or more and would enable low-flowing

    blades to be used with approximately the same life as current engines.
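
    The plenum coupling can be illustrated with a toy flow network. The sketch below
    is not Sidwell's model: it assumes incompressible, orifice-like flow (mass flow
    proportional to the square root of the pressure drop) through both the supply
    restriction and each blade, and all coefficients and pressures are hypothetical
    values in arbitrary units.

```python
import math


def blade_flows(blade_coeffs, supply_coeff, p_supply, p_exit):
    """Plenum pressure and per-blade flows for blades fed in parallel from one plenum.

    Assumes m = c * sqrt(delta_p) for the supply restriction and for each blade.
    """
    total = sum(blade_coeffs)
    # Mass balance: supply_coeff * sqrt(Ps - Pp) = total * sqrt(Pp - Pe), solved for Pp.
    p_plenum = (supply_coeff**2 * p_supply + total**2 * p_exit) / (
        supply_coeff**2 + total**2
    )
    flows = [c * math.sqrt(p_plenum - p_exit) for c in blade_coeffs]
    return p_plenum, flows


SUPPLY, PS, PE = 2.0, 10.0, 0.0  # hypothetical supply coefficient and pressures

p_nominal, (m1_nominal, m2_nominal) = blade_flows([1.0, 1.0], SUPPLY, PS, PE)

# Blade 1 is constricted: plenum pressure rises, and so does the flow in blade 2.
p_mixed, (m1_mixed, m2_mixed) = blade_flows([0.8, 1.0], SUPPLY, PS, PE)

# Binning: a low-flow blade is installed alongside another low-flow blade.
p_binned, (m1_binned, m2_binned) = blade_flows([0.8, 0.8], SUPPLY, PS, PE)
```

    In this toy network a constricted blade receives more cooling flow when its
    neighbor is also low-flowing than when it is mixed with a nominal blade, because
    the shared plenum pressure rises. That is the qualitative compensation the sorting
    proposal relies on.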

    Supplementary Case Study: Paper Feeder. In a document feeder for a copier it is

    highly desirable to feed from the bottom of the stack of documents. This leaves

    the top of the stack free to receive the recirculated document after it has been

    copied. The most advanced document-feeder technology uses air to move the

    document, which minimizes damage to the document. Such feeders typically use

    a combination of positive air pressure and negative air pressure (vacuum). The

    positive air pressure is used to levitate the document stack (otherwise the weight

    of the document stack would tend to cause both misfeeds and multifeeds).

    Therefore, a sufficient pressure under the stack is required to avoid both misfeeds

    and multifeeds. However, excessive pressure under the stack could cause the last

    sheet to blow away. Therefore, good system design requires an operating window

    between inadequate pressure and excessive pressure, as shown in Figure 15.

    5. SUMMARY

    Reliability is one of the most important characteristics of an engineering system.

    Probabilistic formulations of reliability are useful for component selection,

    verification testing, and field-service management. However, at the early stages

    of system architecting and concept design, probabilistic formulations are not as

    helpful. We propose that thinking in terms of physical mechanisms of failure is

    much more effective and that the fundamental principle of reliability engineering

    is failure-mode avoidance.

    A useful reliability-engineering concept is the operating window, which is the

    region in noise parameter space that avoids failure modes. In this paper we have

    given a mathematical definition of the operating window. We have shown that

    adding to the window increases the reliability regardless of the probability

    distributions of the noise factors. To this we add the principle that this should be

    done early and rapidly during the system development. In particular, concept

    design changes frequently add large regions to the operating window and account

    for some of the largest improvements to reliability of systems over the course of

    their development.
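
    The claim that enlarging the window cannot reduce reliability, whatever the noise
    distribution, follows from a one-line argument. The notation below is an
    assumption made for illustration: X is the vector of noise factors with density f,
    W_1 is the original operating window, W_2 is the enlarged window, and R_i is the
    probability that the noise falls inside W_i.

```latex
% Assumed notation: noise vector X with density f \ge 0, nested windows W_1 \subseteq W_2.
R_i = \Pr(X \in W_i) = \int_{W_i} f(x)\,dx, \qquad
R_2 = R_1 + \int_{W_2 \setminus W_1} f(x)\,dx \;\ge\; R_1 .
```

    The inequality uses only the fact that f is non-negative, so it holds for any
    distribution of the noise factors; the gain is strict whenever the added region
    has positive probability.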

    To illustrate this approach, we have described four strategies for increasing

    the operating window through concept design. Each strategy is illustrated by two case

    studies, one from the field of paper feeders for copiers and printers, and the

    other from the field of jet engines. Each case study includes past inventions that

    significantly improved reliability. By showing the theory and eight case studies we

    have displayed both the fundamentals and the diversity of industrial applications

    of this important approach to the development of reliable systems.

    REFERENCES

    S.S. Rao, Reliability-based design, McGraw-Hill, New York, 1992.

    M. Pecht and A. Dasgupta, Physics of failure: an approach to reliable product

    development, J Inst Environ Sci (1995).

    G. Taguchi, Taguchi on robust technology development, ASME Press, New

    York, 1993.

    Internet database