design for reliability by adesh

8/2/2019 Design for Reliability by Adesh


ADESH KUMAR

M.TECH-1ST YEAR

(MACHINE DESIGN)

JAMIA MILLIA ISLAMIA

NEW DELHI

DESIGN FOR RELIABILITY



Chapter Objectives

Introduce the need for design for reliability

List the main causes of reliability failures

Relate failures to their mechanisms

Describe each failure mechanism

Propose design guidelines against each failure mechanism



What is Reliability?

Reliability is:

The ability of an item to perform its required

function under defined customer operating

conditions for a stated period of time.

The probability that no (system) failure will occur in a given time interval.

In research, the term reliability means "repeatability" or "consistency": a measure is considered reliable if it would give us the same result over and over again.



Other Names of DFR

Design for Reliability (DFR) has many aliases:

Design for Durability

Design for Robustness

Design for Useful Life



What do Reliability Engineers Do?

Implement reliability engineering programs across all functions:

Engineering

Research

Manufacturing

Testing

Packaging

Field service



What is Probability?

Probability is:

A measure that describes the chance or likelihood that an event will occur.

The probability that event A occurs, P(A), is represented by a number between 0 (zero) and 1.

When P(A) = 0, the event cannot occur.

When P(A) = 1, the event is certain to occur.

When P(A) = 0.5, the event is as likely to occur as not.



Cost-Reliability Functions



What are Noise Factors?

Noise factors are sources of disturbing influences that can disrupt the ideal function, causing error states that lead to quality problems.



Reliability Terms

Mean Time To Failure (MTTF), for non-repairable systems

Mean Time Between Failures (MTBF), for repairable systems

Reliability probability (survival) $R(t)$

Failure probability (cumulative distribution function) $F(t) = 1 - R(t)$

Failure probability density $f(t)$

Failure rate (hazard rate) $\lambda(t)$
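These terms are linked by standard textbook relations, summarised here for reference (the slides list the terms but do not spell out the relations):

```latex
F(t) = 1 - R(t), \qquad
f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}, \qquad
\lambda(t) = \frac{f(t)}{R(t)}, \qquad
R(t) = \exp\!\left(-\int_0^t \lambda(u)\,du\right), \qquad
\mathrm{MTTF} = \int_0^\infty R(t)\,dt
```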



MTBF & MTTF

Mean Time Between Failures (MTBF) applies to repairable items.

Mean Time To Failure (MTTF) applies to non-repairable items.

Both of these terms indicate the average time an item is expected to function before failure.
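A minimal sketch of how the two averages are commonly estimated from data: MTTF as the mean time to failure of a sample of non-repairable units that were all run to failure, and MTBF as total operating time divided by the number of failures. The function names and the data are hypothetical.

```python
def mttf(failure_times_h: list[float]) -> float:
    """MTTF for non-repairable items: mean of the observed times to failure.

    Assumes every unit in the sample was run until it failed (no censoring).
    """
    return sum(failure_times_h) / len(failure_times_h)


def mtbf(total_operating_time_h: float, num_failures: int) -> float:
    """MTBF for repairable items: total operating time divided by failures."""
    return total_operating_time_h / num_failures


if __name__ == "__main__":
    # Hypothetical data
    print(mttf([1200.0, 1500.0, 900.0, 1400.0]))  # 1250.0
    print(mtbf(10_000.0, 4))                      # 2500.0
```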



Reliability Function

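For a constant failure rate $\lambda$ (the exponential model), the probability density function of failures is

$f(t) = \lambda e^{-\lambda t}$ for $t > 0$

Probability of failure from 0 to T:

$F(T) = 1 - e^{-\lambda T}$

Reliability function:

$R(T) = 1 - F(T) = e^{-\lambda T}$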
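The three functions above can be evaluated directly. This is a minimal sketch assuming a constant failure rate; the value of `lam` is hypothetical. It also shows that, under this model, MTBF = 1/λ and only about 37% of items survive to the MTBF.

```python
import math


def failure_pdf(t: float, lam: float) -> float:
    """f(t) = lam * exp(-lam * t): probability density of failure at time t."""
    return lam * math.exp(-lam * t)


def unreliability(t: float, lam: float) -> float:
    """F(t) = 1 - exp(-lam * t): probability of failure in [0, t]."""
    return 1.0 - math.exp(-lam * t)


def reliability(t: float, lam: float) -> float:
    """R(t) = 1 - F(t) = exp(-lam * t): probability of surviving to time t."""
    return math.exp(-lam * t)


if __name__ == "__main__":
    lam = 1e-4        # hypothetical: 1 failure per 10,000 hours
    mtbf = 1 / lam    # constant failure rate => MTBF = 1/lambda
    print(round(reliability(1000, lam), 4))  # 0.9048
    print(round(reliability(mtbf, lam), 3))  # 0.368
```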


Series Systems

$R_S = R_1 \times R_2 \times \cdots \times R_n$

[Block diagram: components 1, 2, ..., n connected in series]



Serial reliability

Series systems are also referred to as weakest-link or chain systems.

System failure is caused by the failure of any one component.

Therefore, for a series system, the reliability of the system is the product of the individual component reliabilities:

$R_{\text{serial}} = \prod_{i=1}^{n} R_i$

More components = less reliability.
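A minimal sketch of the series product rule, using hypothetical identical components with $R = 0.99$ each, shows how quickly system reliability falls as components are added (from 0.99 for one component to roughly 0.37 for 100).

```python
import math


def series_reliability(reliabilities):
    """Series system: every block must work, so R_S = R_1 * R_2 * ... * R_n."""
    return math.prod(reliabilities)


# Hypothetical: n identical components, each with R = 0.99
for n in (1, 10, 50, 100):
    print(f"n = {n:>3}: R_S = {series_reliability([0.99] * n):.3f}")
```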



Parallel reliability

$R_{\text{parallel}} = 1 - \prod_{i=1}^{n} (1 - R_i)$

Parallel systems are also referred to as redundant systems.

The system fails only if all of the components fail.

Therefore, for a parallel system, the system probability of failure is the product of the individual component failure probabilities $(1 - R_i)$.
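The parallel rule can be sketched the same way. The component reliability of 0.90 is hypothetical; each added redundant unit multiplies the system failure probability by another factor of 0.1.

```python
import math


def parallel_reliability(reliabilities):
    """Parallel (redundant) system: it fails only if every block fails, so
    R_P = 1 - (1 - R_1)(1 - R_2)...(1 - R_n)."""
    return 1 - math.prod(1 - r for r in reliabilities)


# Hypothetical: adding identical redundant units, each with R = 0.90
for n in (1, 2, 3):
    print(f"n = {n}: R_P = {parallel_reliability([0.90] * n):.3f}")
```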



Series-Parallel Systems

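Convert to an equivalent series system.

[Block diagram: blocks A and B in series, followed by two identical blocks C in parallel, followed by block D, with reliabilities R_A, R_B, R_C, R_C and R_D]

The two parallel C blocks are replaced by a single equivalent block C' in series with A, B and D:

$R_{C'} = 1 - (1 - R_C)(1 - R_C)$

$R_S = R_A \times R_B \times R_{C'} \times R_D$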
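A minimal sketch of the two-step reduction for the A-B-(C||C)-D system, using hypothetical block reliabilities (the slide gives no numerical values):

```python
import math


def series(rs):
    """Series blocks: product of reliabilities."""
    return math.prod(rs)


def parallel(rs):
    """Parallel blocks: 1 - product of failure probabilities."""
    return 1 - math.prod(1 - r for r in rs)


# Hypothetical block reliabilities (not from the slides)
R_A, R_B, R_C, R_D = 0.95, 0.98, 0.90, 0.99

# Step 1: replace the two parallel C blocks by an equivalent block C'
R_C_eq = parallel([R_C, R_C])  # 1 - (1 - 0.90) * (1 - 0.90)

# Step 2: the reduced system A-B-C'-D is a pure series system
R_system = series([R_A, R_B, R_C_eq, R_D])

print(f"R_C' = {R_C_eq:.4f}, R_S = {R_system:.4f}")
```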



A Simple Example

A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate the system failure rate $\lambda$ and the MTBF.

$\lambda = (0.02/100) \times (1/1000) \times 4000 = 8 \times 10^{-4}$ failures/hour

$MTBF = 1/\lambda = 1/(8 \times 10^{-4}) = 1250$ hours
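The arithmetic can be checked in a few lines. The sketch assumes, as the example implicitly does, that the components act in series with constant failure rates, so the individual rates simply add.

```python
# 0.02 % per 1000 h, converted to failures per hour for one component
rate_per_component = (0.02 / 100) / 1000
n_components = 4000

system_rate = n_components * rate_per_component  # lambda, failures per hour
mtbf_hours = 1 / system_rate

print(f"lambda = {system_rate:.1e} failures/hour")  # 8.0e-04
print(f"MTBF   = {mtbf_hours:.0f} hours")           # 1250
```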



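An Example

A first-generation computer contains 10,000 components, each with $\lambda$ = 0.5% per 1000 hours. What is the period of 99% reliability?

Using the approximation $R(t) \approx 1 - \lambda t$ (valid when $\lambda t \ll 1$):

$MTBF = t/(1 - R(t)) = t/(1 - 0.99)$, so $t = MTBF \times 0.01 = 0.01/\lambda_{av}$

where $\lambda_{av}$ is the average (system) failure rate.

$N$ = number of components = 10,000

$\lambda$ = failure rate of one component = 0.5%/(1000 hours) = 0.005/1000 = $5 \times 10^{-6}$ per hour

Therefore, $\lambda_{av} = N\lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour

Therefore, $t = 0.01/(5 \times 10^{-2}) = 0.2$ hours = 12 minutes.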
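A quick numerical check of this example, comparing the linear approximation used above with the exact exponential result. It assumes, as the example does, that the 10,000 component failure rates add (a series system with constant failure rates).

```python
import math

N = 10_000                     # number of components
lam_component = 0.005 / 1000   # 0.5 % per 1000 h -> 5e-6 per hour
lam_av = N * lam_component     # 5e-2 per hour
R_target = 0.99

# Approximation R(t) ~ 1 - lam*t, valid when lam*t << 1
t_approx_h = (1 - R_target) / lam_av

# Exact result for a constant failure rate: R(t) = exp(-lam*t)
t_exact_h = -math.log(R_target) / lam_av

print(f"approx: {t_approx_h * 60:.1f} min, exact: {t_exact_h * 60:.1f} min")
# approx: 12.0 min, exact: 12.1 min
```

At 99% reliability the approximation is already very close; it degrades as the target reliability falls.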



Reliability Failure Modes

Failures may be SUDDEN (non-predictable) or GRADUAL (predictable). They may also be PARTIAL or COMPLETE.

A catastrophic failure is both sudden and complete.

A degradation failure is both gradual and partial.

Two root causes:
1. Lack of robustness
2. Mistakes



Causes of Failure

Misuse: failures attributable to the application of stresses beyond the stated capabilities of the item.

Inherent weakness: failures attributable to weakness inherent in the item itself when subjected to stresses within the stated capabilities of the item.



Classifications of Reliability Failure

Early-stage failure: caused by inadequate design, poor manufacturing, or inappropriate usage. These failures can be catastrophic to human life.

Overstress mechanisms: caused by an insufficient safety factor in the design, higher-than-expected random loads, human error, or misapplication.

Wear-out mechanisms: occur late in life, and their rate increases with age. They result from corrosion, material fatigue, poor maintenance, creep, and degradation in strength.



Common Measures of Unreliability

% Failure: the percentage of failures in a total population

MTTF (Mean Time To Failure): the average time of operation to first failure

MTBF (Mean Time Between Failures): the average time between product failures

Repairs Per Thousand (R/1000)

Bq life: the life by which q% of the population will have failed (e.g., B10 life)
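If the life distribution is assumed to be exponential (constant failure rate), the Bq life follows from solving $F(t) = 1 - e^{-t/\mathrm{MTTF}} = q/100$ for $t$. This is a minimal sketch under that assumption; the MTTF value is hypothetical, and real products with infant-mortality or wear-out behaviour need a different distribution.

```python
import math


def bq_life_exponential(q_percent: float, mttf_h: float) -> float:
    """Life by which q % of the population has failed, assuming an exponential
    life distribution with the given MTTF.

    Solves F(t) = 1 - exp(-t / MTTF) = q / 100 for t.
    """
    return -math.log(1 - q_percent / 100) * mttf_h


# Hypothetical example: MTTF = 50,000 h
print(round(bq_life_exponential(10, 50_000)))  # B10 life in hours: 5268
```

Under this model, B10 is only about 10.5% of the MTTF, which is why B-lives are often quoted alongside MTTF.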



Cumulative Failure Rate Curve



The Bathtub Curve

Reliability specialists often describe the lifetime of a population of products using a graphical representation called the bathtub curve. The bathtub curve consists of three periods: an infant mortality period with a decreasing failure rate, followed by a normal life period (also known as "useful life") with a low, relatively constant failure rate, and concluding with a wear-out period that exhibits an increasing failure rate.
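One common way to model the bathtub shape (a modelling choice, not something stated on the slides) is to add three Weibull hazard rates: shape β < 1 for the decreasing infant-mortality rate, β = 1 for the constant useful-life rate, and β > 1 for the increasing wear-out rate. All parameter values below are hypothetical and were chosen only to produce the characteristic shape.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t):
    """Sum of three hypothetical Weibull hazards (t in hours, t > 0)."""
    infant = weibull_hazard(t, beta=0.5, eta=20_000)   # decreasing
    useful = weibull_hazard(t, beta=1.0, eta=50_000)   # constant
    wearout = weibull_hazard(t, beta=5.0, eta=60_000)  # increasing
    return infant + useful + wearout


for t in (10, 100, 1_000, 10_000, 30_000, 60_000):
    print(f"t = {t:>6} h  h(t) = {bathtub_hazard(t):.2e} per hour")
```

The printed hazard falls steeply at first, flattens in the middle of life and rises again as the wear-out term takes over.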


Reliability

[Figure: probability of dying in the next year (deaths/1000) versus age, 0 to 90 years]

From the Statistical Bulletin 79, no. 1, Jan-Mar 1998



Steps in Designing for Reliability

1. Develop a Reliability Plan

Determine Which Reliability Tools are Needed

2. Analyze Noise Factors

3. Test for Reliability

4. Track Failures and Determine Corrective Actions



Develop a Reliability Plan

Planning for reliability is just as important as planning for design and manufacturing.

Why?

To determine the useful life of the product and which accelerated life tests should be used.

Reliability must be as close to perfect as possible for the product's useful life. You MUST know where your product's major points of failure are!



Tools for testing

Stress Analysis

Reliability Predictions (MTBF)

FMEA (Failure Mode and Effects Analysis)

Fault Tree Analysis

Reliability Block Diagrams



Why Do Reliability Calculations?

Reliability calculations help make the product more reliable, and the resulting reliability figures can be used as a selling feature by the marketing department. They also add to the company's reputation and can be used for comparisons with the competition.



Stress Analysis

Stress analysis establishes the presence of a safety margin, thus enhancing system life. It provides input data for reliability prediction and is based on customer requirements.



Reliability Predictions (MTBF)

The MTBF (Mean Time Between Failures) of an existing product can be found by studying field failure data. For a new product, however, or if significant changes are made to the design, it may be necessary to estimate or calculate the MTBF before any field data is available.
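When no field data exists, one simple estimate is a parts-count style prediction: multiply each part type's failure rate by its quantity, sum over the parts list (a series-system, constant-failure-rate assumption), and invert. The parts list and failure rates below are hypothetical placeholders, not handbook values.

```python
# Hypothetical parts list: (part type, quantity, failure rate per 10^6 hours)
parts = [
    ("resistor", 120, 0.002),
    ("capacitor", 80, 0.005),
    ("connector", 6, 0.1),
    ("microcontroller", 1, 0.5),
]


def predicted_mtbf_hours(parts_list):
    """Sum quantity * lambda over all part types (series system, constant
    failure rates assumed) and convert the total to an MTBF in hours."""
    total_rate_per_million = sum(qty * lam for _, qty, lam in parts_list)
    return 1e6 / total_rate_per_million


print(f"Predicted MTBF = {predicted_mtbf_hours(parts):,.0f} hours")
```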



Failure Modes and Effects Analysis

Failure modes and effects analysis (FMEA) is a qualitative technique for understanding the behaviour of components in an engineered system.

The objective is to determine the influence of component failure on other components, and on the system as a whole.

FMEA can also be used as a stand-alone procedure for the relative ranking of failure modes, screening them according to risk.


Failure mode and effects analysis

(FMEA)

Failure mode: consider each component or functional block and how it can fail.

Determine the effect of each failure mode, and its severity on system function.

Determine the likelihood of occurrence and the likelihood of detecting the failure. Calculate the Risk Priority Number (RPN = Severity × Occurrence × Detection).

Consider corrective actions (these may reduce the severity or occurrence, or increase the probability of detection).

Start with the highest RPN values (most severe problems) and work down.

Recalculate the RPN after the corrective actions have been determined; the aim is to minimize the RPN.
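A minimal sketch of the RPN steps: score each failure mode, rank by RPN, then recalculate after a corrective action. The failure modes and scores are hypothetical, and the 1 to 10 rating scales are a common convention assumed here, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (minor) .. 10 (hazardous)
    occurrence: int  # 1 (remote) .. 10 (very frequent)
    detection: int   # 1 (almost certain to detect) .. 10 (cannot detect)

    @property
    def rpn(self) -> int:
        """Risk Priority Number = Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes
modes = [
    FailureMode("Seal leak", 7, 4, 5),
    FailureMode("Bearing seizure", 9, 2, 6),
    FailureMode("Fastener loosening", 5, 5, 3),
]

# Work from the highest RPN down
for m in sorted(modes, key=lambda mode: mode.rpn, reverse=True):
    print(f"{m.name:<20} RPN = {m.rpn}")

# Hypothetical corrective action: better inspection improves detection
modes[0].detection = 2
print(f"{modes[0].name} RPN after action = {modes[0].rpn}")  # 56
```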



Reliability Block Diagrams

Most systems are defined through a combination of both series and parallel connections of subsystems.

Reliability block diagrams (RBD) represent a system using interconnected blocks arranged in combinations of series and/or parallel configurations.

They can be used to analyze the reliability of a system quantitatively.

Reliability block diagrams can consider active and standby states to get estimates of the reliability and availability (or unavailability) of the system.

Reliability block diagrams may be difficult to construct for very complex systems.



CASE STUDY: Network Storage

Evaluations Using

Reliability Calculations

This section uses a case study to introduce concepts and calculations for systematically comparing redundancy and reliability factors as they apply to network storage configurations. We will determine a reliability figure for three very basic architectures. The starting point of our study is the network storage requirements.



Network Storage Requirements

We want networked storage that has access to one

server. Later, this storage will be accessible to other

servers. The server is already in place, and has been

designed to sustain single component hardware failures

(with dual host bus adapters (HBAs), for example).

Data on this storage must be mirrored, and the storage

access must also stand up to hardware failures. The

cost of the storage system must be reasonable, while

still providing good performance.



Architecture 1

Architecture 1 provides the basic storage necessities we are looking for, with the following advantages and disadvantages:

Advantages:

Storage is accessible if one of the links is down.

Storage A is mirrored onto B.

Other servers can be connected to the concentrator to access the storage.

Disadvantages:

If the concentrator fails, we have no more access to the storage. This concentrator is a single point of failure (SPOF).



Architecture 2

Architecture 2 has been improved to take into account the previous SPOF: a concentrator has been added.

Advantages:

If any links or components go down, storage is still accessible (resilient to hardware failures).

Data is mirrored (Disk A to Disk B).

Other servers can be connected to both concentrators to access the storage.



Architecture 3

The main difference is that Disk A and Disk B each have only one data path. Disk A is still mirrored to Disk B, as required.

This architecture has all the advantages of the previous architectures, with the following differences:

Disk A can only be accessed through Link C, and Disk B only through Link D.

There is no data multipathing software layer, which results in easier administration and easier troubleshooting.



Determining Reliability

Using the reliability formulas, we can determine which architecture has the highest reliability value. For the purpose of this article, we will use sample MTBF values (as obtained from the manufacturer) and AFR* (Annual Failure Rate) values shown in the table below:

*The AFR for each component was calculated from the MTBF as AFR = 8760/MTBF. The example MTBF values were taken from real network storage component statistics. However, such values vary greatly, and these numbers are given here purely for illustration.


Component AFR values

Component                        AFR variable   Sample MTBF (hours)   AFR
HBA 1, HBA 2                     H              800,000               0.011
Link A, Link B                   L              400,000               0.022
Concentrator 1, Concentrator 2   C              580,000               0.0151
Link C, Link D                   L              400,000               0.022
Disk A, Disk B                   D              1,000,000             0.0088
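The AFR column can be reproduced from the MTBF column with AFR = 8760/MTBF (8760 being the number of hours in a year of continuous operation). A minimal sketch:

```python
HOURS_PER_YEAR = 8760

# Sample MTBF values (hours) from the component table
mtbf_hours = {"H": 800_000, "L": 400_000, "C": 580_000, "D": 1_000_000}

# AFR = 8760 / MTBF: expected failures per year of continuous operation
afr = {name: HOURS_PER_YEAR / mtbf for name, mtbf in mtbf_hours.items()}

for name, value in afr.items():
    print(f"{name}: AFR = {value:.5f}")
```

The results agree with the table once rounded to the precision shown there.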




Having the failure rate of each individual component, we can obtain the system's annual failure rate (AFR) and, from it, the system reliability (R) and system MTBF values. The AFR values of redundant components are raised to the power equal to the number of redundant components. The AFR values of non-redundant components are multiplied by the number of those components in series.



Calculation

In the case of Architecture 1, the concentrator (C) is the only non-redundant component.

$AFR_1 = (H+L)^2 + C + L^2 + D^2$

$AFR_1 = (0.011+0.022)^2 + 0.0151 + (0.022)^2 + (0.0088)^2 \approx 0.0167$

$R_1 = 1 - AFR_1 = 1 - 0.0167 = 0.9833$, or 98.33%

$MTBF_1 = 8760/AFR_1 = 8760/0.0167 \approx 524{,}551$ hours.



Calculation

Architecture 2 has a different configuration, with no non-redundant components.

$AFR_2 = (H+L+C+L)^2 + D^2$

$AFR_2 = (0.011+0.022+0.0151+0.022)^2 + (0.0088)^2 \approx 0.0050$

$R_2 = 1 - AFR_2 = 1 - 0.0050 = 0.995$, or 99.50%

$MTBF_2 = 8760/AFR_2 = 8760/0.0050 = 1{,}752{,}000$ hours.



Calculation

Architecture 3 has yet another configuration and has no non-redundant components.

$AFR_3 = (H+L+C+L+D)^2$

$AFR_3 = (0.011+0.022+0.0151+0.022+0.0088)^2 \approx 0.0062$

$R_3 = 1 - AFR_3 = 1 - 0.0062 = 0.9938$, or 99.38%

$MTBF_3 = 8760/AFR_3 = 8760/0.0062 \approx 1{,}412{,}903$ hours.
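The three architecture calculations can be reproduced in one short script using the table's AFR values and the case study's combination rules (path AFR raised to the number of redundant paths, non-redundant AFRs added, and R = 1 - AFR as used in the case study). Because the script does not round AFR before dividing, its MTBF figures differ slightly from the hand calculations, but the ranking of the architectures is the same.

```python
HOURS_PER_YEAR = 8760

# AFR values per component type, from the component table
H, L, C, D = 0.011, 0.022, 0.0151, 0.0088

# Redundant paths -> path AFR raised to the number of paths;
# non-redundant components in series -> AFRs added.
afr = {
    "Architecture 1": (H + L) ** 2 + C + L ** 2 + D ** 2,
    "Architecture 2": (H + L + C + L) ** 2 + D ** 2,
    "Architecture 3": (H + L + C + L + D) ** 2,
}

for name, a in afr.items():
    r = 1 - a                  # R = 1 - AFR, as in the case study
    mtbf = HOURS_PER_YEAR / a  # MTBF = 8760 / AFR
    print(f"{name}: AFR = {a:.4f}, R = {r:.2%}, MTBF = {mtbf:,.0f} h")
```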



Conclusion

When the calculations are complete, we compare the data:

Architecture 1 = 98.33%, or a system MTBF of 524,551 hours

Architecture 2 = 99.50%, or a system MTBF of 1,752,000 hours

Architecture 3 = 99.38%, or a system MTBF of 1,412,903 hours

The MTBF figures are the most revealing and indicate that Architecture 2 is statistically the most reliable of all.


Failure Effects

(What customer experiences)

Noise

Inoperability

Instability

Intermittent operation

Roughness

Excessive effort requirements

Unpleasant or unusual odor

Poor appearance



Factors Affecting Reliability

Design & Manufacture:

Pre-Production Design

Control of Production

Working Tolerances

Material Quality

Component Quality

Component Stress

Installation & Environmental:

Temperature

Humidity

Vibration

Chemical Attack

Interconnections



Design against failure

It is important to understand the failure (why, where, how long, application, etc.).

Two methods for design against failure:
1. Reduce the stress that causes the failure.
2. Increase the strength of the component.

Either one can be achieved by:
Selecting materials
Changing the package geometry
Changing the dimensions
Protection
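The effect of the two methods can be illustrated with the classical stress-strength interference model, a technique the slides do not name: assuming independent, normally distributed stress and strength, reliability is the probability that strength exceeds stress, $R = \Phi\left((\mu_{strength} - \mu_{stress})/\sqrt{\sigma_{strength}^2 + \sigma_{stress}^2}\right)$. All numerical values below are hypothetical.

```python
import math


def interference_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress) for independent, normally distributed stress
    and strength."""
    z = (mu_strength - mu_stress) / math.sqrt(sd_strength**2 + sd_stress**2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


# Hypothetical values in MPa
base = interference_reliability(400, 30, 300, 40)             # z = 2
lower_stress = interference_reliability(400, 30, 250, 40)     # method 1: z = 3
higher_strength = interference_reliability(450, 30, 300, 40)  # method 2: z = 3

print(f"baseline R        = {base:.5f}")
print(f"reduced stress R  = {lower_stress:.5f}")
print(f"higher strength R = {higher_strength:.5f}")
```

In this symmetric example, lowering the mean stress by 50 MPa and raising the mean strength by 50 MPa give the same improvement, because only the separation between the two distributions (relative to their combined scatter) matters.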



Fatigue Failure?

Fatigue is the most common mechanism of failure and is responsible for an estimated 90% of all structural and electrical failures.

It occurs in metals, polymers, and ceramics.

Metal paper clip example: bend the clip in both directions, then repeat the process until it breaks.



Design Against Fatigue Failure

Increase fatigue strength.

Reduce the amplitude of cyclic loading.

Avoid regions of stress concentration.
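The slides do not give a fatigue-life model. To illustrate why reducing the load amplitude matters, this sketch assumes the Basquin relation $\sigma_a = \sigma'_f (2N)^b$, solved for the number of cycles $N$. The material constants are hypothetical; with $b = -0.1$, a modest cut in stress amplitude gives a large gain in life.

```python
def cycles_to_failure(stress_amplitude, sigma_f_prime, b):
    """Basquin relation sigma_a = sigma_f' * (2N)**b solved for N (cycles)."""
    return 0.5 * (stress_amplitude / sigma_f_prime) ** (1 / b)


# Hypothetical material constants: sigma_f' = 900 MPa, b = -0.1
n_high = cycles_to_failure(300, 900, -0.1)  # stress amplitude 300 MPa
n_low = cycles_to_failure(250, 900, -0.1)   # stress amplitude 250 MPa

print(f"{n_low / n_high:.1f}x longer life")  # 6.2x longer life
```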



Design Against Brittle Fracture

Brittle fracture is an overstress failure mechanism that occurs rapidly, with little or no warning, when the induced stress in the component exceeds the fracture strength of the material.

It occurs in brittle materials (ceramics, glasses and silicon).

Applied stress and work can break the atomic bonds.

Design Guidelines to Reduce Brittle Fracture

Select materials and processing conditions that produce the least stress in brittle materials.

Polish the brittle material to remove surface flaws and enhance reliability.



Design Against Creep Failure

What is Creep? A time-dependent deformation process under load.

It is a thermally activated process: the rate of deformation for a given stress level increases significantly with temperature.

Deformation depends on:
1. The applied load
2. The duration for which the load is applied
3. Elevated temperature
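To put numbers on the load and temperature dependence, this sketch assumes a power-law (Norton) creep model with an Arrhenius temperature term, rate = A σ^n exp(-Q/(RT)), which the slides do not specify. Comparing two conditions as a ratio cancels the material constant A; the values of n and Q are hypothetical.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol*K)


def creep_rate_ratio(stress_ratio, n, q_j_per_mol, t1_k, t2_k):
    """Ratio of steady-state creep rates (condition 2 / condition 1) for
        rate = A * sigma**n * exp(-Q / (R*T)).
    The material constant A cancels in the ratio."""
    stress_term = stress_ratio ** n
    temp_term = math.exp(-q_j_per_mol / R_GAS * (1 / t2_k - 1 / t1_k))
    return stress_term * temp_term


# Hypothetical constants: n = 5, Q = 300 kJ/mol
print(creep_rate_ratio(1.0, 5, 300e3, 800, 850))  # same stress, +50 K
print(creep_rate_ratio(0.8, 5, 300e3, 800, 800))  # 20 % less stress, same T
```

With these assumed constants, a 50 K temperature rise speeds up creep by more than an order of magnitude, while a 20% stress reduction cuts the rate to about a third.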



Design Against Creep Failure

Creep can occur at any stress level.

Creep is most important at elevated temperatures.


Design Guidelines to Reduce Creep-Induced Failure

Use materials with a high melting point if the application calls for harsh temperature conditions.

Reducing mechanical stress will reduce creep deformation.

Creep is a time-controlled phenomenon, so the time spent under load at high temperature also matters.



Design Against Plastic Deformation

What is Plastic Deformation?

Plastic deformation occurs when the applied mechanical stress exceeds the elastic limit or yield point of a material.

It is permanent.

Excessive deformation and the continued accumulation of plastic strain due to cyclic loading will eventually lead to cracking of the component and make it unusable.


Design Guidelines Against Plastic Deformation

Limit the design stresses in the packaging structure to below the yield strength of the materials used. If possible, use materials that have a high yield strength.

Design and control the local plastic deformation at regions of stress concentration.
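Both guidelines can be checked with a simple safety factor against yielding, $n = S_y / (K_t \sigma_{nom})$, where the local peak stress at a notch is estimated with an elastic stress concentration factor $K_t$. This is a minimal sketch with hypothetical values; it shows how a section that is safe on nominal stress can still yield locally at a stress concentration.

```python
def yield_safety_factor(yield_strength, nominal_stress, kt=1.0):
    """Safety factor against yielding at a location; the local peak stress is
    estimated as kt * nominal_stress (kt = elastic stress concentration factor).
    A value below 1 indicates local plastic deformation."""
    return yield_strength / (kt * nominal_stress)


# Hypothetical: yield strength 250 MPa, nominal stress 100 MPa
print(yield_safety_factor(250, 100))          # 2.5: plain section
print(yield_safety_factor(250, 100, kt=3.0))  # below 1: local yielding at the notch
```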



Chemically Induced Failures

What are Chemically Induced Failures?

Chemical processes such as electrochemical reactions can cause cracking of components, leading to electrical failures.

Two types:

Corrosion

Intermetallic diffusion


Design Against Corrosion-Induced Failure

What is Chemical Corrosion?

The chemical or electrochemical reaction between a material, usually a metal, and its environment that produces a deterioration of the material and its properties.





Corrosion

Metals with a high oxidation potential tend to corrode faster.

Use hermetic packages to prevent moisture absorption.

Ensure that no moisture or contaminants are trapped during the processing and assembly of the packages.


Design Against Intermetallic Diffusion

What is Intermetallic Diffusion?

During wire bonding and solder reflow, the joining process generates intermetallic layers, which are byproducts of the joining process.


Design Guidelines Against Intermetallic Diffusion

Limit the process temperatures and control the time exposed to high temperatures during the joining process.

Control the temperature range and the number of exposure cycles during the high-temperature period.

Apply a nickel/gold coating to the bare copper pad surfaces.



Achieving reliability growth

Detect failure causes

Feedback

Redesign

Improved fabrication

Verification of redesign



