failure models
TRANSCRIPT
GO
Volume 9, Issue 1 Reliability Edge Home
Reliability Prediction Methods for Electronic Products
In today's competitive electronic products market, having higher reliability than competitors is
one of the key factors for success. To obtain high product reliability, consideration of reliability
issues should be integrated from the very beginning of the design phase. This leads to the
concept of reliability prediction. Historically, this term has been used to denote the process of
applying mathematical models and component data for the purpose of estimating the field
reliability of a system before failure data are available for the system. However, the objective
of reliability prediction is not limited to predicting whether reliability goals, such as MTBF, can
be reached. It can also be used for:
Identifying potential design weaknesses.
Evaluating the feasibility of a design.
Comparing different designs and life-cycle costs.
Providing models for system reliability/availability analysis.
Establishing goals for reliability tests.
Aiding in business decisions such as budget allocation and scheduling.
Once the prototype of a product is available, lab tests can be utilized to obtain more accurate
reliability predictions. Accurate prediction of the reliability of electronic products requires
knowledge of the components, the design, the manufacturing process and the expected
operating conditions. Several different approaches have been developed to achieve the
reliability prediction of electronic systems and components. Each approach has its unique
advantages and disadvantages. Among these approaches, three main categories are often used
within government and industry: empirical (standards based), physics of failure and life testing.
In this article, we will provide an overview of all three approaches.
First, we will discuss empirical prediction methods, which are based on the experiences of
engineers and on historical data. Several standards, such as MIL-HDBK-217, Bellcore/Telcordia,
RDF 2000 and China 299B, are widely used for reliability prediction of electronic products. Next,
we will discuss physics of failure methods, which are based on root-cause analysis of failure
mechanisms, failure modes and stresses. This approach is based upon an understanding of the
physical properties of the materials, operation processes and technologies used in the design.
Finally, we will discuss life testing methods, which are used to determine reliability by testing a
relatively large number of samples at their specified operation stresses or higher stresses and
using statistical models to analyze the data.
Empirical (or Standards Based) Prediction Methods
Empirical prediction methods are based on models developed from statistical curve fitting of
historical failure data, which may have been collected in the field, in-house or from
manufacturers. These methods tend to present good estimates of reliability for similar or slightly
modified parts. Some parameters in the curve function can be modified by integrating
engineering knowledge. The assumption is made that system or equipment failure causes are
inherently linked to components whose failures are independent of each other. There are many
different empirical methods that have been created for specific applications. Some have gained
popularity within industry in the past three decades. The table below lists some of the available
prediction standards and the following sections describe three of the most commonly used
methods in a bit more detail.
Change Language Quick Links Software Need Help? Search
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 1/11
MIL-HDBK-217 Predictive Method
MIL-HDBK-217 is very well known in military and commercial industries. It is probably the most
internationally recognized empirical prediction method, by far. The latest version is MIL-HDBK-
217F, which was released in 1991 and had two revisions: Notice 1 in 1992 and Notice 2 in 1995.
The MIL-HDBK-217 predictive method consists of two parts; one is known as the parts count
method and the other is called the part stress method [1]. The parts count method assumes
typical operating conditions of part complexity, ambient temperature, various electrical
stresses, operation mode and environment (called reference conditions). The failure rate for a
part under the reference conditions is calculated as:
where:
λref is the failure rate under the reference conditions.
i is the number of parts.
Since the parts may not operate under the reference conditions, the real operating conditions
will result in failure rates that are different from those given by the "parts count" method.
Therefore, the part stress method requires the specific part’s complexity, application stresses,
environmental factors, etc. (called Pi factors). For example, MIL-HDBK-217 provides many
environmental conditions (expressed as πE) ranging from "ground benign" to "cannon launch."
The standard also provides multi-level quality specifications (expressed as πQ). The failure rate
for parts under specific operating conditions can be calculated as:
where:
πS is the stress factor.
πT is the temperature factor.
πE is the environment factor.
πQ is the quality factor.
πA is the adjustment factor.
Figure 1 shows an example using the MIL-HDBK-217 method (in ReliaSoft’s Lambda Predict
software) to predict the failure rate of a ceramic capacitor. According to the handbook, the
failure rate of a commercial ceramic capacitor of 0.00068 mF capacitance with 80% operation
voltage, working under 30 degrees ambient temperature and "ground benign" environment is
0.0216/106 hours. The corresponding MTBF (mean time before failure) or MTTF (mean time to
failure) is estimated to be 46,140,368 hours.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 2/11
Figure 1: MIL-HDBK-217 capacitor failure rate example
Bellcore/Telcordia Predictive Method
Bellcore was a telecommunications research and development company that provided joint R&D
and standards setting for AT&T and its co-owners. Because of dissatisfaction with military
handbook methods for their commercial products, Bellcore designed its own reliability prediction
standard for commercial telecommunication products. In 1997, the company was acquired by
Science Applications International Corporation (SAIC) and the company's name was changed to
Telcordia. Telcordia continues to revise and update the standard. The latest two updates are
SR-332 Issue 1 (May 2001) and SR-332 Issue 2 (September 2006), both called "Reliability
Prediction Procedure for Electronic Equipment."
The Bellcore/Telcordia standard assumes a serial model for electronic parts and it addresses
failure rates at the infant mortality stage and at the steady-state stage with Methods I, II and
III [2-3]. Method I is similar to the MIL-HDBK-217F parts count and part stress methods. The
standard provides the generic failure rates and three part stress factors: device quality factor
(πQ), electrical stress factor (πS) and temperature stress factor (T). Method II is based on
combining Method I predictions with data from laboratory tests performed in accordance with
specific SR-332 criteria. Method III is a statistical prediction of failure rate based on field
tracking data collected in accordance with specific SR-332 criteria. In Method III, the predicted
failure rate is a weighted average of the generic steady-state failure rate and the field failure
rate.
Lambda Predict has implemented Methods I and II, and Method III will be added in the next
version. Figure 2 shows an example in Lambda Predict using SR-332 Issue 1 to predict the
failure rate of the same capacitor in the previous MIL-HDBK-217 example (shown in Figure 1).
The failure rate is 9.654 Fits, which is 9.654 / 109 hours. In order to compare the predicted
results from MIL-HBK-217 and Bellcore SR-332, we must convert the failure rate to the same
units. 9.654 Fits is 0.000965 / 106 hours. So the result of 0.0216 / 106 hours in MIL-HDBK-217
is much higher than the result in Bellcore/Telcordia SR-332. There are reasons for this variation.
First, MIL-HDBK-217 is a standard used in the military so it is more conservative than the
commercial standard. Second, the underlying methods are different and more factors that may
affect the failure rate are considered in MIL-HDBK-217.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 3/11
Figure 2: Bellcore capacitor failure rate example
RDF 2000 Predictive Method
RDF 2000 is a reliability data handbook developed by the French telecommunications industry.
This standard provides reliability prediction models for a range of electronic components using
cycling profiles and applicable phases as a basis for failure rate calculations [4]. RDF 2000
provides a unique approach to handle mission profiles in the failure rate prediction. Component
failure is defined in terms of an empirical expression containing a base failure rate that is
multiplied by factors influenced by mission profiles. These mission profiles contain information
about how the component failure rate may be affected by operational cycling, ambient
temperature variation and/or equipment switch on/off temperature variations. RDF 2000
disregards the wearout period and the infant mortality stage of product life based on the
assumption that, for most electronic components, the wearout period is never reached because
new products will replace the old ones before the wearout occurs. For components whose
wearout period is not very far in the future, the normal life period has to be determined. The
infant mortality stage failure rate is caused by a wide range of factors, such as manufacturing
processes and material weakness, but can be eliminated by improving the design and production
processes (e.g. by performing burn-in).
As an example, the empirical expression formula for a ceramic capacitor of class I is given by:
where:
(πt)i is the temperature factor related to the ith junction temperature of the capacitor
mission profile.
τi is the working time ratio of the capacitor for the ith junction temperature of the mission
profile.
τon is the total working time ratio of the capacitor, with τon + τoff = 1.
(πn)i is the ith influence factor related to the annual cycles number of thermal variations
seen by the package, with the amplitude ∆T.
∆Ti is the thermal amplitude variation of the ith mission profile.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 4/11
Figure 3 shows the implementation of the failure rate prediction using RDF 2000 in Lambda
Predict.
Figure 3: RDF 2000 capacitor failure rate example
Discussion of Empirical Methods
Although empirical prediction standards have been used for many years, it is always wise to use
them with caution. The advantages and disadvantages of empirical methods have been
discussed a lot in the past three decades. A brief summary from the publications in industry,
military and academia is presented next [5-9].
Advantages of empirical methods:
1. Easy to use, and a lot of component models exist.
2. Relatively good performance as indicators of inherent reliability.
3. Provide an approximation of field failure rates.
Disadvantages of empirical methods:
1. A large part of the data used by the traditional models is out-of-date.
2. Failure of the components is not always due to component-intrinsic mechanisms but can
be caused by the system design.
3. The reliability prediction models are based on industry-average values of failure rate,
which are neither vendor-specific nor device-specific.
4. It is hard to collect good quality field and manufacturing data, which are needed to define
the adjustment factors, such as the Pi factors in MIL-HDBK-217.
Physics of Failure Methods
In contrast to empirical reliability prediction methods, which are based on the statistical
analysis of historical failure data, a physics of failure approach is based on the understanding of
the failure mechanism and applying the physics of failure model to the data. Several popularly
used models are discussed next.
Arrhenius's Law
One of the earliest and most successful acceleration models predicts how the time-to-failure of
a system varies with temperature. This empirically based model is known as the Arrhenius
equation. Generally, chemical reactions can be accelerated by increasing the system
temperature. Since it is a chemical process, the aging of a capacitor (such as an electrolytic
capacitor) is accelerated by increasing the operating temperature. The model takes the
following form.
where:
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 5/11
L(T ) is the life characteristic related to temperature.
A is the scaling factor.
Ea is the activation energy.
k is the Boltzmann constant.
T is the temperature.
Eyring and Other Models
While the Arrhenius model emphasizes the dependency of reactions on temperature, the Eyring
model is commonly used for demonstrating the dependency of reactions on stress factors other
than temperature, such as mechanical stress, humidity or voltage.
The standard equation for the Eyring model [10] is as follows:
where:
L(T ,S) is the life characteristic related to temperature and another stress.
A, α, B and C are constants.
S is a stress factor other than temperature.
T is absolute temperature.
According to different physics of failure mechanisms, one more term (i.e. stress) can be either
removed or added to the above standard Eyring model. Several models are similar to the
standard Eyring model. They are:
Two Temperature/Voltage Model:
Three Stress Model (Temperature-Voltage-Humidity):
Corrosion Model:
Electronic devices with aluminum or aluminum alloy with small percentages of copper and silicon
metallization are subject to corrosion failures and therefore can be described with the following
model [11]:
where:
B0 is an arbitrary scale factor.
α is equal to 0.1 to 0.15 per % RH.
f(V) is an unknown function of applied voltage, with empirical value of 0.12 to 0.15.
Hot Carrier Injection Model:
Hot carrier injection describes the phenomena observed in MOSFETs by which the carrier gains
sufficient energy to be injected into the gate oxide, generate interface or bulk oxide defects
and degrade MOSFETs characteristics such as threshold voltage, transconductance, etc. [11]:
For n-channel devices, the model is given by:
where:
B is an arbitrary scale factor.
Isub is the peak substrate current during stressing.
N is equal to a value from 2 to 4, typically 3.
Ea is equal to -0.1eV to -0.2eV.
For p-channel devices, the model is given by:
where:
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 6/11
B is an arbitrary scale factor.
Igate is the peak gate current during stressing.
M is equal to a value from 2 to 4.
Ea is equal to -0.1eV to -0.2eV.
Since electronic products usually have a long time period of useful life (i.e. the constant line of
the bathtub curve) and can often be modeled using an exponential distribution, the life
characteristics in the above physics of failure models can be replaced by MTBF (i.e. the life
characteristic in the exponential distribution). However, if you think your products do not
exhibit a constant failure rate and therefore cannot be described by an exponential distribution,
the life characteristic usually will not be the MTBF. For example, for the Weibull distribution, the
life characteristic is the scale parameter eta and for the lognormal distribution, it is the log
mean.
Black Model for Electromigration
Electromigration is a failure mechanism that results from the transfer of momentum from the
electrons, which move in the applied electric field, to the ions, which make up the lattice of the
interconnect material. The most common failure mode is "conductor open." With the decreased
structure of Integrated Circuits (ICs), the increased current density makes this failure
mechanism very important in IC reliability.
At the end of the 1960s, J. R. Black developed an empirical model to estimate the MTTF of a
wire, taking electromigration into consideration, which is now generally known as the Black
model. The Black model employs external heating and increased current density and is given
by:
where:
A0 is a constant based on the cross-sectional area of the interconnect.
J is the current density.
Jthreshold is the threshold current density.
E a is the activation energy.
K is the Boltzmann constant.
T is the temperature.
N is a scaling factor.
The current density (J) and temperature (T) are factors in the design process that affect
electromigration. Numerous experiments with different stress conditions have been reported in
the literature, where the values have been reported in the range between 2 and 3.3 for N, and
0.5 to 1.1eV for Ea. Usually, the lower the values, the more conservative the estimation.
Coffin-Manson Model for Fatigue
Fatigue failures can occur in electronic devices due to temperature cycling and thermal shock.
Permanent damage accumulates each time the device experiences a normal power-up and
power-down cycle. These switch cycles can induce cyclical stress that tends to weaken
materials and may cause several different types of failures, such as dielectric/thin-film cracking,
lifted bonds, solder fatigue, etc. A model known as the (modified) Coffin-Manson model has
been used successfully to model crack growth in solder due to repeated temperature cycling as
the device is switched on and off. This model takes the form [9]:
where:
Nf is the number of cycles to failure.
Α is a coefficient.
f is the cycling frequency.
∆T is the temperature range during a cycle.
Α is the cycling frequency exponent.
Α is the temperature exponent.
G(Tmax) is equal to:
which is an Arrhenius term evaluated at the maximum temperature in each cycle.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 7/11
Three factors are usually considered for testing: maximum temperature (Tmax), temperature
range (∆T) and cycling frequency (f). The activation energy is usually related to certain failure
mechanisms and failure modes, and can be determined by correlating thermal cycling test data
and the Coffin-Manson model.
Discussion of Physics of Failure Methods
A given electronic component will have multiple failure modes and the component's failure rate
is equal to the sum of the failure rates of all modes (i.e. humidity, voltage, temperature,
thermal cycling and so on). The system's failure rate is equal to the sum of the failure rates of
the components involved. In using the above models, the model parameters can be determined
from the design specifications or operating conditions. If the parameters cannot be determined
without conducting a test, the failure data obtained from the test can be used to get the
model parameters. Software products such as ReliaSoft's ALTA can help you analyze the failure
data.
We will give an example of using ALTA to analyze the Arrhenius model. For this example, the life
of an electronic component is considered to be affected by temperature. The component is
tested under temperatures of 406, 416 and 426 Kelvin. The usage temperature level is 400
Kelvin. The Arrhenius model and the Weibull distribution are used to analyze the failure data in
ALTA. Figure 4 shows the data and calculated parameters. Figure 5 shows the reliability plot
and the estimated B10 life at the usage temperature level.
Figure 4: Data and analysis results in ALTA with the Arrhenius- Weibull model
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 8/11
Figure 5: Reliability vs. Time plot and calculated B10 life
From Figure 4, we can see that the estimated activation energy in the Arrhenius model is 0.92.
Note that, in ALTA, the Arrhenius model is simplified to a form of:
Using this equation, the parameters B and C calculated by ALTA can easily be transformed to
the parameters described above for the Arrhenius relationship.
Advantages of physics of failure methods:
1. Accurate prediction of wearout using known failure mechanisms.
2. Modeling of potential failure mechanisms based on the physics of failure.
3. During the design process, the variability of each design parameter can be determined.
Disadvantages of physics of failure methods:
1. Need detailed component manufacturing information (such as material, process and
design data).
2. Analysis is complex and could be costly to apply.
3. It is difficult to assess the entire system.
Life Testing Method
As mentioned above, time-to-failure data from life testing may be incorporated into some of the
empirical prediction standards (i.e. Bellcore/Telcordia Method II) and may also be necessary to
estimate the parameters for some of the physics of failure models. However, in this section of
the article, we are using the term life testing method to refer specifically to a third type of
approach for predicting the reliability of electronic products. With this method, a test is
conducted on a sufficiently large sample of units operating under normal usage conditions.
Times-to-failure are recorded and then analyzed with an appropriate statistical distribution in
order to estimate reliability metrics such as the B10 life. This type of analysis is often referred
to as Life Data Analysis or Weibull Analysis.
ReliaSoft's Weibull++ software is a tool for conducting life data analysis. As an example,
suppose that an IC board is tested in the lab and the failure data are recorded. Figure 6 shows
the data entered into Weibull++ and analyzed with the 2-parameter Weibull lifetime distribution
while Figure 7 shows the Reliability vs. Time plot and the calculated B10 life for the analysis.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 9/11
Figure 6: Data and analysis results in Weibull++ with the Weibull distribution
Figure 7: Reliability vs. Time plot and calculated B10 life for the analysis
Discussion of the Life Testing Method
The life testing method can provide more information about the product than the empirical
prediction standards. Therefore, the prediction is usually more accurate, given that enough
samples are used in the testing.
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 10/11
The life testing method may also be preferred over both the empirical and physics of failure
methods when it is necessary to obtain realistic predictions at the system (rather than
component) level. This is because the empirical and physics of failure methods calculate the
system failure rate based on the predictions for the components (e.g. using the sum of the
component failure rates if the system is considered to be a serial configuration). This assumes
that there are no interaction failures between the components but, in reality, due to the design
or manufacturing, components are not independent. (For example, if the fan is broken in your
laptop, the CPU will fail faster because of the high temperature.) Therefore, in order to consider
the complexity of the entire system, life tests can be conducted at the system level, treating
the system as a "black box," and the system reliability can be predicted based on the obtained
failure data.
Conclusions
In this article, we discussed three approaches for electronic reliability prediction. The empirical
(or standards based) methods can be used in the design stage to quickly obtain a rough
estimation of product reliability. The physics of failure and life testing methods can be used in
both design and production stages. In physics of failure approaches, the model parameters can
be determined from design specs or from test data. On the other hand, with the life testing
method, since the failure data from your own particular products are obtained, the prediction
results usually are more accurate than those from a general standard or model.
References
[1] MIL-HDBK-217F, Reliability Prediction of Electronic Equipment, 1991. Notice 1 (1992) and
Notice 2 (1995).
[2] SR-332, Issue 1, Reliability Prediction Procedure for Electronic Equipment, Telcordia, May
2001.
[3] SR-332, Issue 2, Reliability Prediction Procedure for Electronic Equipment, Telcordia,
September 2006.
[4] ITEM Software and ReliaSoft Corporation, RS 490 Course Notes: Introduction to Standards
Based Reliability Prediction and Lambda Predict, 2006.
[5] B. Foucher, J. Boullie, B. Meslet and D. Das, "A Review of Reliability Prediction Methods for
Electronic Devices," Microelectron. Wearout., vol. 42, no. 8, August 2002, pp. 1155-1162.
[6] M. Pecht, D. Das and A. Ramarkrishnan, "The IEEE Standards on Reliability Program and
Reliability Prediction Methods for Electronic Equipment," Microelectron. Wearout., vol. 42, 2002,
pp. 1259-1266.
[7] M. Talmor and S. Arueti, "Reliability Prediction: The Turnover Point," 1997 Proc. Ann.
Reliability and Maintainability Symp., 1997, pp. 254-262.
[8] W. Denson, "The History of Reliability Prediction," IEEE Trans. On Reliability, vol. 47, no. 3-
SP, September 1998.
[9] D. Hirschmann, D. Tissen, S. Schroder and R.W. de Doncker, "Reliability Prediction for
Inverters in Hybrid Electrical Vehicles," IEEE Trans. on Power Electronics, vol. 22, no. 6,
November 2007, pp. 2511-2517.
[10] NIST Information Technology Library. [Online document] Available HTTP: www.itl.nist.gov
[11] Semiconductor Device Reliability Failure Models. [Online document] Available HTTP:
www.sematech.org/docubase/document/3955axfr.pdf
[Editorial Note: In the printed edition of Volume 9, Issue 1, there were two errors that have
been corrected in this online version. We apologize for any inconvenience. 1) 9.654 Fits is
9.654 / 109 hours (rather than 1010). 2) In the equations for hot carrier injection models, "Ea
is equal to -0.1eV to -0.2eV."]
Copyright © 1992 - 2011 ReliaSoft Corporation. All Rights Reserved
Privacy Statement | Terms of Use | Contact | About Us
2/1/2011 MIL-217, Bellcore/Telcordia and Other …
reliasoft.com/…/prediction_methods.htm 11/11