
QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL

Qual. Reliab. Engng. Int. 2000; 16: 209–219

USING AGGREGATED CUMULATIVE HAZARD PLOTS TO VISUALIZE FAILURE DATA

DAVID NEVELL∗
Electronic Data Systems Ltd (Rolls Royce account), PO Box 31, Moor Lane, Derby DE24 8BJ, UK

SUMMARY

The purpose of this paper is to demonstrate how the aggregated cumulative hazard (ACH) plot can be used to complement and extend standard Weibull analyses for non-repairable items. It looks at the shortcomings of using probability plots in isolation and shows how ACH plots can overcome these problems. Several examples are then described covering a range of applications. ACH plots often suggest particular types of models. The three-parameter mixture model is considered in detail. The paper concludes that ACH plots can be regarded as the focal point of a unified and holistic approach to pragmatic part level reliability and safety analysis. Copyright 2000 John Wiley & Sons, Ltd.

KEY WORDS: data visualization; Weibull distribution; cumulative hazard; batch models; mixture models; proportional hazard

INTRODUCTION

The visual representation of failure data plays a vital part in the process of understanding what information the data contain. Appropriate plots not only help the analyst decide what models should be fitted, but are an important communication medium to all other parties. In a sense, they become the basic visual currency. As such, whatever is used should satisfy some basic requirements. The plot should be flexible enough to suit all possible data types. It should be simple to understand. In addition, it should be robust; specifically, it should not be open to misinterpretation when presented independently of the original data.

It is useful to look at the environment within which the ideas in this paper have been developed and within which they have been designed to be effective. The emphasis on exceptionally high levels of safety and reliability that exists in the aerospace industry tends to give analyses some particular characteristics. Firstly, we shall assume that all items under analysis are non-repairable. Severe and distinct failure modes are not common, meaning that failure samples are often quite small and are heavily dominated by suspensions. More minor failure modes are often only detected at inspections, which can be infrequent and irregular, so that the actual failure ages are defined only over relatively wide intervals. The high-tech nature of the products means that there is often a wealth of other information connected with the component, its operating environment and its manufacturing history, that may or may not be relevant. There is a need to understand the root causes of failure, with a view to correcting them, as well as predicting future risk.

∗Correspondence to: D. Nevell, 8 Copse Grove, Littleover, Derby DE23 7WW, UK. Email: [email protected]

It is useful too to look at the situation from the point of view of the analyst. The problem is to turn this wealth of data (or lack of it) into useful information. The analyst may not necessarily be a statistical expert, but will be competent in using the standard tools of reliability analysis. The analysis will be largely driven by the capabilities of the available software. There may be closer affinity with the data rather than the techniques, and this may also be true of the people who need information from the analysis. The debate will be in the domain of the engineer, as well as the statistician, which emphasizes the need for clarity of data presentation.

NOTATION

f(t)    probability density function
F(t)    cumulative distribution function
h(t)    hazard function
H(t)    cumulative hazard function
G(t)    cumulative failure function
K(t)    aggregated cumulative hazard function
n(t)    survivor function
η, β    scale and slope of Weibull distribution
MRR     median rank regression
MLE     maximum likelihood estimation
ACH     aggregated cumulative hazard
NGV     nozzle guide vane

Received 22 March 1999; revised 2 December 1999

THE WEIBULL PROBABILITY PLOT

The Weibull probability plot is usually the starting point for visual representation of failure data. For the sake of this paper we shall assume that the Weibull distribution is used, although the arguments presented apply equally well to other distributions such as the lognormal, should they be appropriate. The Weibull probability plot has always been regarded as a powerful aid to model choice and identification of potential problems. However, on its own it does not fulfil all the necessary requirements of a pivotal visual tool. The shortcomings are now discussed.

An immediate problem of the Weibull probability plot is that its scales, especially the probability scale, are not very intuitive. There is clearly an advantage in having scales that allow exact distributions to be represented linearly, but a disadvantage is that it is not that easy to judge the significance of points deviating from the line, notably in the vertical direction. To the uninitiated this can be confusing, even misleading. When it is considered that Weibull plots can reach a broad audience, including senior management, this is not ideal.

A more serious problem lies with the fact that the only points shown explicitly on the plot are those representing failures. Suspensions cannot be seen directly, although they affect where the failure points are plotted. The adjustment of failure ranks due to suspensions is a process that throws useful information away, particularly that relating to higher-life suspensions. This can be seen by looking at Johnson’s formula for adjustment:

$$\text{rank increment} = \frac{N + 1 - \text{previous adjusted rank}}{1 + \text{number of items beyond present suspended item}},$$

where N = total failures and suspensions.

Figure 1. Effect of high-life suspensions

This expression only uses the number of suspensions that exceed a given age, not the actual ages themselves. This can lead to confusion when even a small number of suspensions exceed the highest failure point. What looks good on a probability plot can hide the fact that some components are surviving in a way that is inconsistent with the rest of the failure data. For example, failures at 200 and 300 h plus a suspended unit at 305 h will be treated in exactly the same way as failures at 200 and 300 h plus a suspension at 1000 h. The probability plots will look identical and the fitted median rank regression (MRR) models will be the same (Figure 1, line A). It is only when we use a distribution fitting method that does not depend upon the visual appearance of the plot (e.g. maximum likelihood estimation [1]) that the difference becomes apparent (Figure 1, lines B and C for suspensions at 305 and 1000 h respectively).
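The insensitivity of rank adjustment to suspension ages can be checked with a few lines of code. The sketch below applies Johnson's increment to the two data sets quoted above. The function names are illustrative only, and the use of Benard's approximation for converting an adjusted rank to a median rank is an assumption, since the paper does not say which median rank formula it uses.

```python
def johnson_adjusted_ranks(ages, is_failure):
    """Return (age, adjusted rank) for each failure, using Johnson's increment."""
    n = len(ages)
    order = sorted(range(n), key=lambda i: ages[i])
    ranks = []
    prev_rank = 0.0
    for position, i in enumerate(order):
        if not is_failure[i]:
            continue  # suspensions only change the count of remaining items
        remaining = n - position  # items at or beyond this failure
        prev_rank += (n + 1 - prev_rank) / (1 + remaining)
        ranks.append((ages[i], prev_rank))
    return ranks


def benard_median_rank(adjusted_rank, n):
    """Benard's approximation to the median rank (an assumed choice)."""
    return (adjusted_rank - 0.3) / (n + 0.4)


# Failures at 200 and 300 h, with the suspension at 305 h or at 1000 h.
case_305 = johnson_adjusted_ranks([200, 300, 305], [True, True, False])
case_1000 = johnson_adjusted_ranks([200, 300, 1000], [True, True, False])
print(case_305)
print(case_1000)
print([benard_median_rank(rank, 3) for _, rank in case_305])
```

Both calls return the same adjusted ranks, so any fit based only on the plotted points cannot distinguish the two data sets.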

This problem leads directly to the question of how the goodness-of-fit of a model should be assessed. The implication of the Weibull probability plot is that a set of failure points lying on a reasonably straight line implies a good homogeneous model. This is substantiated by the fact that the prime goodness-of-fit statistic is often based on the correlation coefficient, which obviously reflects the visual appearance of the fitted line going through the points. The expert will be cautious, knowing that there are sometimes additional checks that can be made; for example, the comparison of the expected number of failures to date (calculated using F(t)) and the number of actual failures. However, there is potential for being misled, and this additional check is not always valid. For example, when used on complete samples (i.e. no suspensions), the expected number of failures is precisely half the actual number of failures, because the average median rank is 0.5. A similar mismatch occurs with near-complete samples. As has already been said, it is important that any visual method for showing data should be robust, meaning that visual inspection alone should be sufficient to verify the model without the requirement of separate (and therefore not explicit) calculations. Weibull probability plots do not completely satisfy this ideal.
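The statement that the average median rank is 0.5 can be verified directly. The short derivation below assumes Benard's approximation for the median rank of the i-th of N failures, which is not stated in the paper; exact median ranks give the same average because the ranks of the i-th and (N + 1 − i)-th failures sum to one.

```latex
\sum_{i=1}^{N} \frac{i - 0.3}{N + 0.4}
  = \frac{\tfrac{1}{2}N(N+1) - 0.3N}{N + 0.4}
  = \frac{\tfrac{1}{2}N(N + 0.4)}{N + 0.4}
  = \frac{N}{2}
```

If the fitted line passes through the plotted points, the fitted F(t) at each failure age is close to its median rank, so the expected number of failures to date sums to roughly N/2 while N failures have actually been observed.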

We now consider the difficulty of dealing with interval-censored failure data. This arises when the exact failure ages are unknown, except that they fall within an interval, usually the gap between successive inspections. If the failure mode is benign and only detectable at inspection, at which point there is no reliable way of back estimating when the failure must have occurred, then we have interval-censored failure data. If the intervals are large compared to the age of the units (for example, when arising from successive overhauls of civil aero engines), any point estimate is liable to be inaccurate. Each must be treated as falling within a band, rather than at a point. In such cases, model estimation can be achieved using maximum likelihood estimation (MLE). However, visual representation of the data and the fitted model is a problem, as there is no satisfactory way of plotting the failures onto Weibull probability scales. Methods such as Kaplan–Meier [1] can provide an equivalent plot for interval data, but rely upon the intervals being well ordered and failure-rich. We cannot rely upon this being true.

Another problem directly related to the use of MLE with interval data is that it calculates parameters that cause the expected number of failures to date to be very close to the actual number of failures. This effectively renders that particular check on the goodness-of-fit useless, as there will generally be a match, whatever the quality of the model. What is needed is a method that can provide a visual goodness-of-fit check for all types of data and whatever fitting method. Probability plots do not do this.

Finally, in trying to interpret data, the analyst may have information on other variables that may have a bearing on the pattern of failures, other than just age. Some of these may be continuous or ordinal, such as calendar time or order of manufacture. Others, such as operator, modification standard or batch, are discrete. It would be useful to be able to represent the effect of those variables in a simple way that did not rely on either subdividing the data and producing a set of separate models (difficult for small failure samples), or using more advanced multivariate methods. The Weibull probability plot does not lend itself directly to this type of analysis, as it is only used to show age on the x-axis, rather than variables of the type mentioned.

To summarize, we have identified several areas of concern regarding the use of Weibull probability plots, namely the scales, failure to show suspensions, failure to cope with interval data, and x-axis limitation. This appears to be a limiting list, but it should be remembered that probability plots often carry a powerful message, and this paper in no way seeks to undermine their use. The approach suggested below is designed to complement the Weibull probability plot and to get the most out of whatever data are available.

THE AGGREGATED CUMULATIVE HAZARD PLOT

To address these problems, the aggregated cumulative hazard (ACH) plot is proposed. It is presented here primarily as a model verification method, although it does have potential as a fitting method. It shows two graphs. The first is a cumulative step plot of actual failures against age (on the x-axis). The second, comparative graph is the aggregation of truncated cumulative hazard functions for each unit. Any significant deviation between the two graphs signifies a deficiency in the fitted model. This is now explained in more detail.

The hazard function h(t) is defined by f(t)/(1 − F(t)) and is a measure of the failure rate of surviving units. The cumulative hazard function H(t) is given by

$$H(t) = \int_0^t h(x)\,dx$$

For the Weibull distribution, $H(t) = (t/\eta)^{\beta}$. At t = η, H(t) = 1 regardless of β.

Firstly, define a truncated version of H(t) for each unit i under analysis, which will be denoted by $K_i(t)$:

$$K_i(t) = \begin{cases} H(t) & \text{for } t < t_i \\ H(t_i) & \text{for } t \ge t_i \end{cases}$$

where $t_i$ is the age at which unit i fails. For a new unit we can calculate the expected value of $K_i(t)$ at an arbitrary time horizon t. This is given by

$$E(K_i(t)) = \int_0^t f(x)H(x)\,dx + H(t)\,(1 - F(t))$$

The first part of the expression covers the possibility of the failure time being less than or equal to t, and the second part covers cases where it is greater than t. Using integration by parts, we find that

$$E(K_i(t)) = F(t)$$
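The integration by parts is not spelled out in the paper; one way to carry it out, using only the definitions above (H′(x) = h(x), F(0) = H(0) = 0 and h(x)(1 − F(x)) = f(x)), is sketched here.

```latex
\begin{aligned}
\int_0^t f(x)H(x)\,dx
  &= \bigl[F(x)H(x)\bigr]_0^t - \int_0^t F(x)h(x)\,dx
   = F(t)H(t) - \int_0^t F(x)h(x)\,dx, \\
E(K_i(t))
  &= F(t)H(t) - \int_0^t F(x)h(x)\,dx + H(t) - H(t)F(t) \\
  &= \int_0^t h(x)\,dx - \int_0^t F(x)h(x)\,dx
   = \int_0^t h(x)\,\bigl(1 - F(x)\bigr)\,dx
   = \int_0^t f(x)\,dx = F(t).
\end{aligned}
```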

Secondly, define a step function $G_i(t)$ for each unit i in the following way:

$$G_i(t) = \begin{cases} 0 & \text{for } t < t_i \\ 1 & \text{for } t \ge t_i \end{cases}$$

where $t_i$ is again the failure point of unit i. It is a simple conclusion that $E(G_i(t)) = F(t)$. Now, for any unit i we can compare $K_i(t)$ and $G_i(t)$ directly knowing that they have the same expectation F(t). The final step is to aggregate $K_i(t)$ and $G_i(t)$ over all N units in the population. This then produces the two graphs on the ACH plot.



Figure 2. Aggregated cumulative hazard plot

The ACH graph at age t is then defined by

$$K(t) = \sum_{i=1}^{N} K_i(t) = \sum_{\text{failures} < t} (t_i/\hat{\eta})^{\hat{\beta}} + \sum_{\text{suspensions} < t} (s_i/\hat{\eta})^{\hat{\beta}} + n(t)\,(t/\hat{\eta})^{\hat{\beta}}$$

where n(t) is the number of units still running at age t, and $s_i$ are suspension times for unfailed units.

The comparative graph of actual failures is defined by

$$G(t) = \sum_{i=1}^{N} G_i(t)$$

Suspensions do not affect G(t), but contribute to K(t) up to their achieved age. Thus, for the purposes of the ACH graph, failures and suspensions are treated equally, although they are obviously treated differently at the model-fitting stage. The plot is constructed up to the point of the highest failure or suspension, and if the fitted model is consistent with the data, then K(t) and G(t) should have similar profiles. Figure 2 shows an example of an ACH plot.
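The two graphs are straightforward to compute once a Weibull model has been fitted. The sketch below is a minimal illustration that evaluates K(t) and G(t) on a grid of ages by summing the truncated cumulative hazard H(min(t, stop age)) over all units. The function names and the parameter values η = 400, β = 2 are hypothetical and chosen only for the demonstration; the ages are those of the earlier 200/300 h example.

```python
def weibull_cum_hazard(t, eta, beta):
    """Weibull cumulative hazard H(t) = (t/eta)^beta."""
    return (t / eta) ** beta


def ach_curves(failures, suspensions, eta, beta, grid):
    """Return the lists K(t) and G(t) evaluated at each age in grid."""
    stop_ages = list(failures) + list(suspensions)
    k_values, g_values = [], []
    for t in grid:
        # Each unit contributes H(t) while running and H(stop age) afterwards.
        k_values.append(sum(weibull_cum_hazard(min(t, a), eta, beta) for a in stop_ages))
        # Each failure contributes one step at its failure age.
        g_values.append(sum(1 for a in failures if a <= t))
    return k_values, g_values


grid = [100, 200, 300, 305, 500, 1000]
k_305, g_305 = ach_curves([200, 300], [305], eta=400.0, beta=2.0, grid=grid)
k_1000, g_1000 = ach_curves([200, 300], [1000], eta=400.0, beta=2.0, grid=grid)
print(list(zip(grid, k_305, g_305)))
print(list(zip(grid, k_1000, g_1000)))
```

With the suspension at 1000 h the K(t) graph keeps climbing after the last failure while G(t) stays at 2, which is exactly the high-life suspension signal that the probability plot hides.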

How does the ACH plot address the list of problems identified earlier? Firstly, it has straightforward linear scales. It is particularly advantageous to show the number of failures on the y-axis, as this is a very intuitive measure of the situation. Secondly, because the ACH graph is a function of both failures and suspensions, it will continue to rise whenever there is exposure to failures at any age. High-life suspensions are therefore highlighted by an increasing ACH graph, which may not always be matched by corresponding failures. The third and fourth points, namely interval data and additional factors, can also be addressed, as will now be described in more detail.

For interval data a discrete formulation is required. Since K(t) is intended to reflect exposure to observed failures, it only increments at each inspection point. Similarly, G(t) is based on the times of the failed inspections. The incremental cumulative hazard for an inspection at age $t_n$, given that the previous inspection (which must have been clear) was at $t_{n-1}$, is given by

$$\frac{F(t_n) - F(t_{n-1})}{1 - F(t_{n-1})}$$

In other words, it is the probability of observing a failure in the interval $(t_{n-1}, t_n)$ given that the unit has already survived to $t_{n-1}$. In the limit, as successive inspection intervals tend to zero, the sum of these incremental hazards can be shown to be equal to the continuous cumulative hazard function. For interval data the shape of the ACH plot is therefore driven by when inspections take place.
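The limiting behaviour can be illustrated numerically. The sketch below computes the incremental hazard for a single inspection and then sums the increments over progressively finer, equally spaced inspection schedules. The function names are illustrative, and the Weibull parameters (scale 819, slope 1.24) are borrowed from the Example 2 fit quoted later purely as convenient values.

```python
import math


def weibull_cdf(t, eta, beta):
    """Weibull distribution function F(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def incremental_hazard(t_prev, t_now, eta, beta):
    """Probability of finding a failure at t_now given a clear inspection at t_prev."""
    f_prev = weibull_cdf(t_prev, eta, beta)
    f_now = weibull_cdf(t_now, eta, beta)
    return (f_now - f_prev) / (1.0 - f_prev)


def summed_increments(t_end, n_steps, eta, beta):
    """Sum the incremental hazards over n_steps equal inspection intervals."""
    edges = [t_end * k / n_steps for k in range(n_steps + 1)]
    return sum(incremental_hazard(a, b, eta, beta) for a, b in zip(edges, edges[1:]))


ETA, BETA, T_END = 819.0, 1.24, 1000.0
continuous = (T_END / ETA) ** BETA  # continuous cumulative hazard H(t)
for steps in (1, 2, 10, 100, 10000):
    print(steps, round(summed_increments(T_END, steps, ETA, BETA), 4), round(continuous, 4))
```

With a single interval the sum is simply F(t); as the inspections become more frequent the sum rises towards H(t), which is why widely spaced inspections give a visibly stepped ACH graph.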

The final problem raised earlier, namely that probability plots are designed to show failures only as a function of increasing age, is easily addressed by the ACH method. The elemental parts of the cumulative plots can be reordered by any available explanatory variable, either continuous or discrete. One particularly useful reordering (if available) is by order of manufacture. Other possibilities include calendar date, total population age, operator or modification standard. In each case the general form of the plot is the same, being a direct comparison of actual failures and aggregated cumulative hazard.
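A reordered plot can be sketched as follows, assuming that each unit contributes its cumulative hazard at its age of failure or suspension, and one failure step if it failed. The unit records, field names and Weibull parameters below are entirely hypothetical and serve only to show the mechanics.

```python
def ach_by_variable(units, eta, beta, key):
    """Cumulate hazard and failures with units ordered by the chosen variable."""
    ordered = sorted(units, key=lambda u: u[key])
    k_cum, g_cum, rows = 0.0, 0, []
    for unit in ordered:
        k_cum += (unit["age"] / eta) ** beta  # hazard accrued at the stop age
        g_cum += 1 if unit["failed"] else 0   # one step per observed failure
        rows.append((unit[key], k_cum, g_cum))
    return rows


# Hypothetical population: ages in hours, serial number and calendar year of stop.
UNITS = [
    {"serial": 1, "age": 2900.0, "failed": True, "year": 1997},
    {"serial": 2, "age": 4400.0, "failed": False, "year": 1998},
    {"serial": 3, "age": 2500.0, "failed": True, "year": 1997},
    {"serial": 4, "age": 3000.0, "failed": False, "year": 1996},
    {"serial": 5, "age": 1200.0, "failed": False, "year": 1995},
]

by_serial = ach_by_variable(UNITS, eta=4000.0, beta=2.0, key="serial")
by_year = ach_by_variable(UNITS, eta=4000.0, beta=2.0, key="year")
for row in by_year:
    print(row)
```

Whatever ordering is chosen, the two graphs end at the same totals; only the path between the origin and the end point changes, and it is that path which reveals a dependence on the ordering variable.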

EXAMPLES

Let us now look at some examples of ACH in action. Example 1 is based on data of aero engine bleed system failures taken from the USAF Weibull analysis handbook [2]. There are 10 failures and 192 suspensions. A model can be fitted using the standard median rank regression (MRR) approach. This gives an apparent good fit to the data, as suggested by the probability plot (Figure 3). The fitted model indicates that a rapidly increasing number of failures should be seen. However, checking the ACH plot based on the same MRR fit (Figure 4) illustrates the crucial problem, namely that there are too many high-life suspensions relative to the lower-life failures. K(t) considerably overshoots G(t). Hence, without recourse to any extra calculation, the problem is graphically portrayed.

Figure 3. Weibull probability plot for Example 1

Figure 4. ACH plot for Example 1 (MRR fit)

Figure 5. ACH plot for Example 1 (MLE fit)

Notice that for the MRR model K(t) follows G(t) closely up to the elbow at 1250 h. This is a characteristic of misfitting MRR models on ACH plots. The ACH graph follows the failures but not the high-life suspensions. MLE models behave differently, as can be seen from the corresponding MLE ACH plot for the same data (Figure 5). In this case, K(t) and G(t) match at either end but diverge in the centre section of the plot. This illustrates the point made earlier about the danger of comparing the expected total number of failures (essentially the top of the ACH graph in this case) against the actual total number of failures. They may match in aggregate, but the profile could be wrong. The construction of the MLE ACH plot is demonstrated in Table 1.

Example 2 shows how ACH deals with interval data. The data shown in Table 2 are the results of inspections made on nozzle guide vanes (NGVs) for leading-edge damage. No NGV was inspected more than twice, and of the 89 inspections made, 42 resulted in rejections (denoted by ‘r’). The exact ages at which the NGVs would have been first rejected are unknown, and so they are treated as lying within intervals, especially as the intervals are wide compared to the total life of the components.

MLE can be used to fit a Weibull distribution with slope 1.24 and scale 819. Is this a good fit? The ACH plot (Figure 6) shows that it generally is. K(t) and G(t) run closely together, apart from a slight divergence at 500 h, which can probably be explained by data rounding. Since no inspections have been made under 422 h, it is impossible to verify the model in this earlier time period. There is no way of telling whether or not the damage builds up in the way that the fitted model describes unless further inspections are made at lower ages. This may or may not be important to the analyst.

Example 3 is based on oil pump drive gear failures. In this case we will look at a range of ACH plots based on different explanatory variables. There have been seven failures from a total population of 729 units. The highest failure is at 29,067 h, although suspensions continue up to 44,000 h. It has also been noted that all the failures have occurred in the first 183 serial numbers, and that six of the seven failures came in a 7 month period. ACH can be used to examine these patterns. Firstly, MRR is used to fit a model, and a Weibull plot is produced (Figure 7). Visually at least, this appears to be a reasonable fit. However, the age-based ACH plot (Figure 8) shows that there has been a tendency for the failures to occur within a tighter band than would be expected from the model, although the degree of mismatch is not extreme (MLE has been used so that K(t) and G(t) match at the upper end). The suspicion that earlier manufacturing serial numbers may be to blame can be checked by reordering the ACH plot by serial number order. In this case the incremental cumulative hazard for each unit is taken to be its value at the age of failure or suspension, whichever is appropriate. This plot (Figure 9) shows that the tendency of the earliest serial numbers to fail is reasonably consistent with the fitted wear-out model, as it is inevitable that older units which have accumulated more hours will fail first.

Table 1. Construction of ACH plot (Example 1); η = 3510, β = 2.96

Columns: A = frequency; B = H(t); C = A ∗ B; D = survivors n(t); E = B ∗ D = K(t) for survivors; F = SUM(C) to previous row = K(t) for all stopped units; ACH K(t) = E + F; G(t) = failures.

Age t   s/f   A (Freq)   B (H(t))   C (Freq∗H(t))   D (n(t))   E        F        K(t)     G(t)
 250    s      2         0.000      0.001           202        0.081    0.000    0.081     0
 550    s      2         0.004      0.008           200        0.829    0.001    0.829     0
 650    s      2         0.007      0.014           198        1.345    0.009    1.354     0
 708    f      1         0.009      0.009           196        1.715    0.023    1.738     1
 750    s      9         0.010      0.093           195        2.024    0.031    2.055     1
 828    f      1         0.014      0.014           186        2.587    0.125    2.712     2
 850    s     23         0.015      0.346           185        2.781    0.139    2.919     2
 884    f      2         0.017      0.034           162        2.735    0.484    3.219     4
 950    s     27         0.021      0.564           160        3.343    0.518    3.861     4
1013    f      1         0.025      0.025           133        3.360    1.082    4.442     5
1050    s     20         0.028      0.562           132        3.708    1.108    4.816     5
1082    f      1         0.031      0.031           112        3.439    1.669    5.108     6
1105    f      1         0.033      0.033           111        3.627    1.700    5.327     7
1150    s     22         0.037      0.809           110        4.045    1.733    5.778     7
1198    f      1         0.042      0.042            88        3.653    2.542    6.194     8
1249    f      1         0.047      0.047            87        4.085    2.583    6.669     9
1250    s     22         0.047      1.036            86        4.048    2.630    6.678     9
1251    f      1         0.047      0.047            64        3.020    3.666    6.685    10
1350    s     11         0.059      0.650            63        3.724    3.713    7.437    10
1450    s     11         0.073      0.803            52        3.798    4.363    8.161    10
1550    s     20         0.089      1.780            41        3.648    5.167    8.815    10
1650    s      8         0.107      0.857            21        2.248    6.946    9.195    10
1750    s      4         0.127      0.510            13        1.657    7.803    9.459    10
1850    s      2         0.150      0.300             9        1.352    8.312    9.664    10
1950    s      3         0.176      0.527             7        1.229    8.613    9.842    10
2050    s      3         0.204      0.611             4        0.814    9.139    9.954    10
2150    s      1         0.234      0.234             1        0.234    9.750    9.985    10
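The construction set out in Table 1 can be reproduced in a few lines. The sketch below uses the ages, statuses and frequencies from Table 1 together with the quoted parameters η = 3510 and β = 2.96. The function and variable names are illustrative, and small differences from the printed table in the last decimal place are to be expected from rounding.

```python
ETA, BETA = 3510.0, 2.96

# (age in hours, 's' for suspension or 'f' for failure, frequency) from Table 1.
TABLE1 = [
    (250, "s", 2), (550, "s", 2), (650, "s", 2), (708, "f", 1), (750, "s", 9),
    (828, "f", 1), (850, "s", 23), (884, "f", 2), (950, "s", 27), (1013, "f", 1),
    (1050, "s", 20), (1082, "f", 1), (1105, "f", 1), (1150, "s", 22), (1198, "f", 1),
    (1249, "f", 1), (1250, "s", 22), (1251, "f", 1), (1350, "s", 11), (1450, "s", 11),
    (1550, "s", 20), (1650, "s", 8), (1750, "s", 4), (1850, "s", 2), (1950, "s", 3),
    (2050, "s", 3), (2150, "s", 1),
]


def build_ach_table(rows, eta, beta):
    """Return (age, n(t), K(t), G(t)) for each row, following the Table 1 columns."""
    survivors = sum(freq for _, _, freq in rows)  # column D before any unit stops
    stopped_hazard = 0.0                          # column F: hazard of stopped units
    failures = 0
    table = []
    for age, status, freq in rows:
        hazard = (age / eta) ** beta              # column B
        if status == "f":
            failures += freq
        k_t = survivors * hazard + stopped_hazard  # E + F
        table.append((age, survivors, k_t, failures))
        stopped_hazard += freq * hazard           # column C feeds the next row's F
        survivors -= freq
    return table


ach_table = build_ach_table(TABLE1, ETA, BETA)
for age, n_t, k_t, g_t in ach_table:
    print(f"{age:5d} {n_t:4d} {k_t:7.3f} {g_t:3d}")
```

The final row gives a K(t) close to the 10 observed failures, which is the convergence at the upper end that the text attributes to MLE fits; the divergence in the middle rows is the feature that Figure 5 makes visible.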

The other main feature of the data is that most of the failures have occurred within a tight time period, starting in mid-1997. The calendar date-based ACH plot (Figure 10) now shows a much more extreme mismatch. The gap between 1992 and 1997 cannot be readily explained by the fitted model. This directs the analyst to where effort should be concentrated. Is there a possible calendar-triggered cause for the rush of failures? Could this be due to a change in an operating or maintenance procedure? Is it the effect of a modification to another component? Has the inspection method been made more sensitive or the definition of failure redefined? Only when this issue has been understood or at least modelled empirically is it worth looking more closely at the other more minor effects.

Using ACH plots in this way to look at the relationship between failures and variables other than age may lead the analyst towards a multivariate failure model such as accelerated life or proportional hazards [1]. Whether this is possible or useful depends on the quantity and quality of available data. The ACH plot may help indicate the form of such a model. Conversely, ACH plotting can be used after such a model has been fitted. Example 4 illustrates this.

Lakey and Rigdon [3] describe an application of Weibull regression on data arising from a clutch spring experiment of Taguchi and Wu [4]. This was based on a 1/27th fraction of a full 3^6 factorial. Three clutch springs were made according to each of 27 different designs, and these 81 springs were tested either to failure or to 110,000 compressions, whichever came first. The results are given in Table 3. The 11+ response signifies a suspension.



Table 2. Data for Example 2 (interval data)

Each group of three columns gives the age at last inspection, the status at last inspection (r = rejected, s as recorded in the source) and the age at the previous inspection, where there was one (- = none).

Last    Status   Previous  |  Last    Status   Previous  |  Last    Status   Previous
 422    s        -         |   550    r        -         |   909    r         514
 495    r        -         |   550    r        -         |   926    r         501
 499    r        -         |   550    r        -         |   996    r         550
 499    r        -         |   550    r        -         |  1030    s         500
 502    r        -         |   550    s        -         |  1039    s         500
 523    r        -         |   550    s        -         |  1041    s         500
 534    r        -         |   648    r        -         |  1046    s         500
 534    r        -         |   649    r        -         |  1047    r         498
 537    r        -         |   668    r        -         |  1050    r         715
 539    r        -         |   705    r        -         |  1050    r         503
 541    r        -         |   715    s        500       |  1051    r         502
 541    r        -         |   719    r        -         |  1059    s         500
 542    r        -         |   746    s        500       |  1085    s         500
 545    s        -         |   749    r        500       |  1090    r         500
 549    r        -         |   769    r        -         |  1094    r         546
 549    r        -         |   791    s        500       |  1100    r         550
 549    r        -         |   798    r        550       |  1100    s         500
 549    r        -         |   799    s        500       |  1214    r        1010
 549    s        -         |   801    r        -         |  1271    s         500
 549    s        -         |   833    r        -         |  1290    s         500
 550    r        -         |   908    s        500       |

Figure 6. ACH plot for Example 2 (interval data)

Figure 7. Weibull plot for Example 3

Figure 8. ACH plot for Example 3 (against age)

Figure 9. ACH plot for Example 3 (against serial number)



Figure 10. ACH plot for Example 3 (against date)

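A model of the form

$$f(t) = \frac{\beta}{\eta(x)}\left(\frac{t}{\eta(x)}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta(x)}\right)^{\beta}\right]$$

where

$$\eta(x) = \exp(\alpha_0 + \alpha_1 D + \alpha_2 E + \alpha_3 A + \alpha_4 BC + \alpha_5 F + \alpha_6 G)$$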

was fitted to the data. A to G represent the six design variables listed in the table (BC is a combination of two factors). The parameters β and α0 to α6 were estimated using MLE, giving the following values:

β̂ = 1.408;
α̂0 = 2.51;
α̂1 = 0.940;
α̂2 = −0.518;
α̂3 = −1.35 (for A = −1), −0.67 (for A = 0), 0 (for A = 1);
α̂4 = 0.328 (for BC = −1), 0.818 (for BC = 0), 0 (for BC = 1);
α̂5 = 0.937;
α̂6 = 0.216.

α̂3 and α̂4 have three discrete values because A and BC are discrete variables. D, E, F and G are continuous. Just how good is this model? In order to help answer this question, ACH plots can be produced for the entire range of explanatory variables (A to G) as well as for age. The calculation of ACH involves using whatever scale factor is appropriate for each of the 81 trials. Since the fitting method is MLE and 49 failures were observed, we find that the total ACH value is also 49.

Table 3. Data for Example 4

 D    E    A    BC   F    G    Responses
−1   −1   −1   −1   −1   −1    1     1     1
−1   −1    0    0    0    0    4     5     11+
−1   −1    1    1    1    1    2     2     11+
−1    0   −1   −1    0    1    2     3     3
−1    0    0    0    1   −1    5     11+   11+
−1    0    1    1   −1    0    1     1     1
−1    1   −1   −1    1    0    1     1     3
−1    1    0    0   −1    1    1     1     2
−1    1    1    1    0   −1    3     3     4
 0   −1   −1    0   −1   −1    1     1     2
 0   −1    0    1    0    0    11+   11+   11+
 0   −1    1   −1    1    1    6     11+   11+
 0    0   −1    0    0    1    11+   11+   11+
 0    0    0    1    1   −1    2     2     2
 0    0    1   −1   −1    0    1     2     2
 0    1   −1    0    1    0    2     3     4
 0    1    0    1   −1    1    2     2     2
 0    1    1   −1    0   −1    11+   11+   11+
 1   −1   −1    1   −1   −1    3     4     4
 1   −1    0   −1    0    0    11+   11+   11+
 1   −1    1    0    1    1    11+   11+   11+
 1    0   −1    1    0    1    11+   11+   11+
 1    0    0   −1    1   −1    11+   11+   11+
 1    0    1    0   −1    0    5     11+   11+
 1    1   −1    1    1    0    4     4     6
 1    1    0   −1   −1    1    2     2     3
 1    1    1    0    0   −1    11+   11+   11+

Figure 11. ACH plot for Example 4 (against age)

Figure 12. ACH plots for Example 4 (against A to G)

The ACH plot (Figure 11) shows a striking mismatch between the two graphs. The failures seem to be occurring much earlier than the model predicts, and it is noticeable that there is no failure greater than 6, even though there are many suspensions at 11. This once again suggests the presence of mixtures of different-quality units, even within the same design, and probably means that whatever combination of levels is eventually chosen, the model will be a poor predictor. More striking is the mismatch shown on the ACH plot for factor F (Figure 12). The 27 trials using designs with factor F at level 0 yielded just eight failures. The corresponding ACH value is 27.4. This mismatch indicates that the effect of changing F is not adequately captured by a linear term in the model. Lakey and Rigdon’s paper concluded that the best setting for F would be the highest (1). In fact, the middle setting (0) appears to be the best. The ACH plot shows this clearly and will be able to identify similar problems even if the data are unstructured (i.e. not from an orthogonal trial).
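The comparison by factor level can be sketched from the Table 3 data and the quoted parameter estimates. The code below is one reading of the fitted model, not the original authors' implementation: it assumes that the A and BC terms enter as the level effects listed for α̂3 and α̂4, that D, E, F and G enter linearly with levels −1, 0, 1, and that responses are in the units of Table 3 with 11 denoting a suspension (no failure is recorded beyond 6). All names are illustrative.

```python
import math

BETA = 1.408
ALPHA0 = 2.51
LINEAR = {"D": 0.940, "E": -0.518, "F": 0.937, "G": 0.216}  # alpha1, 2, 5, 6
LEVEL_A = {-1: -1.35, 0: -0.67, 1: 0.0}                     # alpha3 by level of A
LEVEL_BC = {-1: 0.328, 0: 0.818, 1: 0.0}                    # alpha4 by level of BC
FACTORS = ("D", "E", "A", "BC", "F", "G")
CENSOR = 11  # the 11+ responses are suspensions

TABLE3 = [
    ((-1, -1, -1, -1, -1, -1), (1, 1, 1)), ((-1, -1, 0, 0, 0, 0), (4, 5, 11)),
    ((-1, -1, 1, 1, 1, 1), (2, 2, 11)), ((-1, 0, -1, -1, 0, 1), (2, 3, 3)),
    ((-1, 0, 0, 0, 1, -1), (5, 11, 11)), ((-1, 0, 1, 1, -1, 0), (1, 1, 1)),
    ((-1, 1, -1, -1, 1, 0), (1, 1, 3)), ((-1, 1, 0, 0, -1, 1), (1, 1, 2)),
    ((-1, 1, 1, 1, 0, -1), (3, 3, 4)), ((0, -1, -1, 0, -1, -1), (1, 1, 2)),
    ((0, -1, 0, 1, 0, 0), (11, 11, 11)), ((0, -1, 1, -1, 1, 1), (6, 11, 11)),
    ((0, 0, -1, 0, 0, 1), (11, 11, 11)), ((0, 0, 0, 1, 1, -1), (2, 2, 2)),
    ((0, 0, 1, -1, -1, 0), (1, 2, 2)), ((0, 1, -1, 0, 1, 0), (2, 3, 4)),
    ((0, 1, 0, 1, -1, 1), (2, 2, 2)), ((0, 1, 1, -1, 0, -1), (11, 11, 11)),
    ((1, -1, -1, 1, -1, -1), (3, 4, 4)), ((1, -1, 0, -1, 0, 0), (11, 11, 11)),
    ((1, -1, 1, 0, 1, 1), (11, 11, 11)), ((1, 0, -1, 1, 0, 1), (11, 11, 11)),
    ((1, 0, 0, -1, 1, -1), (11, 11, 11)), ((1, 0, 1, 0, -1, 0), (5, 11, 11)),
    ((1, 1, -1, 1, 1, 0), (4, 4, 6)), ((1, 1, 0, -1, -1, 1), (2, 2, 3)),
    ((1, 1, 1, 0, 0, -1), (11, 11, 11)),
]


def scale(design):
    """Weibull scale eta(x) for one design under the assumed parametrization."""
    x = dict(zip(FACTORS, design))
    log_eta = ALPHA0 + LEVEL_A[x["A"]] + LEVEL_BC[x["BC"]]
    log_eta += sum(coef * x[name] for name, coef in LINEAR.items())
    return math.exp(log_eta)


def ach_by_level(factor):
    """Return {level: (ACH value, observed failures)} for the chosen factor."""
    idx = FACTORS.index(factor)
    totals = {}
    for design, responses in TABLE3:
        eta = scale(design)
        k, g = totals.get(design[idx], (0.0, 0))
        for t in responses:
            k += (t / eta) ** BETA          # hazard accrued up to failure or suspension
            g += 1 if t < CENSOR else 0
        totals[design[idx]] = (k, g)
    return totals


by_f = ach_by_level("F")
for level in sorted(by_f):
    print(f"F = {level:+d}: ACH = {by_f[level][0]:.1f}, failures = {by_f[level][1]}")
```

Under these assumptions the level F = 0 shows an ACH value of about 27.4 against eight observed failures, in line with the figures quoted above. Passing another factor name to `ach_by_level` gives the corresponding comparison for that factor.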

GOODNESS-OF-FIT

The four examples quoted all involve making judgements about the goodness-of-fit of models, but have not indicated how statistical significance of deviations between the ACH and failure graphs should be evaluated. We have not put great emphasis on statistical rigour, for two main reasons. Firstly, there are often too few data for confidence limits based on the data alone to have any use. Secondly, ACH plots are seen as a means to the generation of hypotheses, of possible relationships and of alternative models. As such, the acceptance of a model lies just as much in the engineering arena as in that of the statistician. For example, treating the data purely as isolated numbers might give a weak suggestion of an effect. Coupling this with prior knowledge that there is a good physical reason why such an effect is sometimes present makes the significance of such data much stronger. These arguments imply that the use of traditional tests of significance and confidence limits is unwise, as engineering confidence can be quite different from statistical confidence.

However, one simple rule-of-thumb test is to utilize the binomial distribution in conjunction with the MLE-based ACH plot. The MLE plot is used because we want to utilize the property that the K(t) and G(t) graphs converge at the upper end of the plot. Referring to the calendar-based ACH plot (Figure 10), the argument proceeds in the following way. Six failures out of seven were seen within a time period in which the fitted model implies that only 2.5 failures should have been seen. The probability of getting such a concentration of failures, given that the fitted model is correct, can be expressed as $7p^6(1 - p) + p^7$, where p = 2.5/7. This answer is appropriate if we are interested in testing for such a concentration over a specific x-axis range. The more general question, namely what is the probability of observing six or more failures so close together over any x-axis range, is more difficult to address. It is suggested that Monte Carlo simulation be used to assess the answer to this general question.

In the case of Figure 10 the specific probability of getting six failures out of seven in the period between March 1997 and October 1997 is about 0.01. The answer to the general question about a similar concentration over any time period is greater, about 0.04. This rule-of-thumb test would indicate fairly strongly that the bias in calendar date distribution is not due to random error.
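The specific probability is a binomial upper tail and is easy to check. The sketch below evaluates the chance of six or more of the seven failures falling in a window where the fitted model expects 2.5; the function name is illustrative. The general question over any window would need the Monte Carlo approach suggested in the text and is not attempted here.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_window = 2.5 / 7  # share of the expected failures that falls in the window
prob = prob_at_least(6, 7, p_window)
print(round(prob, 4))
```

The result agrees with the closed form 7p^6(1 − p) + p^7 and with the value of about 0.01 quoted for the March to October 1997 window.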



MIXTURE MODELS

We have seen that the proposed range of ACH plots gives a way of detecting non-homogeneity in data, using one basic format of plot. Since the emphasis is on data visualization, not on parameter fitting, we can use it in conjunction with any model-fitting method. MLE is particularly useful in providing the basis for informally testing the significance of plot deviations. In our experience the most common manifestation of non-homogeneity is via the ACH age plot. Mismatch on this plot often shows that there are too many high-life suspensions relative to the failures. High-life suspensions are powerful pieces of information and give a clear indication that the wrong suspension set is being used with the failure set. The implication is that some proportion of the population is ‘at risk’ and the remainder is ‘immune’. However, we often have no way of making the subdivision based on any known factor. What can be done is to make the assumption that a fixed proportion p of units entering service at any time belongs to the ‘at risk’ set. It remains then to fit a three-parameter mixture model based on η, β and p. If the failure distribution for the ‘at risk’ proportion is denoted by $F_A(t)$, then

$$\text{distribution function for the mixed population} = p\,F_A(t) \tag{1}$$

$$\text{likelihood(suspension at } t) \propto 1 - p\,F_A(t) \tag{2}$$

$$\text{likelihood(failure at } t) \propto p\,f_A(t) \tag{3}$$

$$P(\text{suspension is in ‘at risk’ proportion}) = \frac{p\,(1 - F_A(t))}{p\,(1 - F_A(t)) + (1 - p)} \tag{4}$$

$$P(\text{suspension is in ‘immune’ proportion}) = \frac{1 - p}{p\,(1 - F_A(t)) + (1 - p)}$$

$$P(\text{failure is in ‘at risk’ proportion}) = 1$$

$$P(\text{failure is in ‘immune’ proportion}) = 0$$
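Equations (2) and (3) define the log-likelihood that an MLE routine would maximize over η, β and p. The sketch below writes that log-likelihood for a Weibull ‘at risk’ distribution. The function names, the small data set and the parameter values are hypothetical, and no optimizer is included.

```python
import math


def weibull_cdf(t, eta, beta):
    """Distribution function F_A(t) of the 'at risk' units."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_pdf(t, eta, beta):
    """Density f_A(t) of the 'at risk' units."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def mixture_log_likelihood(failures, suspensions, p, eta, beta):
    """Log-likelihood of the three-parameter mixture, from equations (2) and (3)."""
    total = 0.0
    for t in failures:
        total += math.log(p * weibull_pdf(t, eta, beta))        # equation (3)
    for s in suspensions:
        total += math.log(1.0 - p * weibull_cdf(s, eta, beta))  # equation (2)
    return total


# Hypothetical data: early failures followed by several high-life suspensions.
FAILURES = [700.0, 900.0, 1100.0]
SUSPENSIONS = [1500.0, 1800.0, 2000.0, 2000.0]
for p in (1.0, 0.5):
    print(p, round(mixture_log_likelihood(FAILURES, SUSPENSIONS, p, eta=1000.0, beta=2.0), 3))
```

Setting p = 1 recovers the ordinary Weibull log-likelihood for censored data, so the homogeneous model is the special case against which the mixture fit can be compared.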

Figure 13. ACH plot for ‘at risk’ units (Example 1)

What justification is there for this kind of discrete ‘at risk’ behaviour? Our own empirical evidence shows that this is probably the most common model type encountered, so it is natural to look for explanatory physical reasons. Here are some possible scenarios. Firstly, the discrete boundary could be defined by a critical dimension either falling below or exceeding a particular value. For example, the units which are oversized (representing the tail of normal production) cause interference with another component, precipitating failure. The smaller units never fail in that mode. Secondly, the failure potential of a unit may be determined on the basis of whether or not there is an inclusion, flaw or defect in its parent material lying within a critical high-stress zone. Given that these failure-inducing defects could be randomly distributed throughout the material, there will be a discrete yes/no boundary driving the ‘at risk’/‘immune’ boundary of failure behaviour. Thirdly, the quality of a bond may be instrumental in causing some proportion to fail, since there is a critical stress limit below which cracks are able to propagate. Additionally, there may be discrete variations in the usage of components which are not captured by the data, e.g. different flight mission profiles.

It must be remembered that those units which are classified as immune are only free from risk of failing by the mode under analysis. It may well be that another, as yet unidentified, failure mode is imminent. Usually, since failures are acted upon swiftly, we will only have failures from one population. However, if higher-life failures start to occur, it may be necessary to extend the failure model to the more conventional mixture model $pF_A(t) + (1 - p)F_B(t)$.
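For comparison, the likelihood terms for this conventional two-component mixture can be written in the same form as equations (1) to (3). This is a sketch based on standard mixture-model results rather than on anything stated explicitly in the paper; $f_B(t)$, the density corresponding to $F_B(t)$, is notation introduced here.

```latex
\begin{align*}
\text{distribution function for the mixed population} &= pF_A(t) + (1 - p)F_B(t) \\
\text{likelihood(suspension at } t) &\propto 1 - pF_A(t) - (1 - p)F_B(t) \\
\text{likelihood(failure at } t) &\propto pf_A(t) + (1 - p)f_B(t) \\
P(\text{failure at } t \text{ is in the } A \text{ proportion}) &= \frac{pf_A(t)}{pf_A(t) + (1 - p)f_B(t)}
\end{align*}
```

Setting $F_B(t) = 0$ for all $t$ recovers the ‘at risk’/‘immune’ expressions, in which every failure belongs to the ‘at risk’ proportion with probability 1.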

Returning to Example 1, it is clear from Figure 4 that the three-parameter mixture model (equation (1)) should give a better fit. Using MLE (equations (2) and (3)) to estimate the parameters suggests that only 8.3% of the units appear to be at risk, and that their failure distribution has a slope of 7.38 and a scale of 1147. The resulting ACH plot for the ‘at risk’ proportion is much improved (Figure 13). Equation (4) can be used to show that only 6.76 of the remaining suspensions are thought to be in the ‘at risk’ proportion. This gives a very different projection of future risk than would be obtained from treating the data as homogeneous. For example, if all the suspensions are projected forwards 450 h, then the original model predicts another 55 failures. The mixture model, starting with 6.76 suspensions, predicts fewer than five failures. In the latter case the estimated hazard rate for an unfailed unit increases to a maximum and then reduces again as it becomes increasingly obvious that the unit belongs to the ‘immune’ proportion. In such a situation, with no clear way of identifying in advance whether a unit is in the ‘at risk’ proportion, it is difficult to apply a cost-effective lifing policy, as it is inevitable that many good units could be removed unnecessarily. For safety issues this is accepted as being necessary, but is not desirable for general maintenance. Scenarios such as these emphasize the importance of using Weibull analysis to understand why there is a problem in the first place, as well as quantifying the risk.
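The rise and fall of the hazard rate can be checked numerically. The sketch below assumes a two-parameter Weibull form for $F_A(t)$ and uses the estimates quoted for Example 1 (p = 0.083, slope 7.38, scale 1147). The hazard of the mixed population is taken as $pf_A(t)/(1 - pF_A(t))$, which follows from equations (2) and (3); the function names and the 10 h age grid are illustrative choices, not taken from the paper.

```python
import math


def weibull_cdf(t, eta, beta):
    """Weibull distribution function F_A(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_pdf(t, eta, beta):
    """Weibull density f_A(t), valid for t > 0."""
    z = (t / eta) ** beta
    return (beta / t) * z * math.exp(-z)


def mixture_hazard(t, eta, beta, p):
    """Hazard of an unfailed unit of unknown type: p*f_A(t) / (1 - p*F_A(t))."""
    return p * weibull_pdf(t, eta, beta) / (1.0 - p * weibull_cdf(t, eta, beta))


# Parameter estimates quoted in the text for Example 1.
ETA, BETA, P = 1147.0, 7.38, 0.083

# Illustrative age grid: 10 h to 3000 h in 10 h steps.
ages = [10.0 * k for k in range(1, 301)]
hazards = [mixture_hazard(t, ETA, BETA, P) for t in ages]

# Age at which the mixture hazard is largest on this grid.
peak_age = ages[hazards.index(max(hazards))]
print(f"mixture hazard peaks near {peak_age:.0f} h on this grid")
```

A homogeneous Weibull model with a slope above 1 has a hazard that increases without limit, so the presence of an interior peak is specific to the ‘at risk’/‘immune’ model.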

Using the three-parameter mixture model (equation (1)) generates non-integer numbers of suspensions, since they are weighted by the probability of being in either of the two subpopulations. This is not a problem for ACH plots, since the ACH values can also be appropriately weighted. MLE can allow for non-integer suspension weightings, which is something that MRR can also do, although most standard software packages will probably not allow it. If the more complex five-parameter mixture model is used ($pF_A(t) + (1 - p)F_B(t)$, where $F_B(t)$ is the distribution function for failures in the remaining $1 - p$ proportion), then both suspensions and failures will be given non-integer weightings. Once again, ACH plots (and MLE) will handle this, although probability plots will not. Figure 13 is an example of where non-integer suspensions have been used.
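To illustrate how non-integer suspension weightings arise and how they feed a risk projection, the sketch below weights each suspension by equation (4) and projects it forwards using the ‘at risk’ distribution alone. It then compares the result with projecting every suspension directly under the mixed population. The paper does not spell out its projection formula, so this is one plausible reading; the suspension ages are hypothetical, and a two-parameter Weibull form is assumed for $F_A(t)$.

```python
import math


def weibull_cdf(t, eta, beta):
    """Weibull distribution function F_A(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def at_risk_weight(t, eta, beta, p):
    """Equation (4): weighting for a suspension of age t."""
    survival = 1.0 - weibull_cdf(t, eta, beta)
    return p * survival / (p * survival + (1.0 - p))


def expected_failures_weighted(ages, horizon, eta, beta, p):
    """Weight each suspension, then project it as an 'at risk' unit."""
    total = 0.0
    for t in ages:
        survival = 1.0 - weibull_cdf(t, eta, beta)
        # Conditional probability that an 'at risk' survivor fails within the horizon.
        cond = (weibull_cdf(t + horizon, eta, beta) - weibull_cdf(t, eta, beta)) / survival
        total += at_risk_weight(t, eta, beta, p) * cond
    return total


def expected_failures_mixture(ages, horizon, eta, beta, p):
    """Project each suspension directly under the mixed population p*F_A(t)."""
    total = 0.0
    for t in ages:
        f_now = weibull_cdf(t, eta, beta)
        f_later = weibull_cdf(t + horizon, eta, beta)
        total += p * (f_later - f_now) / (1.0 - p * f_now)
    return total


# Hypothetical suspension ages in hours; parameters are the Example 1 estimates.
ages = [200.0, 600.0, 900.0, 1100.0, 1300.0]
ETA, BETA, P = 1147.0, 7.38, 0.083

effective_suspensions = sum(at_risk_weight(t, ETA, BETA, P) for t in ages)
weighted = expected_failures_weighted(ages, 450.0, ETA, BETA, P)
direct = expected_failures_mixture(ages, 450.0, ETA, BETA, P)
print(effective_suspensions, weighted, direct)
```

The two projections agree because $p(1 - F_A(t)) + (1 - p) = 1 - pF_A(t)$, so weighting the suspensions is simply a convenient bookkeeping device for the mixed population.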

CONCLUSIONS

For the analyst faced with the problems of sparse or poorly defined failure data against a background of high technology and a requirement for a very high degree of safety and reliability, there is a need for a robust method for visualizing the data for both interpretation and communication. The standard Weibull plot has some shortfalls relating to its scales, non-portrayal of suspensions, interval-type data and the effects of variables other than age.

The ACH plot complements the Weibull plot and effectively overcomes these problems. It is useful for all data types, any fitting method, complete or non-complete samples, for data exploration or more post-model evaluation. It even has scope for parameter estimation (using the goodness-of-fit between the two graphs), but is most usefully employed as a validation tool. Use of ACH plots often emphasizes the need for appropriate simple mixture models. The three-parameter mixture model (‘at risk’/‘immune’) is possibly the most common data scenario encountered and leads to quite different conclusions from a two-parameter model.

In practice, ACH plots have added considerable understanding and communicative power to the process of turning data into useful information, with a view both to eliminating the root causes of failure and to estimating future risk. Their use alongside standard probability plots is recommended.

REFERENCES

1. Nelson W. Applied Life Data Analysis. Wiley: New York, 1982.

2. Abernethy RB, Breneman JE, Medlin CH, Reinman GL. Weibull analysis handbook. US Air Force AFWAL-TR-83-2079, 1983.

3. Lakey MJ, Rigdon SE. Reliability improvement using experiment design. In Annual Quality Congress Transactions. ASQC: Boston, MA, 1993; 824.

4. Taguchi G, Wu Y. System of Experimental Design, vol. 2. American Supplier Institute: Dearborn, MI, 1987.

Author’s biography:

D. A. Nevell has a BA from Queen’s College, Oxford (mathematics) and an MSc from Lancaster University (operational research). Between 1983 and 1996 he worked for the OR group at Rolls-Royce, Derby, before taking over as Head of OR at EDS’s Rolls-Royce account. Since 1999 he has operated as an independent consultant. He has worked on a wide range of projects in manufacturing and engineering with particular emphasis on the development and application of practical tools in the fields of statistics, reliability and quality.

