
The Mathematics of Success and Failure

Statistical process control can dramatically increase the chances of success in microelectronics manufacturing, or sadly document a history of failure.

by Kenneth Rose

Statistical process control (SPC) is an unfamiliar term to many electrical engineers, but it has become vital to the future of integrated-circuit manufacturing. This collection of statistical techniques is particularly important for integrated circuits, where the pace of technological change, and quality improvement in microelectronics, is unprecedented (Fig. 1).

The purpose of SPC is quality improvement, but some people have adopted other points of view. A popular one has been that things are going well so long as parts remain within specifications. A favorite story of W. Edwards Deming (the American most responsible for Japan's industrial success) concerns a company whose plants in America and Japan built automobile transmissions to the same specifications [1]. Transmissions made at the American plant met specifications, but generated customer problems and enormous warranty costs. Transmissions made at the Japanese plant did not generate customer problems; quality had been improved to the point where the transmissions far exceeded specifications. (This emphasis on reducing variation in manufactured products to improve quality is at the heart of the Taguchi approach to SPC.)

An issue closely related to SPC is yield. In IC manufacturing, yield is the fraction of manufactured chips that meet specifications. Higher yields and higher quality require process control of four factors: operations, particulates, contamination, and parameters. SPC is often associated with normally distributed variations in continuously variable parameters such as linewidth and sheet resistance. But operational failures, such as the dropping of wafers, are as effective in reducing yield as parameter variations.

Particulates can cause catastrophic failure in integrated circuits. As circuit sizes decrease, failures can be produced by smaller and smaller particles, and available evidence suggests that the number of particles increases with the inverse cube of the particle diameter. We can explain this dependence by considering a cube whose faces are bisected to form eight smaller cubes; i.e., halving the cube's edge length increases the number of cubes by a factor of 2³ = 8.

Integrating this inverse-cube particle distribution yields the number of particles greater than some minimum measured diameter, which is an inverse-square relationship. (This agrees closely with the dependence of particle density on particle diameter used to define cleanroom classes in the federal FS209 standards.) So, as larger particles break up into smaller particles, we continue to have particulate problems.
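A minimal worked version of the integration, assuming a differential particle density n(D) = kD⁻³ (the constant k is purely illustrative): the number of particles larger than some minimum diameter D is

$$N(>D) = \int_D^{\infty} k\,x^{-3}\,dx = \frac{k}{2}\,D^{-2},$$

which is the inverse-square dependence quoted above.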

For a cube-law distribution, the probability of failure remains the same for a circuit as its dimensions are scaled. In practice, however, we don't use reductions in feature size to make the same chips smaller. Rather, we exploit advances in fabrication technology to pack more functionality (and complexity) into each chip. As a result, chip sizes remain the same or increase, and chip yields decrease dramatically. This argument implies that SPC techniques should be extended to binomial distributions to account for catastrophic failures as well as parametric variations.
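To spell out the scaling step, here is a sketch under the assumptions above: take the expected number of killer particles on a chip to be proportional to chip area A times the cumulative density N(>D_c) of particles larger than the critical diameter D_c. Scaling all dimensions by a factor s < 1 sends A to s²A and D_c to sD_c, so

$$P_{\mathrm{fail}} \propto A\,N({>}D_c) \;\longrightarrow\; (s^2 A)\,\frac{k/2}{(s D_c)^2} = A\,\frac{k/2}{D_c^2},$$

independent of s. Holding chip area fixed while shrinking D_c removes the compensating s² factor, and the expected fault count grows as 1/s².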

As we move into the ULSI era and device sizes shrink still more, we are increasingly concerned with device contamination, which can affect parameter variation at a local level. Professor Tadahiro Ohmi of Tohoku University illustrates the seriousness of the problem by considering a single bacterium. One bacterium contains 9 × 10⁷ phosphorus atoms in a rod 2 µm long and 1 µm in diameter.


1. The pace of technological change in both the microelectronics revolution (characterized by the number of bits in a DRAM) and the industrial revolution (horsepower developed by a steam engine) rose explosively and exponentially with time. The latest revolution, however, has been faster. We have experienced as much technological change in ten years of the microelectronics revolution as in a hundred years of the industrial revolution.

Simple diffusion from a single bacterium lodged near a MOSFET can cause phosphorus concentrations greater than 10¹⁹ cm⁻³ within a micron, sufficient to cause significant shifts in the threshold voltage of a channel region.

Tools and Techniques for SPC

Traditionally, SPC has been used to identify processes that are out of control. But a more important use is to reduce process sensitivity in order to improve product quality. Greater process tolerance allows tighter component specifications, which generally help a designer avoid problems when the components are integrated to form a system.

As SPC is typically practiced, a manufacturing engineer takes a random sample of n chips from a batch of N chips, and determines the number of chips that exceed product tolerances. These chips are called "defectives." How are these tolerances set, and, assuming that product parameters vary randomly according to a normal distribution, what fraction of product would we expect to fall outside the tolerances?

2. A Shewhart control chart is a traditional device for indicating when a process is in, or out of, control. Such a chart is compiled by plotting the average and standard deviation of sample lots as a function of lot number, date, or any specified variable.

To answer this question, suppose that the product parameter of interest (threshold voltage, for example) is normally distributed about some mean, µ, say 1.0 V, and that the standard deviation, σ, is 100 mV. For a normal distribution, 95.5 percent of all samples will lie within ±2σ of the mean; 99.7 percent will lie within ±3σ. If our circuit can only tolerate threshold-voltage variations from 0.8 to 1.2 V, almost 5 percent of the circuits would be defective.
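A quick numerical check of this example (a minimal sketch; the function name and values are illustrative, not from the article):

```python
from math import erf, sqrt

def fraction_defective(mu, sigma, lsl, usl):
    """Fraction of a normal(mu, sigma) population outside [lsl, usl]."""
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return phi((lsl - mu) / sigma) + (1.0 - phi((usl - mu) / sigma))

# Threshold-voltage example from the text: mu = 1.0 V, sigma = 100 mV,
# tolerance 0.8 V to 1.2 V (a +/-2-sigma window).
print(fraction_defective(1.0, 0.1, 0.8, 1.2))  # ~0.0455, almost 5 percent
```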

Traditionally, process variations are controlled during process development such that 95.5 percent of product samples lie within product specification tolerances. When actual manufacturing starts, however, the variations are controlled more tightly, so that 99.7 percent of product samples meet specifications. In short, processes "are developed to 2σ, and controlled to 3σ." This means we must reduce the standard deviation of our product so that 3σ = 200 mV (σ = 67 mV). Within the last few years, there has been discussion of six-sigma processes, which correspond to only 3.4 faults in a million samples (a figure that assumes the conventional six-sigma allowance of a 1.5σ drift in the process mean). In our example, this would correspond to σ = 33 mV, which is within the range of present process variations.

In SPC, we tend to believe that the order in which samples are taken will not matter. This implies that the samples are statistically independent, so that the deviations of the samples from the process mean, called "errors" by statisticians, are not correlated. This is the random sampling hypothesis, which is often assumed in sampling procedures.

Suppose this hypothesis is true, and that we take a sample of n items from a population with mean value E(x) = µ, standard deviation σ, and variance σ² = V(x) = E((x − µ)²). (Note that the expected value E(y) is defined as the limit of (Σᵢ yᵢ)/N as N approaches infinity.) The average of an n-item sample, x̄, will vary about a mean of µ with a variance V(x̄) = σ²/n. Thus, the means of larger samples are more likely to be close to the mean of the population. By the central limit theorem, the distribution of sample means, x̄, becomes normal with increasing n, whatever the parent distribution. As a result, statistical procedures that rely directly on the distribution of x̄ are said to be robust; i.e., insensitive to departures of the parent distribution from normality.
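A short simulation makes the σ²/n result concrete (a sketch; the seed and sample sizes are arbitrary):

```python
import numpy as np

# Empirical check of V(xbar) = sigma**2 / n under random sampling.
rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 0.1, 25

# Draw 10,000 samples of n items each and compute each sample's mean.
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(means.var())     # close to the prediction below
print(sigma**2 / n)    # 4.0e-4
```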

The distributions of random samples drawn from a normal population are well known. The distribution of x̄ is normal; the distribution of s², the variance of the sample, is chi-square. The statistic (x̄ − µ)/(s/√n) has the t distribution with n − 1 degrees of freedom. This last result is particularly valuable because it allows us to estimate the deviation (x̄ − µ) of our sample average from the true population mean solely from the sample properties s and n.
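As an illustration of how the t result is used in practice (the threshold-voltage readings below are invented; scipy's t distribution supplies the critical value):

```python
import numpy as np
from scipy import stats

# Hypothetical threshold-voltage readings in volts (values are made up).
vt = np.array([0.98, 1.03, 1.01, 0.97, 1.05, 0.99, 1.02, 1.00])
n, xbar, s = len(vt), vt.mean(), vt.std(ddof=1)

# (xbar - mu)/(s/sqrt(n)) ~ t with n-1 degrees of freedom, so a 95 percent
# confidence interval for the population mean uses the t critical value:
t_crit = stats.t.ppf(0.975, df=n - 1)
print(f"mu = {xbar:.3f} +/- {t_crit * s / np.sqrt(n):.3f} V")
```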

In SPC, one typically compares samples prepared under different procedures to determine whether the difference in procedures has any statistically significant effect. There are two reasons for this. First, an engineer who is monitoring a process must decide if an observed deviation means that the process has changed for the worse and needs to be corrected. Second, the engineer may wish to find a better process. In either case, our hypothetical engineer can draw the strongest inferences if he or she is taking random samples from a normal population.

In Chapter 3 of their textbook, Box, Hunter, and Hunter (BHH) discuss in detail the significance of samples taken from a production run under different assumptions [2]. At a significance level of 5 percent, the chances are 5 in 100 that the observed difference between two samples is the result of chance. A difference corresponding to a significance level of 20 percent makes it four times as likely that the observed difference is a chance result. BHH show that knowledge of ten previous samples, together with the rather modest assumption that the differences in their means have an approximately normal distribution, can reduce the significance level below 3 percent. On the other hand, the significance level of the same difference is raised to 20 percent if we consider only two ten-item samples and assume random sampling from a normal population. Use of Wilcoxon's distribution-free, or nonparametric, test raises the significance level to 24 percent. (Statisticians often accept a significance level of 5 percent as a reasonably safe boundary, below which they assume that differences are not the result of chance.) In summary, the more you know about the population, the smaller are the differences you can safely recognize as not being the result of chance.

A Shewhart control chart as shown is often used for process control (Fig. 2). Such a chart is compiled by taking samples of lots and plotting the sample average and standard deviation as a function of lot number or, as in the figure, date. The sample range, r, is often plotted instead of s, since the former is obtained more easily. Because the standard deviation of the sample mean is σ/√n, and for a normal distribution the average range r̄ yields an estimate σ̂ of σ, we can use the equation

$$\bar{\bar{x}} \pm M\,\hat{\sigma}/\sqrt{n}$$

to set control limits for the mean, where x̄̄ is the grand average of the lot means. Typically, M = 2 is used for warning limits, and M = 3 is used for take-action limits. This is consistent with a "develop-to-2σ, manufacture-to-3σ" philosophy.
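A minimal sketch of these limits in code (the lot averages and σ̂ below are made up; in practice σ̂ would come from r̄ or the pooled sample standard deviation):

```python
import numpy as np

def shewhart_limits(lot_means, n, sigma_hat, M=3):
    """Center line and grand-average +/- M*sigma_hat/sqrt(n) limits."""
    center = float(np.mean(lot_means))
    half = M * sigma_hat / np.sqrt(n)
    return center - half, center, center + half

# Hypothetical gate-oxide-thickness lot averages in angstroms.
lots = [2510.0, 2495.0, 2502.0, 2488.0, 2506.0, 2499.0]
print(shewhart_limits(lots, n=5, sigma_hat=12.0))        # take-action, M = 3
print(shewhart_limits(lots, n=5, sigma_hat=12.0, M=2))   # warning, M = 2
```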

Another control chart in common use is the Cumulative Sum, or CuSum, chart. These charts are formed by plotting the cumulative sum of the deviations of the sample averages from their expected value,

$$\sum_{i} \left[\, \bar{x}_i - E(\bar{x}) \,\right].$$

The CuSum chart corresponding to the control chart in Fig. 2 is shown in Fig. 3. In the long run, we expect the CuSum to vary randomly about zero. Persistent upward or downward movement of the CuSum indi- cates that the mean is changing as the process moves out of control. CuSum charts are particularly useful for detecting process drifts.
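The CuSum itself is one line of code; the sketch below uses invented lot averages that drift upward after the fifth lot:

```python
import numpy as np

def cusum(sample_means, target):
    """Cumulative sum of deviations of lot averages from the target mean."""
    return np.cumsum(np.asarray(sample_means) - target)

# Made-up lot averages that drift upward after the fifth lot.
lots = [1.00, 0.99, 1.01, 1.00, 0.98, 1.03, 1.04, 1.05, 1.06, 1.07]
print(cusum(lots, target=1.00))   # persistent upward movement signals drift
```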

Ringing the Alarm

How do we decide when a change in an indicator such as CuSum is statistically significant? It depends on several factors, and requires that decisions be made in advance concerning 1) when a deviation δ from the mean requires action, and 2) the level of risk that the enterprise is willing to take in making a wrong decision. Statisticians define two types of wrong decisions or errors. Type 1 errors, of probability α, occur when the control engineer decides that the deviation is significant when, in fact, it is not. Type 2 errors, of probability β, occur when the engineer decides the deviation is not significant when, in fact, it is. The control engineer must decide on appropriate values for α and β. If adjusting the process involves considerable cost, for example, he or she will want to be very careful to avoid Type 1 errors and will choose α to be much lower than β. Typical values might be α = 0.05 and β = 0.1.

Once these decisions are made, it is possible to decide objectively whether or not a significant change has occurred, by placing a "V mask" a distance, d, beyond the last CuSum point (Fig. 3). If any of the CuSum points cross the lines of the V mask, a significant change has occurred and action should be taken. The line slope, tan θ, and the lead distance, d, depend on σ, Δ, α, and β. The V masks shown in the figure correspond to a shift Δ in the process mean of 8 Å for a process with α = 0.05, β = 0.1, and a standard deviation for mean values of σ = 2 Å. Significant trends can be detected at the fourth and fifth samples, assuming that the process is sufficiently well characterized for a reasonable estimate of σ to be made. From the turning point of the CuSum plot, one can estimate the shift in mean value at the turning point [3].

The sample average, x̄, is only one of the statistics that must be controlled. Another commonly used control statistic is the number of defective items, m, in a fixed sample of size N.


3. A CuSum control chart for the data of Fig. 2. If any of the CuSum points cross the lines of a V mask, a significant change has occurred. (Two V masks are shown superimposed on the data.)

4. Product quality depends upon process sensitivity. In this case, two manufacturing processes correspond to stronger or weaker dependencies of a product parameter, threshold voltage, on a process parameter, temperature.

This statistic is appropriate for a binomial distribution, where the 3σ control limits are

$$N\theta \pm 3\sqrt{N\theta(1-\theta)},$$

and θ is the probability that an item is defective. Binomial distributions are appropriate when one has pass/fail conditions rather than parametric concerns, which often occurs in microelectronics manufacturing. Two examples are test capacitors that may be shorted by pinholes, and conducting paths that may be open or shorted.
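A sketch of these binomial limits (the sample size and defect probability are hypothetical):

```python
import numpy as np

def binomial_limits(N, theta):
    """3-sigma limits on the defective count m in a sample of size N."""
    center = N * theta
    half = 3.0 * np.sqrt(N * theta * (1.0 - theta))
    return max(center - half, 0.0), center + half

# Hypothetical: samples of 500 test capacitors, 2 percent defect probability.
lcl, ucl = binomial_limits(N=500, theta=0.02)
print(f"investigate if m falls outside [{lcl:.1f}, {ucl:.1f}]")
```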

New Strategies for SPC

Professor Ishikawa of the University of Tokyo has suggested that quality control evolved in three stages: inspection, manufacturing-process controls, and design improvement. Inspection may prevent bad products from leaving the factory, but it does not reduce product variation. Manufacturing-process controls can reduce product variation, but process corrections are often costly. Design improvements usually offer the greatest leverage in reducing variation and improving product quality.

Success in process control depends critically on process sensitivity, a concept that can be explained with a simple example (Fig. 4). Here, two sets of process conditions, A and B, correspond to stronger and weaker dependencies of a product parameter (threshold voltage, for example) on some process parameter (such as the temperature of gate-oxide growth). The range of allowable product-parameter variation is shown on the product-parameter axis. For threshold voltage, this range is set by circuit specifications.

On the process-parameter axis, we see the ±2σ and ±3σ ranges within which the process parameter can be controlled. Temperature, for example, would depend on heater and controller characteristics. For process condition A, we expect only 95.5 percent of the products to meet specification, while for process condition B more than 99.7 percent of the products will meet specifications! Case B, where the process sensitivity is lower, will produce more product within specifications. Moreover, process condition B will allow us to improve product quality by reducing product tolerances. If the product parameter is threshold voltage, this will result in improved circuit reliability.

Popular parameters used to characterize the relationship of parameter specifications to process variations are the process-capability ratios Cp and Cpk:

$$C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma},$$

where USL and LSL are the upper and lower specification limits, respectively. For a process to meet the 6σ criterion, USL − LSL > 12σ, or Cp > 2.0. Processes meeting the older manufacture-to-3σ standard would have Cp = 1.0. The definition of Cp assumes that process variation is centered within the specification limits, but variations are often not centered. When they are not, it is more appropriate to use the alternate process-capability ratio

$$C_{pk} = \min(C_{pu}, C_{pl}),$$

where Cpu = (USL − µ)/3σ and Cpl = (µ − LSL)/3σ. Lack of centering makes it more difficult to achieve high process-capability ratios. Chapter 9 of Montgomery's text treats process-capability measures in detail [3].
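These ratios are simple to compute; a sketch using the threshold-voltage numbers from the text (the off-center mean in the last line is invented for illustration):

```python
def cp(usl, lsl, sigma):
    """Process-capability ratio: specification width over 6 sigma."""
    return (usl - lsl) / (6.0 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Capability ratio that penalizes a mean that is off center."""
    return min((usl - mu) / (3.0 * sigma), (mu - lsl) / (3.0 * sigma))

# Threshold-voltage specs from the text: 0.8 V to 1.2 V.
print(cp(1.2, 0.8, sigma=0.067))             # ~1.0, manufacture-to-3-sigma
print(cp(1.2, 0.8, sigma=0.033))             # ~2.0, meets the 6-sigma criterion
print(cpk(1.2, 0.8, mu=1.05, sigma=0.033))   # off-center mean lowers Cpk
```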

This approach to statistical process control, in which one attempts to reduce process sensitivity to external variations rather than controlling these variations more tightly, has been especially popular in Japanese manufacturing. In 1953, a Japanese ceramic tile manufacturer, the Ina Tile Company, realized that an uneven distribution of temperature in the kiln caused tile sizes to vary. A conventional approach to process control would have attempted to increase temperature uniformity, at considerable increase in manufacturing cost. To reduce size variation without increasing cost, they used a statistical experimental design to find a tile formulation that reduced the effects of temperature variations. In particular, they found that increasing the lime content of the tile formulation from 1 percent to 5 percent reduced size variation by a factor of ten [4].

This illustration, with tiles replaced by chips, is highly relevant to the microelectronics industry. Following Japanese semiconductor-industry practices, the manufacturer would probably employ both strategies by using statistically designed experiments to obtain minimum variation from existing equipment, and then making sure that temperature uniformity was improved in the next generation of equipment. This would allow the manufacturer to reduce variations still further, improve product quality, and be prepared for the next product generation.

Taguchi formulated an approach to improving product quality at the design stage. (Ref. [4] provides an excellent introduction to Taguchi's approach.) Once a prototype design is developed, parameter design identifies process settings that reduce product variations. Once these settings are determined, tolerance design determines tolerances that reduce losses over the life of a product. An important concept here is the expected loss, i.e., the lifetime cost to the user of poor product performance. Taguchi generally takes expected loss to be a quadratic function of performance variation.

Parameter design classifies all variables affecting product performance as either design parameters or noise sources. Design parameters can be set by the process designer. Noise sources may be external (operating temperature) or internal (uncontrolled parameter variations). Parameter-design experiments attempt to maximize a kind of signal-to-noise ratio, typically the ratio of mean value to standard deviation. Since there are usually many (N) variables to be considered, statistical experiment designs are necessary to reduce the number of test runs (2ᴺ) required to evaluate the influence of all the variables when each of them has only two possible values. Taguchi favors the use of orthogonal arrays in experimental designs. This has caused controversy in the statistical profession because orthogonal arrays, unlike response-surface methods, allow no estimate of interactions between factors. However, orthogonal arrays are an efficient way of screening out less important factors, which is particularly important in the early stages of an investigation.

What is important in practice is that a statistical analysis of process sensitivity can screen process tolerances to achieve high parametric yield. An excellent example of this approach, published by a group at GE Corporate Research and Development, looks at the threshold voltage, VT, of a MOSFET [5]. VT depends on many process variables, including substrate resistivity, gate and screen oxide thicknesses, deep-implant energy and dose, and threshold-implant energy and dose. The assumption of a linear statistical model, along with appropriate turnings of the mathematical crank, produces a formula showing that the variance in VT depends not only on the variances of the various parameters, but also on the sensitivity of VT to variations in those parameters. (Estimates of these sensitivities can be obtained from device models, parametric tests, or direct experiments.) The MOSFETs fabricated by the GE group were made in a 1-µm-class CMOS process. Gate oxide thickness and threshold-implant dose were the largest contributors to VT variation for NFETs. For PFETs in the same process, punchthrough-implant energy, gate oxide thickness, and threshold-implant dose were the largest contributors, which led to a significantly higher standard deviation. Although more sophisticated statistical approaches are available [6-8], this straightforward but significant analysis of process sensitivity isolates the factors needing particular attention if product quality is to be improved. However, more sophisticated analyses are likely to become increasingly useful as the windows for process adjustment shrink and distort [8].
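The flavor of that linear-model variance formula can be shown in a few lines (all sensitivities and standard deviations below are invented for illustration; they are not the GE group's values):

```python
import numpy as np

# Linear-model variance propagation for VT:
#   Var(VT) = sum over parameters p of (dVT/dp)^2 * Var(p)
sensitivities = {"gate_oxide_thickness": 2.0e-3,     # V per angstrom (assumed)
                 "threshold_implant_dose": 4.0e-14}  # V per (atoms/cm^2) (assumed)
sigmas = {"gate_oxide_thickness": 5.0,               # angstroms (assumed)
          "threshold_implant_dose": 1.0e12}          # atoms/cm^2 (assumed)

var_vt = sum((sensitivities[p] * sigmas[p]) ** 2 for p in sensitivities)
print(f"sigma(VT) ~ {1e3 * np.sqrt(var_vt):.0f} mV")  # ~41 mV with these numbers
```

The same arithmetic makes clear why reducing a sensitivity is as effective as tightening the corresponding process variation.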

The Road Ahead

Continuous quality improvements are not optional if one wishes to remain competitive in microelectronics manufacturing. The unprecedented rate of improvement in microelectronics technology requires very rapid improvement in process quality. Knowing the techniques of statistical process control is essential for maintaining and improving process quality. This is especially true at the stage of product and process development, where SPC techniques offer the greatest leverage. Continuing reductions in the sensitivities of product parameters to process variations will assure continuous quality improvement. In addition, defect densities must be reduced significantly as design rules shrink, to avoid catastrophic failures and assure reasonable yields.

Kenneth Rose [SM] is a Professor of Electrical Engineering and a member of the Center for Integrated Electronics, Rensselaer Polytechnic Institute, Troy, New York.

References

1. L. Dobyns, "Ed Deming wants big changes and he wants them fast," Smithsonian, vol. 21, pp. 74-83, August 1990.

2. G.E.P. Box, W.G. Hunter, and J.S. Hunter, Statistics for Experimenters (John Wiley, 1978).

3. D.C. Montgomery, Introduction to Statistical Quality Control (2nd ed., John Wiley, 1991).

4. R.N. Kackar, "Off-line quality control, parameter design, and the Taguchi method," J. Quality Technology, vol. 17, pp. 176-188, 1985.

5. M. Ghezzo, "CMOS Process Architecture for Yield Enhancement," VLSI Yield Enhancement Short Course, Rensselaer Polytechnic Institute, August 1990.

6. S. Sharifzadeh et al., "Using simulators to model transmitted variability in IC manufacturing," IEEE Trans. on Semiconductor Manufacturing, vol. 2, pp. 87-93, 1989.

7. S.W. Director, W. Maly, and A.J. Strojwas, VLSI Design for Manufacturing: Yield Enhancement (Kluwer, 1990).

8. A.J. Strojwas, "Statistical Process Control and VLSI Design for Manufacturing."
