
Wafer Probe Test Cost Reduction of an RF/A Device by Automatic Testset Minimization
-A Case Study-

Dragoljub (Gagi) Drmanac
Li-C. Wang
University of California, Santa Barbara

Michael Laisne
Qualcomm, Inc.

Abstract—This work presents a case study of wafer probe test cost reduction by multivariate parametric testset optimization for a production RF/A device. More than 1.5 million tested device samples across dozens of lots and hundreds of parametric measurements are analyzed using a new automatic testset minimization system. Parametric test subsets are found that can be used to predict infrequent wafer probe failures. Multivariate test models are generated to justify removal of ineffective tests, screen failures with minimal test cost, and demonstrate that some frequently passing tests are safe drop candidates. The proposed method is evaluated using parametric test data from an RF/A IC currently in production, showing how to reduce test cost and uncover tradeoffs between test escapes and overkills during high volume wafer probe screening.

I. INTRODUCTION

As global demand for mobile devices grows, so does the need for high quality and low cost volume testing of RF and Analog (RF/A) dies. From past manufacturing experience, it is known that wafer probe testing leads to early defect screening, thus reducing packaging, burn-in, and final test costs [1][2][3]. Current state of the art RF/A wafer probe testing applies hundreds of parametric tests to screen the vast majority of defects early in the production flow, in an effort to reduce subsequent manufacturing and test costs.

Parametric wafer probe testing faces many challenges as RF/A devices shrink and target low-cost high-volume consumer products. Testing of such devices is difficult and expensive due to measurement noise, large volumes of tests, expensive probe cards, tester to tester variability, and difficulty setting parametric test limits [4]. To reduce test time, and therefore test cost, automatic test equipment (ATE) frequently applies multi-site testing to probe 4–16 RF/A dies simultaneously, increasing throughput and consequently enabling continue-on-fail testing.

While parallel testing is effective at increasing throughput, it does not tackle the key problem of test effectiveness. To illustrate the problem, Figure 1 shows a Pareto plot of all parametric test failures across dozens of production lots from a current 65nm RF/A device. The left axis shows the number of fails for each soft bin and the right axis shows the total percent of fails captured cumulatively. This figure demonstrates that 10% of tests capture 90% of failures, while the remaining 90% of tests capture only 10% of failures. In other words, 90% of test effort only captures 10% of failures. This is an extreme instance of the famous Pareto principle or 80:20 rule, where 80% of effects can be attributed to 20% of the causes. We will call the bottom 10% of failures the tail failures, as indicated in Figure 1.

Fig. 1. Pareto of Parametric Test Failures

This work is supported in part by National Science Foundation, Grant No. 0915259 and Semiconductor Research Corporation, Project 2010-TJ-2093.

Looking at Figure 1, one may ask: “Since we know how to screen 90% of failures quite well, what would it take to capture the remaining 10% with minimal test cost?” Namely, what subset of tests that currently screen the bottom 10% of tail failures can be used to distinguish between passing dies and failing dies? The answer to this question depends on optimizing test effectiveness when screening tail failures. We will show that this question can be answered using large volumes of parametric test data and multivariate test methods.

In recent years, multivariate testing, sometimes called adaptive or statistical testing, has been used to reduce test cost and improve test quality [1][5][6]. Adaptive test is a general term used to describe “methods that change test conditions, test flow, test content, and test limits based on manufacturing test data and statistical data analysis” [7]. Continue-on-fail parametric tests are particularly well suited to multivariate testing because they provide a complete set of real valued measurements that can be combined and compared in many ways to make a final pass/fail test decision. Returning to the problem in Figure 1, one may also ask: “What type of multivariate or statistical test mechanism is capable of screening the bottom 10% of tail failures while keeping overkills to a minimum using only a subset of tests?”

In this work, we present a case study to answer the above questions and introduce three multivariate test methods for reducing test cost. The case study is performed on parametric wafer probe test measurements from over 1.5 million 65nm RF/A devices across several dozen production lots. Multivariate test models are developed based on the first 80% of lots and their performance is evaluated on the subsequent 20%. We propose an automatic testset minimization (ATM) strategy which improves on the parametric testset optimization introduced by [8]. ATM is used to select the smallest subset of tests that can screen the bottom 10% of tail failures using a multivariate test perspective, thereby reducing test cost. The optimized subset of tests is matched to each of the three following test cost reduction strategies and their effectiveness is evaluated.

• Test Dropping Validation: Justify dropping a single test ti by finding a multivariate test model that can predict all of ti's failures using m other tests t1 − tm. This is a conservative test dropping method with up to 100/m % cost savings for products with high quality requirements.

• Single Multivariate Test Model: Drop 10%-50% of tests by finding a single multivariate test model to screen all tail failures. This is an aggressive test dropping method for low cost products.

• Multi-Bin Multi-Model: Generate multivariate test models based on subsets of tests for each infrequently failing bin, to reduce test cost by 15%-30%. Offers moderate test dropping for consumer products.

The rest of the paper is organized as follows: Section II discusses related works in test cost reduction and multivariate testing. Section III describes the proposed automatic testset minimization approach. Section IV explains three multivariate test cost reduction strategies and their intended uses. Section V describes the case study setup. Section VI discusses the performance of developed multivariate test models and Section VII concludes.

II. RELATED WORKS

Several recent works have proposed multivariate, statistical, or adaptive test methods to minimize test cost and maintain test quality [1][4][7][9]. Multivariate parametric testset optimization was introduced in [8], where the authors found essential parametric tests to help screen in-field or final test fails and set better test limits early in wafer probe testing. It was shown that multivariate testing with an optimized subset of tests could achieve high screening accuracy with minimal overkill.

Correlation based signature analysis is used in [1] to screen defective dies in wafer probe to reduce packaging and final test cost. A general cost model is derived to expose the tradeoffs between test escapes and yield loss while taking into account expensive packaging. Simulation based results are presented for a mixed signal SoC showing various tradeoffs for different package cost, test escape, and yield loss requirements.

The challenge of testing a complex mobile phone SoC with digital, RF, and mixed-signal components is discussed in [4]. The authors showed that mixed signal testing is expensive and may account for 34% of the total test budget for a complex SoC, while RF testing may account for 22%. In addition, sensitive probe cards for analog/RF measurements and testing of multiple heterogeneous blocks in one test insertion are key challenges. For complex SoC designs there may also be a lack of test effectiveness since RF blocks are often custom designed, and designers have a high interest in thoroughly testing each block, resulting in longer test times.

Authors in [7] discuss an adaptive test flow which incorporates die-level traceability, information sharing between test steps, integrated test databases, and test adaptation algorithms. Adaptive testing is used to modify test intensity, test selection, and test limits at different insertions in the test flow to optimize cost versus DPPM levels. The authors highlight today's lack of easy access to test results across different insertions, and the need for analysis tools capable of handling millions of dies, yet flexible enough to target different market segments such as consumer, automotive, mobile, or low cost.

A versatile cost saving test platform for RF ICs is proposed in [9] that uses low cost custom tester-on-board circuitry and membrane probes. The authors show that RF test cost can be substantial, particularly with the upfront cost of automatic test equipment (ATE) and sensitive probe cards. Using a tester-on-board design, upfront cost savings were achieved and production test cost was reduced by multi-site testing of 4 dies simultaneously.

Other works include statistical outlier analysis used in [10] to detect burn-in fails based on parametric wafer probe data, where essential tests are selected by correlating to known burn-in failures. In [11], parametric analog and RF testsets are optimized based on a defect density fault model where open, short, and pinhole faults were simulated for analysis. Authors in [12] used several machine learning techniques to detect failing devices based on a subset of mostly functional and some parametric tests to reduce cost by removing non-essential analog measurements.

III. AUTOMATIC TESTSET MINIMIZATION

Recent advances in machine learning, data mining, and feature selection algorithms have enabled the creation of new test selection strategies powered by large volumes of production test measurements. Multivariate parametric testset optimization was introduced in [8], allowing individual tests to be ranked by their effectiveness when discriminating between passing and failing dies in a high dimensional test space.

This paper proposes a new recursive test elimination strategy called automatic testset minimization (ATM), where multiple successions of testset optimization are applied to find the best subset of tests for low cost multivariate wafer probe screening. We use the ν-SVC support vector machine (SVM) algorithm described in [13] and [14], implemented with a modified version of the open source LibSVM software package developed by [15].


Fig. 2. Test Correlation Matrix

A. Usage Considerations

To successfully apply testset optimization algorithms, several issues must be considered, including the type of available test data, extent of measurement correlation, magnitude of process variability, and distribution of failures. Asking these questions helps test engineers decide whether multivariate testset optimization is applicable given their specific product and available test data. In this work, continue-on-fail parametric wafer probe data from a 65nm RF/A device was used. The key advantage of continue-on-fail parametric tests is that a complete table of real valued measurements can be constructed to perform testset optimization.

Given a large volume of parametric test data, a good indicator of the ability to leverage multivariate methods and reduce test cost is the existence of correlation between some test measurements. Computing the pairwise correlation between test measurements can be performed quickly on a representative subsample of available test data. Figure 2 shows a correlation matrix of 765 parametric tests from tens of thousands of dies subsampled across many production lots. Red areas indicate large positive correlation, blue areas indicate large negative correlation, and cyan areas indicate zero correlation. This figure shows that most tests with index below 100 are highly inter-correlated; these are voltage measurements. In addition, other tests are correlated across a wider range of measurements, such as those with index 200 to 450; these are mostly Idd measurements. With the existence of some correlation between test measurements it is reasonable to pursue testset optimization.
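
As a concrete illustration, the following Python sketch computes a pairwise test correlation matrix on a subsample of parametric data. The file name and column layout are assumptions; the paper's own analysis was performed with different tooling.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file of continue-on-fail parametric wafer probe measurements:
# one row per die, one column per parametric test.
data = pd.read_csv("wafer_probe_parametric.csv")

# Subsample a representative set of dies to keep the computation fast.
subsample = data.sample(n=min(20000, len(data)), random_state=0)

# Pairwise Pearson correlation between all test measurements.
corr = subsample.corr()

# Visualize: strong positive/negative blocks suggest multivariate
# testset optimization is worth pursuing (cf. Figure 2).
plt.imshow(corr, cmap="jet", vmin=-1, vmax=1)
plt.colorbar(label="correlation")
plt.xlabel("test index")
plt.ylabel("test index")
plt.title("Pairwise Test Correlation Matrix")
plt.show()
```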

Assuming testset optimization is justified by the existence of some correlated tests, one still has to ensure that multivariate test methods can discriminate between passing and failing dies based on a subset of tests. A common way to inspect the dataset to ensure there is enough separability between passing and failing dies is through visualization by principal component analysis (PCA) [16]. In PCA, a linear combination of passing test data is performed such that each resulting principal component direction is uncorrelated, then the transform is applied to both passing and failing dies. Figure 3 shows a 3D plot of passing and failing dies in the top three principal component directions explaining 31% of nominal device variance. The red stars are failing dies and the blue dots are passing dies. A tight clustering of passing dies is observed while failing dies spread out, behaving as outliers. This perspective shows that a multivariate test method should be able to distinguish between passing and failing dies.

Fig. 3. Top 3 Principal Components

Fig. 4. Variance of Principal Components

A useful consequence of PCA analysis is that each resulting principal component is associated with a variance measurement, which allows plotting the cumulative variance captured by all components. Figure 4 shows the cumulative variance plot for the 765 principal components computed by PCA. The figure shows that nominal die variance can be explained by around 400 principal components, roughly half the total components present in the dataset. This is an indication that multivariate testing may be able to accurately describe most passing dies using a subset of test dimensions, offering more justification for pursuing testset optimization and multivariate model-based screening.
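
The following sketch shows one way to reproduce this kind of PCA check with scikit-learn. Fitting on passing dies only and then projecting both populations follows the description above; the file names and data loading are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Normalized parametric measurements, shape (num_dies, num_tests).
X_pass = np.load("passing_dies.npy")   # hypothetical file
X_fail = np.load("failing_dies.npy")   # hypothetical file

# Fit PCA on passing (nominal) dies only, then project both populations.
pca = PCA()
P_pass = pca.fit_transform(X_pass)
P_fail = pca.transform(X_fail)

# 3D scatter of the top three principal components (cf. Figure 3).
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(P_pass[:, 0], P_pass[:, 1], P_pass[:, 2], c="b", marker=".", label="pass")
ax.scatter(P_fail[:, 0], P_fail[:, 1], P_fail[:, 2], c="r", marker="*", label="fail")
ax.legend()

# Cumulative variance explained (cf. Figure 4): if far fewer components
# than tests explain nominal variance, a test subset may suffice.
plt.figure()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("number of principal components")
plt.ylabel("cumulative variance explained")
plt.show()
```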


Fig. 5. Simple Multivariate Testset Optimization [8]

B. Multivariate Testset Optimization

Parametric testset optimization was introduced in [8]. Optimization is performed using a binary SVM classifier that separates passing and failing dies based on labeled examples. A training example is a vector-label pair (s_i, l_i) of normalized parametric wafer probe measurements taken for each die, where s_i = (t_1, t_2, ..., t_n) are n parametric test measurements t_1 − t_n, and l_i is a pass/fail classification label set by wafer probe binning.

To find an optimal parametric testset, the ν-SVC soft margin classifier is used to separate passing and failing dies with maximum margin in an n-dimensional wafer probe test space. In this work, the test space was quite large, containing 765 different test measurements for each die, resulting in many possible separating hyperplanes. Optimization is used to find the best linear hyperplane that separates passing and failing dies with maximum margin using the following quadratic programming formulation introduced in [14]:

$$\text{Maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{m} \alpha_i \alpha_j l_i l_j K(\vec{s}_i, \vec{s}_j)$$

$$\text{Subject to} \quad 0 \leq \alpha_i \leq \frac{1}{m}, \qquad \sum_{i=1}^{m} \alpha_i l_i = 0, \qquad \sum_{i=1}^{m} \alpha_i \geq \nu,$$

where α_i and α_j are Lagrange multipliers, s_i and s_j are example dies, l_i, l_j ∈ {±1} are pass/fail training labels, ν is the fraction of permitted training errors, K(s_i, s_j) is the dot product kernel, and m is the number of examples [14]. In general, passing and failing dies may not be 100% linearly separable, so ν is used to control the fraction of misclassified samples allowed when establishing a classification boundary.

Figure 5 shows a simple separating hyperplane obtained after testset optimization in three test dimensions [8]. In this case, the hyperplane completely separates passing and failing dies. In general, a clear separation may not always be possible depending on die labels and the number of test dimensions. However, the goal of multivariate testset optimization is to find the best hyperplane that can separate the majority of passing and failing dies regardless of class overlap [14]. It was shown in [17] that the normal vector w, which defines the linear hyperplane orientation, encodes the importance of each test when separating passing and failing dies.

Fig. 6. Top 30 Selected Tests By Testset Optimization

An optimally weighted testset is determined by the components of the optimal separating hyperplane's normal vector w shown in Figure 5. As explained in [8], this vector describes how far the hyperplane has tilted in each test dimension to correctly classify failing devices. In Figure 5, tests T2 and T3 are weighted higher than test T1 since the weight vector points in the direction of T2 and T3. This makes sense because passing and failing dies can be separated by a linear combination of tests T2 and T3, making T1 less relevant. Also, the hyperplane is almost parallel to test direction T1, showing that variation in test T1 will not impact the classification of dies as passing or failing. Thus, finding an optimally weighted testset amounts to finding the best linear combination of tests that separates passing and failing dies in a high dimensional test space.
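
A minimal sketch of this ranking step, assuming scikit-learn's NuSVC as a stand-in for the modified LibSVM used in the paper; the data files and the nu value are placeholders.

```python
import numpy as np
from sklearn.svm import NuSVC

# X: normalized parametric measurements, shape (num_dies, num_tests)
# y: +1 for passing dies, -1 for failing dies
X = np.load("train_measurements.npy")   # hypothetical file
y = np.load("train_labels.npy")         # hypothetical file

# Linear nu-SVC: nu bounds the fraction of permitted training errors.
clf = NuSVC(nu=0.05, kernel="linear")
clf.fit(X, y)

# For a linear kernel the hyperplane's normal vector is exposed directly;
# the magnitude of each component is the importance of that test.
w = clf.coef_.ravel()
importance = np.abs(w)

# Report the 30 highest weighted tests (cf. Figure 6).
top30 = np.argsort(importance)[::-1][:30]
for idx in top30:
    print(f"test {idx}: normalized importance {importance[idx] / importance.max():.3f}")
```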

The results of testset optimization are shown in Figure 6, where the x-axis plots the 30 highest weighted tests and the y-axis plots their normalized test importance. This figure was generated by performing testset optimization across the first 80% of production wafer probe data. The dataset consisted of tens of thousands of subsampled passing dies and all failing dies from the bottom 10% of tail failures as shown in the Pareto plot of Figure 1. In Figure 6, it is clear that the most relevant tests are Idd and voltage measurements, confirming the pairwise test correlation findings presented in Figure 2.

C. Recursive Test Elimination

This work proposes a new parametric testset optimization method based on recursive test dropping, similar to the feature elimination algorithm proposed in [18]. We call this method automatic testset minimization (ATM). The algorithm applies the above ν-SVC testset optimization procedure recursively, removing the worst ranked test at each step, until a desired subset of k tests is reached. By applying multiple successive testset optimization steps we ensure that the best subset of k tests is found, not just an initial ranking of all available tests. An outline of ATM is presented in Algorithm 1.


Algorithm 1 Automatic Testset Minimization (ATM)
Inputs:
  X = [t_1, t_2, ..., t_n]^T   {Parametric Test Measurements}
  y = [l_1, l_2, ..., l_i]^T   {Pass/Fail Labels}
  k = 100                      {Size of Test Subset to Find}
Initialize:
  s = [1, 2, ..., n]           {Index of Tests to Keep}
repeat
  X = X(:, s)                  {Select Test Subset}
  w = νSVC(X, y)               {Optimize Test Subset}
  c = abs(w)                   {Compute Test Importance}
  f = argmin(c)                {Get Worst Ranked Test Index}
  s = [s(1 : f − 1), s(f + 1 : length(s))]   {Drop Worst Ranked Test}
until |s| = k
Output: s                      {Best Test Subset}
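
The recursion maps naturally onto a short Python loop. The sketch below mirrors Algorithm 1 using scikit-learn's NuSVC in place of the paper's modified LibSVM; the nu value and calling convention are assumptions.

```python
import numpy as np
from sklearn.svm import NuSVC

def automatic_testset_minimization(X, y, k=100, nu=0.05):
    """Recursively drop the worst ranked test until k tests remain.

    X: (num_dies, num_tests) normalized parametric measurements
    y: pass/fail labels in {+1, -1}
    Returns the column indices of the best k-test subset.
    """
    keep = np.arange(X.shape[1])          # index of tests to keep
    while len(keep) > k:
        clf = NuSVC(nu=nu, kernel="linear")
        clf.fit(X[:, keep], y)            # optimize current test subset
        importance = np.abs(clf.coef_.ravel())
        worst = np.argmin(importance)     # worst ranked test in the subset
        keep = np.delete(keep, worst)     # drop it and re-optimize
    return keep

# Usage sketch (arrays are placeholders):
# subset = automatic_testset_minimization(X_train, y_train, k=100)
```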

D. Objective Functions

Various objective functions can also be leveraged when using ATM to reduce total test cost, test time, probe card costs, or even tester to tester variability. For example, if individual test cost is known, this can be taken into account when performing testset optimization so the algorithm prefers to drop expensive tests. If tester to tester variability of each test is known, the ATM method can find an optimized subset of tests that minimizes variability. Additionally, if the probe card cost associated with each test is known, one can minimize the number of tests requiring expensive probe card components.

In this case study, most analyzed tests were parametric Idd and voltage measurements. These tests had a similar application cost; therefore, dropping one test was not more important than dropping another. With additional cost information, the proposed ATM method can easily be modified to handle testsets with very distinct test costs. A simple way to apply a cost objective function is to scale all test measurements by the inverse of their test cost. The ν-SVC algorithm interprets larger valued measurements as more important and therefore will naturally find test subsets where the number of tests and total test cost are simultaneously minimized.
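
A sketch of that cost weighting, under the assumption that a per-test cost vector is available; following the description above, it simply rescales the already normalized measurement columns before running ATM.

```python
import numpy as np

def cost_weight(X, test_cost):
    """Scale measurements by the inverse of per-test cost.

    X: (num_dies, num_tests) measurements already normalized to [0, 1]
    test_cost: (num_tests,) relative application cost of each test (assumed known)
    Cheap tests are scaled up and expensive tests scaled down, so the
    optimization prefers to keep cheap tests, as described in the text.
    """
    return X / test_cost

# Usage sketch:
# X_weighted = cost_weight(X_train, test_cost)
# subset = automatic_testset_minimization(X_weighted, y_train, k=100)
```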

IV. TEST COST REDUCTION

We propose three test cost reduction strategies using multivariate model-based testing. Each strategy targets a specific use case and therefore will have different cost reduction and screening capabilities. For high quality low DPPM testing we propose Test Dropping Validation, for aggressive cost cutting and ultra low cost products we propose a Single Multivariate Test Model, and for a balanced approach we propose a Multi-Bin Multi-Model strategy. Each method strives to reduce the number of tests used to screen the bottom 10% of tail failures shown in Figure 1. The top 10% of tests are never considered for dropping since their test effectiveness is high; however, their parametric values may be used in subsequent screening computations.

Algorithm 2 Test Dropping Validation (TDV)
Inputs:
  X = [t_1, t_2, ..., t_n]^T   {Parametric Test Measurements}
  L = [l_1, l_2, ..., l_i]^T   {Upper & Lower Safe Test Limits}
  k = 10                       {Size of Test Subset to Find}
Initialize:
  M = [ ]                      {Set of Test Models to Keep}
  d = [ ]                      {Index of Dropped Tests}
  y = [ ]                      {Pass/Fail Labels}
for all t ∈ X do
  [LSL, USL] = L(t)            {Get Test Limits}
  if LSL < t < USL then        {Check Test Limits}
    y(t) = pass
  else
    y(t) = fail
  end if
  s = ATM(X, y, k)             {Find Optimal Test Subset}
  m = νSVC(X(:, s), y)         {Generate Test Model}
  if escapes(m) = 0 & overkills(m) < 0.25% then   {Check Test Validation Performance and Drop Tests}
    d = [d, t]                 {Remember Dropped Test}
    M = [M, m]                 {Remember Test Model}
    X(:, t) = [ ]              {Drop Predicted Test}
    X(:, s) = [ ]              {Drop Test Prediction Subset}
  end if
end for
Output: d, M                   {Set of Dropped Tests & Test Models}

A. Test Dropping Validation

For high quality products, test cost reduction methods must minimize cost but not impact DPPM levels. In many cases, conservative screening is preferred, allowing some overkill to ensure that as few defective parts as possible reach the customer. For such products, we propose test dropping validation (TDV), a conservative test dropping approach that only removes tests whose pass/fail outcome can be accurately predicted by a subset of other tests present in the reduced testset.

Test dropping validation is outlined in Algorithm 2. It finds correlated tests such that a combination of several measurements can be used to predict the outcome of another test. To find the best subset of tests we employ the proposed ATM algorithm, and to perform multivariate test screening based on this subset, we apply the ν-SVC algorithm introduced in Section III. TDV results are presented in Figure 9.

Multivariate screening is accomplished by a predictive function f(x) which is constructed based on parametric test data (X, y), where X is a subset of selected test measurements and y are pass/fail test labels. Such a model can be used to predict the label of an unseen test sample x where x ∉ X. In pass/fail screening, f(x) is a decision function that predicts the label of x based on its parametric measurements. The ν-SVC algorithm finds a multivariate test function of the form:


$$f(\vec{x}) = \operatorname{sgn}\left(\sum_{i=1}^{m} y_i \alpha_i K(\vec{x}, \vec{x}_i) + b\right),$$

where K(x_i, x_j) = exp(−γ‖x_i − x_j‖²) is the Gaussian kernel, and α_i and b are parameters computed during optimization.
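
In practice this decision function comes directly from a fitted classifier. The sketch below trains an RBF-kernel NuSVC on the ATM-selected subset and uses it to screen new dies; NuSVC, the gamma setting, and the array names are assumptions standing in for the paper's LibSVM setup.

```python
from sklearn.svm import NuSVC

def build_screening_model(X_train, y_train, subset, nu=0.05, gamma="scale"):
    """Fit a Gaussian-kernel nu-SVC on the kept test subset.

    The fitted model realizes f(x) = sgn(sum_i y_i * alpha_i * K(x, x_i) + b).
    subset: column indices returned by ATM.
    """
    model = NuSVC(nu=nu, kernel="rbf", gamma=gamma)
    model.fit(X_train[:, subset], y_train)
    return model

def screen(model, X_new, subset):
    """Predict pass/fail for freshly probed dies: +1 pass, -1 fail."""
    return model.predict(X_new[:, subset])
```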

The TDV algorithm aims to drop tests which fail very infrequently; however, what can be done if a test never fails at all? In such situations, the above algorithm does not apply since there are no failing samples from which to infer a multivariate test screening model. To justify dropping tests that always pass, we propose using kernel density estimation (KDE) [19]. This allows us to compute the probability that a given test will fall outside its upper or lower safe test limits.

Using a large number n of test measurements for a specific test t, we compute the univariate probability density function f̂(t) based on all n measurements t_1 − t_n within the range of test limits. The kernel density estimate is computed as shown in [19] using the following equation:

$$\hat{f}(t) = \frac{1}{n}\sum_{i=1}^{n} K(t, t_i),$$

where K(t, t_i) is the normalized Gaussian kernel that integrates to 1.

Once a probability density function for test t has been found, it can be used to determine the probability that another tested sample will fall outside the test limits and therefore fail. We find this by computing one minus the cumulative probability density of test t passing:

$$P[\text{fail}] = 1 - P[\text{lsl} < t < \text{usl}] = 1 - \int_{\text{lsl}}^{\text{usl}} \hat{f}(t)\,dt,$$

where usl and lsl are the upper and lower safe limits, respectively. If P[fail] is higher than DPPM limits allow, the test cannot be dropped; however, if the probability of failure is extremely low, it may be justified to drop the test. Figures 10 and 11 show examples of KDE performed on passing tests that are unlikely and likely to fail based on their distribution of test measurements.
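
A sketch of this check using SciPy's Gaussian KDE; the bandwidth default, the limit values, and the threshold are assumptions, and a production version would also attach a confidence interval as the paper does.

```python
from scipy.stats import gaussian_kde

def probability_of_failure(measurements, lsl, usl):
    """Estimate P[fail] = 1 - P[lsl < t < usl] for an always-passing test."""
    kde = gaussian_kde(measurements)          # Gaussian kernel density estimate
    p_pass = kde.integrate_box_1d(lsl, usl)   # cumulative density inside limits
    return 1.0 - p_pass

# Usage sketch: a drop candidate is a test whose estimated failure
# probability is far below the allowed DPPM level.
# t_meas = ...  # measurements of one always-passing test across many dies
# if probability_of_failure(t_meas, lsl=0.95, usl=1.05) < 1e-6:
#     print("safe drop candidate")
```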

B. Single Multivariate Test Model

For aggressive test cost reduction, we propose generating a single multivariate test model to screen all tail failures using a subset of tests. To generate such a test model, a representative subset of passing dies and all tail failures in the Pareto of Figure 1 are combined in one test dataset. The ATM algorithm uses this dataset to find the best subset of parametric tests that can distinguish between passing dies and the infrequently failing dies requiring 90% of test effort to screen. After a desired testset is found, the ν-SVC algorithm is used to generate a multivariate test model that screens tail failures using only the subset of tests found by ATM.

Depending on the cost savings desired, different sized testsets can be used to achieve optimal screening performance. To explore the tradeoffs between test cost, screening ability, and overkill, the number of tests used can be swept to find the best multivariate test model with the lowest test cost. Figures 12 and 13 show examples of sweeps performed on datasets with thousands and tens of thousands of passing dies, respectively.

C. Multi-Bin Multi-Model Testing

In many cases, a single multivariate test model cannot accurately capture the wide range of possible tail failures that occur, given strict test escape and overkill requirements. In such cases, it is possible to generate several test models, one for each failing bin that is difficult to screen with the single multivariate test model. The multi-bin multi-model approach can screen more diverse failures because each failing bin is predicted based on its own subset of tests that best separates its type of failure from the distribution of passing dies.

Choosing which failing bins to predict individually can be solved by first applying a single multivariate test model and observing which failing tests are difficult to correctly predict. Each of these difficult to predict tests can be individually analyzed and screened so the combined effort of several multivariate test models can increase screening performance and reduce test escapes. Figure 14 shows a stacked Pareto plot of correctly predicted failures and test escapes for one application of the single multivariate test model. This figure can be used as a guide when selecting tests that should be individually screened by their own multivariate test model.
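
One way to assemble the multi-bin strategy on top of the earlier sketches, assuming per-die soft bin labels are available; the bin encoding (0 for pass), the hard-bin list, and the helper names are hypothetical.

```python
import numpy as np

# Reuses automatic_testset_minimization() and build_screening_model()
# from the earlier sketches.
def build_bin_models(X_train, bins_train, hard_bins, k=50):
    """Train one model per hard-to-screen failing bin.

    bins_train: soft bin number per training die (assumed 0 = pass).
    hard_bins: failing bins the single model screens poorly (e.g., [1, 8, 15]).
    """
    models = {}
    passing = bins_train == 0
    for b in hard_bins:
        failing = bins_train == b
        mask = passing | failing
        y = np.where(passing[mask], +1, -1)
        subset = automatic_testset_minimization(X_train[mask], y, k=k)
        models[b] = (subset, build_screening_model(X_train[mask], y, subset))
    return models

def screen_multi_model(single_model, single_subset, bin_models, X_new):
    """A die passes only if the single model and every per-bin model agree."""
    verdict = single_model.predict(X_new[:, single_subset])
    for subset, model in bin_models.values():
        verdict = np.minimum(verdict, model.predict(X_new[:, subset]))
    return verdict   # +1 pass, -1 fail
```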

V. CASE STUDY SETUP

To evaluate the performance of each test cost reduction method, a case study was performed using over 1.5 million tested RF/A devices spanning dozens of lots in chronological order. Each tested device had over 700 parametric measurements, most of which were Idd and voltage parameters. Data from the first 80% of lots was used to find the best test subsets and generate multivariate test models, while data from the last 20% of lots was used to evaluate each model's cost reduction and screening capabilities. We call data from the first 80% the training set, and data from the last 20% the validation set. All algorithms and test methods were implemented using RapidMiner, an open source rapid prototyping tool for data mining and machine learning experiments [20].

A. Test Data Collection

The large volume of test data collated for this study had to be carefully managed. All test data was extracted from standard test data format (STDF) [21] files wafer by wafer, generating rows of parametric measurements for each tested device. In addition to extracting test measurements, each die was given a unique ID computed from its wafer id, lot id, and (x,y) coordinates. This ensured each tested die was uniquely identifiable for more detailed analysis and traceability.

After parsing all STDF files, it was necessary to handle missing or spurious test data that are often present in large volume wafer probe testing. Parsing scripts were written to remove dies with missing or corrupt data, ensure re-probed dies contained the latest measurements, and select only parametric tests for analysis. Once all data was preprocessed, the first 80% of lots were appended into a large training dataset and the last 20% of lots were combined to form the validation dataset. Figure 1 shows the Pareto of fails for the training dataset and Figure 7 shows the Pareto of fails for the validation set. Note the similar distribution of failures, indicating that developing test models based on the first 80% of lots should help predict failures in the subsequent 20%. From this point on, training and validation datasets were kept separate.

Fig. 7. Pareto of Validation Set Failures

B. Dataset Normalization

Dataset normalization was a key concern throughout the case study. Given the range of different test measurements collected, it was important to normalize each test so they were on comparable scales. For example, a voltage parameter measured in Volts is orders of magnitude larger than a current parameter measured in μAmps. To ensure balanced modeling, it was important to normalize each measurement such that the test results were within the same range of values.

We explored normalizing each test by its standard deviation, by its interquartile range, and by performing a simple range transform. From our experience, a range transform to [0-1] for each test measurement gave the best results. To find a proper normalization model, we first scaled all passing dies from the training set to the range of [0-1] and used the same scale factors when normalizing failing dies in the training set and all dies in the validation set. This strategy ensured that nominal tested devices were in the [0-1] range while failing dies would have at least one measurement outside that range.
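
This is essentially a min-max scaler fit only on passing training dies. A sketch with scikit-learn follows; clipping is deliberately not applied so failing dies can land outside [0, 1], and the file names are placeholders.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical files holding raw (un-normalized) parametric measurements.
X_train_pass = np.load("train_passing.npy")
X_train_fail = np.load("train_failing.npy")
X_validation = np.load("validation_all.npy")

# Fit the [0-1] range transform on passing training dies only.
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train_pass)

# Apply the same scale factors everywhere else; failing dies will then
# typically have at least one measurement outside the [0, 1] range.
X_train_pass_n = scaler.transform(X_train_pass)
X_train_fail_n = scaler.transform(X_train_fail)
X_validation_n = scaler.transform(X_validation)
```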

VI. ANALYSIS OF RESULTS

The goal of this case study was to explore tradeoffs between cost savings and screening effectiveness when performing multivariate model-based testing. With that goal in mind, the test effectiveness Pareto plots in Figures 1 and 7 show that the cost savings strategy should be applied to screen the bottom 10% of tail failures, maximizing the potential for cost savings while minimizing the possibility of test escapes. In this section, we discuss how to determine when a test model is general enough to safely apply and present the performance of the three proposed cost reduction methods.

Fig. 8. Sweep of Passing Training Examples

A. Model Generation

In any modeling problem, there exists a relationship within the data that one wishes to extract. In this case study, we wish to find classification models that screen tail failures using a subset of tests. We seek a relationship between parametric test measurements in a multivariate test space that determines whether a die is a tail failure or passing, using only a subset of tests.

Classification algorithms like ν-SVC will always deliver some learned model based on the data presented to them; however, these models may not always generalize well to the actual test problem. To ensure we develop a general test model, we must (1) establish model convergence, and (2) measure model performance on a validation test set. We first discuss the problem of model convergence and then present validation results in the rest of this section.

The model convergence problem asks: “When have we seen enough training examples to ensure that a learned test model properly generalizes to unseen parametric test data?” This question is particularly important in multivariate test modeling since there is a severe imbalance between passing and failing examples. There are orders of magnitude more passing dies than failing dies. To solve this problem, we propose training many models with all available failing dies and an increasing number of passing dies while measuring overkill and test escape performance on the validation set.

Figure 8 shows a test model converging as the number of passing dies used for training is swept across tens of thousands. There is an exponential drop in the number of validation overkills as we increase the number of passing dies the modeling algorithm sees, after which the number of overkills flattens out. Also, the number of test escapes rises slightly but stays flat once convergence has been achieved. This figure shows that training with tens of thousands of examples will ensure we have seen enough passing dies to generalize. Since adding more passing examples does not impact the test escape performance, we are confident that our model captures a real relationship between parametric measurements, which explains the difference between passing and failing dies. To ensure minimal overkill and model generality, subsequent models were trained with more than 40k passing examples.
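
A convergence sweep of this kind can be scripted as below; the step sizes, data arrays, and reuse of the earlier helper are assumptions, and overkills and escapes are simply counted against the validation labels.

```python
import numpy as np

# Reuses build_screening_model() from the earlier sketch.
def convergence_sweep(X_pass, X_fail, X_val, y_val, subset, steps):
    """Train with all failing dies and an increasing number of passing dies,
    recording validation overkills and test escapes at each step."""
    results = []
    for n_pass in steps:                                  # e.g. [5000, 10000, 20000, 40000]
        X = np.vstack([X_pass[:n_pass], X_fail])
        y = np.concatenate([np.ones(n_pass), -np.ones(len(X_fail))])
        model = build_screening_model(X, y, subset)
        pred = model.predict(X_val[:, subset])
        overkills = np.sum((pred == -1) & (y_val == +1))  # good dies screened out
        escapes = np.sum((pred == +1) & (y_val == -1))    # bad dies passed
        results.append((n_pass, overkills, escapes))
    return results
```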


Fig. 9. Test Dropping Validation, 42 Tests Dropped

B. Test Dropping Validation Results

The performance of test dropping validation is summarized in Figure 9. The x-axis shows 42 tests that were dropped by the TDV algorithm and replaced with multivariate test models that take 10 other test measurements as inputs. By design, the algorithm only drops tests after it finds a replacement test model that can screen all failures with 0% test escapes. Therefore, each test in Figure 9 has been removed and replaced by a multivariate test model that computes a pass/fail decision based on 10 kept parametric measurements. During execution, the TDV algorithm ensures each dropped parametric test and the 10 tests used to predict it are removed from subsequent iterations. Therefore, the 42 dropped tests in Figure 9 demonstrate a total of 5.6% test cost savings with 0% test escapes.

The bars in Figure 9 show the percent of validation fails, normalized by all training and validation fails, plotted on the left axis for each dropped test. In other words, they show the percent of validation fails out of all fails for each dropped test. The cumulative percent overkill is plotted as an increasing red line measured in tenths of a percent on the right axis. These measurements show that the generated models are robust: they do not merely screen the small fraction of fails seen during training. On the contrary, the models capture all validation failures with 0% test escapes, even though the validation set often contains between 25% and 60% of failures never seen during training. This indicates that in the extreme case of a yield excursion, where subsequent lots may contain many more failures of a specific test, the generated models will still correctly screen every failing die.

In the case where a test never fails across many production lots, we proposed using KDE to determine if it can be safely dropped. Figure 10 shows a histogram and the kernel density estimate of a parametric test that has not yet failed and is unlikely to fail in the future. The histogram of measurements is generated from thousands of dies, where the x-axis plots the window of acceptable parametric measurements between the lower and upper safe limits. From the density estimate function, we compute that the probability of a die landing outside the lower and upper safe limits is less than 10^-6 with 99.9% confidence, indicating this test may be a safe drop candidate.

Fig. 10. KDE of Unlikely Failing Test

Fig. 11. KDE of Likely Failing Test

Figure 11 shows a kernel density estimate for a different parametric test, which is more likely to fail. Notice how large portions of parametric measurements fall near the upper and lower safe test limits. When computing the probability of this test failing according to the density estimate, we obtained 0.001 with 99.9% confidence. Therefore, this test is much more likely to fail sometime in the future, given the expected parametric test variability. Consequently, this test is not a drop candidate even though it has not yet failed. With KDE we provide test engineers a method to decide which tests that never fail are safe to drop. From the results in Figures 10 and 11, we observe that kernel density estimation helps find tests that always pass and have very wide test limits compared to their variance. Only these tests are selected as acceptable safe drop candidates.

C. Exploring Cost Tradeoffs

To understand what value a multivariate test model provides when reducing test cost, we generated thousands of models to explore the tradeoffs between the number of tests dropped and screening capability. Figures 12 and 13 show the tradeoffs when increasing the number of tests from 10 to 765 using models generated with thousands and tens of thousands of passing dies. To generate each model, all tail failures and a large subset of passing dies were used from the training set, while the same validation set checked screening performance.

Fig. 12. Sweep # Tests, Thousands of Passing Dies

Figure 12 shows the number of tests on the x-axis and the number of training and validation escapes and overkills on the y-axis in log scale. As the number of tests increases, we clearly see a monotonic decrease in the number of training overkills and test escapes. This shows that it takes a minimum number of tests for a given dataset to properly screen failures from nominal dies. In Figure 12, we train models using thousands of passing samples and observe that the number of training escapes and overkills drops to zero at around 300 tests, which would result in about 50% test cost reduction.

These findings show that it takes at least 300 tests to properly distinguish between passing and failing dies and give an upper bound of 50% test cost reduction when training with thousands of passing dies. We observe a minimum of 30 test escapes and 169 overkills. This translates to 0.42% escapes; however, training with only thousands of nominal dies incurs a steep 1.69% overkill. In Figure 8, we already showed the need to train with at least 40k passing dies to generate a converged model and minimize overkill. Therefore, we present tradeoff results when training with more than 40k passing dies while sweeping the number of tests from 10 to 700.

In Figure 13, when training with tens of thousands of passing dies, a different tradeoff curve appears. In this case, we need more tests to achieve zero training escapes and overkills. This result shows that to train a general model with minimal overkill, the lower bound on the capacity for cost savings drops to roughly 530 tests, or about 30%. There is a minimum of 51 test escapes and 22 overkills, roughly translating to 0.81% escapes and 0.22% overkills. Observe that there is a crossover point at around 400 tests where the number of overkills drops below the number of test escapes. This is in contrast to Figure 12, where there are more escapes than overkills.

From Figures 12 and 13, we can conclude that training with thousands of samples generates an over-fit model which does not fully capture the space of passing dies, leading to many overkills but fewer test escapes. On the other hand, training with tens of thousands of passing dies ensures a converged, generalized model with minimal overkill and somewhat more escapes. Given these tradeoff curves, test engineers have the ability to choose a target cost cutting objective that best meets their test quality requirements.

Fig. 13. Sweep # Tests, Tens of Thousands of Passing Dies

To further reduce test escapes, we suggest using a multi-bin multi-model test strategy targeting specific bins that are difficult to screen with a single multivariate test model. To find bins that are problematic, we plot a stacked Pareto of tail failures from the validation set in Figure 14. The x-axis shows individual bin numbers, the left axis shows the failing die count, and the right axis shows the percent of validation tail fails. Blue bars indicate the number of captured fails and pink bars indicate the number of test escapes for each failing bin.

Fig. 14. Stacked Tail Fail Pareto of Validation Set

In Figure 14, bins 1, 8, and 15 clearly stand out as being problematic. These three bins alone are responsible for 59% of all validation test escapes incurred by the single multivariate test model. By predicting these bins individually, we can reduce the test escape rate with minimal additional overkill. Table I shows the performance of multi-bin multi-model screening. For all three problematic bins, escapes were reduced by finding a different subset of tests and generating an additional multivariate test model that best captures failures in that bin. A total of 0.584% additional failures were screened at a cost of 0.03% additional overkill.

Bin Num.   Additional Screened Fails   % Reduced Escapes   % Additional Overkills
1          25                          0.395               0.01
8          7                           0.111               0.01
15         5                           0.078               0.01
Total      37                          0.584               0.03

Table I: Multi-Bin Multi-Model Screening Performance


D. Discussion

The test cost reduction methods presented in this work show the tradeoff between the number of dropped tests and the screening capability of multivariate test models generated by analyzing test data. It should be noted that the tradeoff figures presented are based only on the data we have analyzed and may not reflect absolute cost savings potential. Restrictions in test ordering, derived measurements, and other proprietary test requirements may not permit the elimination of all indicated tests from the actual test program.

The goal of this case study was to show the potential for using multivariate test methods to achieve significant cost reduction with minimal impact on test escapes and overkills. Test cost reduction is possible because we consider the test problem from a multivariate perspective, allowing multiple measured parameters to be evaluated simultaneously when making a pass/fail test decision. This test perspective, coupled with the automatic testset minimization algorithm, enables parametric test cost reduction in RF/A wafer probe screening.

While the proposed method worked very well on parametric wafer probe test data from an RF/A device, it may not be directly applicable to other products or industry segments. For example, in SoC testing, various different on-chip blocks are probed that may not even be physically connected, resulting in many orthogonal test measurements where no multivariate generalization can be achieved. Additionally, in products without the 90:10 Pareto of test effectiveness, there may not be significant cost reduction when screening tail failures using a subset of tests.

Table II shows the overall cost reduction performance for the three methods surveyed in this case study. Test dropping validation was able to achieve 5.6% cost savings with zero test escapes and 0.22% overkill. The single model test method achieved 30% cost savings with 0.993% test escapes, while the multi-bin multi-model method reduced escapes by over half to 0.409%. It is important to note that 30% cost saving does not imply 30% redundancy in the original testset from a univariate test perspective. Every test still individually measures something different. The 30% cost saving is achieved by screening in a multidimensional test space where some tests become unnecessary. Other data sets will need to be analyzed to determine how performance changes with different device types and test flows.

Method         % Cost Savings   % Escapes   % Overkill
TDV            5.6              0.000       0.221
Single Model   30               0.993       0.281
Multi Model    30               0.409       0.284

Table II: Comparison of Cost Reduction Methods

VII. CONCLUSION

This work presented a case study of parametric wafer probe test cost reduction for an RF/A device. We proposed a new automatic testset minimization (ATM) algorithm and three multivariate test methods for reducing cost by test dropping.

Analyzing over 1.5 million dies, we showed that 5.6% test cost reduction can be achieved with 0% test escapes and 0.221% overkill. More aggressive cost cutting strategies reduced test cost by up to 30% if 0.409% test escapes and 0.284% overkills can be tolerated in wafer probe. Cost saving was achieved by leveraging optimal subsets of parametric test measurements and multivariate test models to effectively screen tail failures requiring 90% of test effort to capture. The case study found that large volumes of wafer probe test data can be analyzed to extract relevant test subsets and perform multivariate screening to significantly reduce parametric wafer probe test cost.

REFERENCES

[1] Sudarshan Bahukudumbi, Sule Ozev, Krishnendu Chakrabarty, Vikram Iyengar, "A Wafer-Level Defect Screening Technique to Reduce Test and Packaging Costs for "Big-D/Small-A" Mixed-Signal SoCs," ASP-DAC, 2007.

[2] Peter M. O'Neill, "Production Multivariate Outlier Detection Using Principle Components," ITC, 2008.

[3] Phil Nigh, Anne Gattiker, "Test Method Evaluation Experiments & Data," ITC, 2000.

[4] Frank Poehl, Frank Demmerle, Juergen Alt, Hermann Obermeir, "Production Test Challenges for highly integrated Mobile Phone SOCs - A Case Study," ETS, 2010.

[5] W. Robert Daasch, C. Glenn Shirley, Amit Nahar, "Statistics in Semiconductor Test: Going Beyond Yield," ITC, 2009.

[6] R. Daasch and R. Madge, "Data-Driven Models for Statistical Testing: Measurements, Estimates and Residuals," ITC, 2005.

[7] Erik Jan Marinissen, et al., "Adapting to Adaptive Testing," DATE, 2010.

[8] Dragoljub (Gagi) Drmanac, Nik Sumikawa, Li-C. Wang, Magdy S. Abadir, "Multidimensional Parametric Test Set Optimization of Wafer Probe Data for Predicting in Field Failures and Setting Tighter Test Limits," DATE, 2011.

[9] Andrea Paganini, Mustapha Slamani, Hanyi Ding, John Ferrario, Nanju Na, "Cost-competitive RF Wafer Test Methodology for High Volume Production of Complex RF ICs," ECTC, 2008.

[10] A. Nahar, K. M. Butler, J. M. C. Jr., and C. Weinberger, "Quality Improvement and Cost Reduction Using Statistical Outlier Methods," ICCAD, 2009.

[11] E. Yilmaz and S. Ozev, "Defect-Based Test Optimization for Analog/RF Circuits for Near-Zero DPPM Applications," ICCAD, 2009.

[12] H. D. Stratigopoulos, P. Drineas, M. Slamani, and Y. Makris, "Non-RF To RF Test Correlation Using Learning Machines: A Case Study," VTS, 2007.

[13] Vladimir N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1999.

[14] B. Scholkopf, A. Smola, R. Williamson, and P. L. Bartlett, "New support vector algorithms," Neural Computation, 2000.

[15] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2001. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm

[16] I.T. Jolliffe, Principal Component Analysis. Springer, 1986.

[17] J. Brank, M. Grobelnik, N. Milic-Frayling, and D. Mladenic, "Feature Selection Using Linear Support Vector Machines," Microsoft Research Technical Report, 2002.

[18] Isabelle Guyon, Jason Weston, Stephen Barnhill, Vladimir Vapnik, "Gene Selection for Cancer Classification using Support Vector Machines," Machine Learning, 2002.

[19] Emanuel Parzen, "On Estimation of a Probability Density Function and Mode," Annals of Mathematical Statistics 33, 1962.

[20] Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm, "YALE: Rapid Prototyping for Complex Data Mining Tasks," SIGKDD, 2006. Software available at http://rapid-i.com.

[21] IEEE Test Technology Standards Group, "Standard Test Data Format (STDF) Specification Version 4," IEEE TTSG, 2007.
