multivariate theory and applications

Upload: monisiva

Post on 03-Apr-2018


  • 7/28/2019 Multivariate Theory and Applications

    1/225


Multivariate Quality Control
THEORY AND APPLICATIONS

Camil Fuchs
Tel Aviv University
Tel Aviv, Israel

Ron S. Kenett
KPA Ltd. and Tel Aviv University
Raanana, Israel

MARCEL DEKKER, INC.
NEW YORK, BASEL, HONG KONG


Library of Congress Cataloging-in-Publication Data

Fuchs, Camil
Multivariate quality control: theory and applications / Camil Fuchs, Ron S. Kenett.
p. cm. (Quality and reliability : 54)
Includes bibliographical references and index.
ISBN 0-8247-9939-9 (alk. paper)
1. Quality control-Statistical methods. 2. Multivariate analysis. I. Kenett, Ron II. Series.
TS156.F817 1998
658.562015195354-dc21 98-2764 CIP

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the address below.

This book is printed on acid-free paper.

Copyright © 1998 by MARCEL DEKKER, INC. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

MARCEL DEKKER, INC.
270 Madison Avenue, New York, New York 10016
http://www.dekker.com

Current printing (last digit):
10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA

To Aliza, Amir and Efrat
C.F.

To my mentors, David Cox, Sam Karlin, George Box and Shelley Zacks who shaped my interest and understanding of industrial statistics.
R.S.K.


Preface

Data are naturally multivariate and consist of measurements on various characteristics for each observational unit. An early example of multivariate data is provided by Noah's Ark, for which the requirements were: "The length of the ark shall be three hundred cubits, the breadth of it fifty cubits, and the height of it thirty cubits" (Genesis 6:15). The ark was a single specimen produced by a craftsman, and statistical quality control procedures were unlikely to have been applied for ensuring the conformance to standards of the trivariate data for that product. However, modern industrial processes with mass production are inconceivable without the use of proper statistical quality control procedures.

Although multivariate methods are better suited for multivariate data, their use is not common practice in industrial settings. The implementation of multivariate quality control involves more complex calculations, creating a barrier to practitioners who typically prefer the simpler and less effective univariate methods. But this barrier can and should be removed. Modern computers have created new opportunities for data collection and data analysis. Data are now collected en masse and stored in appropriate databases. Adequate statistical tools that transform data into information and knowledge are needed to help the practitioner to take advantage of the new technologies. Multivariate Quality Control: Theory and Applications presents such tools in the context of industrial and organizational processes.


The main objective of the book is to provide a practical introduction to multivariate quality control by focusing on typical quality control problem formulations and by relying on case studies for illustrating the tools. The book is aimed at practitioners and students alike and can be used in regular university courses, in industrial workshops, and as a reference. The typical audience for industrial workshops based on the book includes quality and process engineers, industrial statisticians, quality and production managers, industrial engineers, and technology managers. Advanced undergraduates, graduates, and postgraduate students will also be able to use the book for self-instruction in multivariate quality control and multivariate data analysis. One-semester courses such as those on statistical process control or industrial statistics can rely on the book as a textbook.

Our objective was to make the text comprehensive and modern in terms of the techniques covered, correct in the presentation of their mathematical properties, and practical with useful guidelines and detailed procedures for the implementation of the multivariate quality control techniques. Special attention was devoted to graphical procedures that can enhance the book's applicability in the industrial environment.

The book presents macros written in the statistical language MINITAB™, which can be translated to other languages such as S-Plus, SAS, SPSS or STATGRAPHICS. Access to MINITAB 11.0 or higher is advisable for running the macros attached to the text. Earlier versions of MINITAB, in which the calculations are performed with single precision, are inadequate for the calculations involved in multivariate quality control.

The idea of writing a book on multivariate quality control has matured after several years of collaboration between the authors. The structure of the book reflects our understanding of how multivariate quality control should be presented and practiced. The book includes nonstandard solutions to real-life problems such as the use of tolerance regions, T²-decomposition and MP-charts. There is no doubt that with the expansion of automatic data collection and analysis, new problems will arise and new techniques will be proposed for their solution. We hope this book contributes to bridging the gap between theory and practice and expanding the implementation of multivariate quality control.

This book distills years of experience of many people. They include engineers and operators from the metal, electronic, microelectronic, plastic, paper, medical, and food industries who were willing to pilot the implementation of multivariate quality control. We thank them all for their willingness to try new procedures that in many cases proved effective beyond expectations. Special thanks to Ori Sharon for his help in writing some of the MINITAB macros, to Mr. Brian Vogel for running the star plots, and to Mr. Russell Dekker for ongoing editorial support and continuous encouragement.

Camil Fuchs
Ron S. Kenett


Contents

Preface
1. Quality Control with Multivariate Data
2. The Multivariate Normal Distribution in Quality Control
3. Quality Control with Externally Assigned Targets
4. Quality Control with Internal Targets-Multivariate Process Capability
5. Quality Control with Targets from a Reference Sample
6. Analyzing Data with Multivariate Control Charts
7. Detection of Out-of-Control Characteristics
8. The Statistical Tolerance Regions Approach
9. Multivariate Quality Control with Units in Batches
10. Applications of Principal Components
11. Additional Graphical Techniques for Multivariate Quality Control
12. Implementing Multivariate Quality Control
Appendix 1. MINITAB™ Macros for Multivariate Quality Control
Appendix 2. The Data from the Case Studies
Appendix 3. Review of Matrix Algebra for Statistics with MINITAB™ Computations
References
Index


1
Quality Control with Multivariate Data

Objectives:
The chapter introduces the basic issues of multivariate quality control. The construction of multivariate control charts is illustrated using the data from Case Study 1 on physical dimensions of an aluminum part. Several approaches to multivariate quality control are discussed and the four case studies used in the book are briefly described.

Key Concepts
• Management styles
• Quality control
• Multivariate data collection
• Multivariate scatterplot matrix
• The Hotelling T²-distance
• The T²-chart


Organizations can be managed in different ways. The reactive "laissez faire" approach relies on consumer complaints, liability suits, shrinking market share or financial losses to trigger efforts to remedy current problems. In this mode of thinking, there are no dedicated efforts to anticipate problems or improve current conditions. An alternative approach, which requires heavy investments in screening activities, relies on inspection of the product or service provided to the organization's customers. One set of statistical tools that apply to such a screening, or inspection-based, management style is acceptance sampling. Using such tools enables decision makers to determine what action to take on a batch of products. Decisions based on samples, rather than on 100% inspection, are usually more expedient and cost effective. Such decisions, however, only apply to the specific population of products from which the sample was drawn. There are typically no lessons learned by inspection-based management and nothing is done to improve the conditions of future products.

Quality control requires a different style of management. This third management approach focuses on the processes that make the products. Data collected is used to track process performance over time, identify the presence of assignable causes and expose common causes that affect the process in an ongoing way. Assignable causes occur at specific points in time and usually require immediate corrective action, while common causes are opportunities for improvements with long-term payoffs. Feedback loops are the basic structure for process-focused management. For a comprehensive treatment of industrial statistics in the context of three management styles (inspection based, process focused quality control, and proactive quality by design), see Kenett and Zacks (1998).

Quality control is based on data sequentially collected, displayed and analyzed. Most data are naturally multivariate.
Physical dimensions of parts can be measured at several locations and various parameters of systems are typically derived simultaneously. In practice, one will frequently use a critical dimension as the one to record and track for purposes of quality control. An alternative approach to handle multivariate data is to aggregate several properties and transform data collected by several measurement devices into attribute data. Under this approach, a unit is labeled as "pass" or "fail." Tracking the percentage of defective products or nonconformities is indeed a popular approach to statistical quality control. However, the transformation of continuous quantitative data into qualitative attribute data leads to a loss of information. This reduces the ability to identify the occurrence of assignable causes and impairs the understanding of the process capability. In this book we present tools that provide effective and efficient analysis of multivariate data without such loss of information.

Modern computers and data collection devices allow for much data to be gathered automatically. Appropriate tools that transform data into information and knowledge are required for turning investments in automation into actual improvements in the quality of products and processes. The main objective of this book is to provide a practical introduction to such tools while demonstrating their application. Our approach is to de-emphasize theory and use case studies to illustrate how statistical findings can be interpreted in real life situations. We base the computations on the MINITAB™ version 11.0 computer package and present in some instances the code for specific computations. Previous versions of MINITAB did not have the double precision capability and caused numerical errors in several of the computations used in this book. The use of earlier versions of MINITAB is therefore not recommended for these kinds of multivariate computations.

Case Studies 1 to 4 are analyzed in Chapters 3 to 11. They provide examples of statistical quality control and process capability studies. The data of Case Study 1 consist of six physical dimension measurements of aluminum pins carved from aluminum blocks by a numerically controlled machine. Case Study 2 is a complex cylindrical part with 7 variables measured on each part. Case Study 3 comes from the microelectronics industry, where ceramic substrates are used with very stringent requirements on critical dimensions.
Case Study 4 consists of chemical properties of fruit juices that are tested to monitor possible adulteration in natural fruit juices.

We only deal indirectly with issues related to the complexities of multivariate data collection. In some cases major difficulties in collecting multivariate data are an impediment to statistical quality control initiatives (D. Marquardt, personal communication). For example, a sample can be subjected to a variety of tests performed on various instruments located in several laboratories and requiring different lengths of time to complete. Under such conditions, collecting the data in a coherent database becomes a major achievement.

Univariate quality control typically relies on control charts that track positions of new observations relative to control limits. A significantly high deviation from the process average signals an assignable cause affecting the process. Multivariate data are much more informative than a collection of one dimensional variables. Simultaneously accounting for variation in several variables requires both an overall measure of departure of the observation from the targets as well as an assessment of the data correlation structure. A first assessment of the relationships among the variables in the multivariate data is given by the analysis of their pairwise association, which can be evaluated graphically by displaying an array of scatterplots. Figure 1.1 presents an illustration of such a scatterplot matrix, computed for the six variables from Case Study 1 mentioned above.

Figure 1.1: Scatterplot matrix for the 6 variables from Case Study 1 (ungrouped).

The data display a clear pattern of association among the variables. For example, we observe in Figure 1.1 that there is a close relationship between Diameter1 and Diameter2 but a rather weak association between Diameter1 and Length1. A significantly large distance of a p-dimensional observation, X, from a target will signal that an assignable cause is affecting the process. The pattern of association among the variables, as measured by their covariance matrix, has to be accounted for when an overall measure of departure from targets is to be calculated. A measure of distance that takes into account the covariance structure was proposed by Harold Hotelling in 1931. It is called Hotelling's T² in honor of its developer.

Geometrically we can view T² as proportional to the squared distance of a multivariate observation from the target, where equidistant points form ellipsoids surrounding the target. The higher the T²-value, the more distant is the observation from the target.

The target vector m can be derived either from a previously collected reference sample, from the tested sample itself (i.e., as an internal target), or from externally set nominal values (i.e., as an external target). The T²-distance is presented in Chapter 2 and some of its theoretical properties are discussed there. The T² distances lend themselves to graphical display and the T²-chart is the classical and most common among the multivariate control charts (Eisenhart et al., 1947; Alt, 1984). It should be noted that the chart, like T² itself, does not indicate the nature of the problem, but rather that one exists. Figure 1.2 presents the T²-chart for Case Study 1. The vertical axis indicates the position of the 99.7th and 95th percentiles of T². We shall expand on issues related to that graph in Chapters 3 to 5 that discuss, respectively, targets that are derived externally, internally and from a reference sample. Multivariate control charts are further discussed in Chapter 6.

The multivariate analysis assesses the overall departure of the p-variable observations from their target.
However, by themselves, the analysis of the T²-statistics and of the appropriate control chart do not provide an answer to the important practical question of the detection of the variables that caused the out-of-control signal. Chapter 7 presents some approaches and solutions to this issue.
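
The role of the covariance structure in the T² distance can be made concrete with a small numerical sketch. The Python code below assumes NumPy; the function name `t2_distance`, the target vector, and the covariance values are illustrative assumptions rather than computations from the case studies. It evaluates the quadratic form (x - m)' S⁻¹ (x - m) for two observations that lie at the same Euclidean distance from the target.

```python
import numpy as np

def t2_distance(x, target, cov):
    """Squared statistical distance (x - m)' S^{-1} (x - m)."""
    d = np.asarray(x, dtype=float) - np.asarray(target, dtype=float)
    # Solve S z = d instead of forming the inverse of S explicitly.
    return float(d @ np.linalg.solve(cov, d))

# Assumed target and covariance: two dimensions with equal standard
# deviations and a positive correlation (illustrative values only).
target = np.array([49.91, 60.05])
sd, rho = 0.037, 0.723
cov = np.array([[sd**2, rho * sd**2],
                [rho * sd**2, sd**2]])

# Two observations at the same Euclidean distance from the target.
along = target + np.array([0.05, 0.05])     # consistent with the correlation
against = target + np.array([0.05, -0.05])  # contradicts the correlation

t2_along = t2_distance(along, target, cov)
t2_against = t2_distance(against, target, cov)
print(t2_along, t2_against)
```

Under these assumed values the observation that contradicts the positive correlation receives a much larger distance (about 13.2 against about 2.1), which is the behavior a purely univariate or Euclidean measure would miss.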

We present in Chapter 8 the statistical tolerance region approach to quality control and its related concept of natural process region. This approach is different from the classical Shewhart approach that relies primarily on three standard deviations control limits to detect shifts in the mean or the variance of a distribution. Statistical tolerance regions consist of a range of values of the measured variables spanning a given percentage of the population corresponding to a process in control. This statement is made with a specific level of confidence. For example, one can construct an interval of values for which we can state, with a confidence of 0.95, that this range includes [?]0% of the population from a process under control. Such regions permit identification of shifts in distributions without the limitation of singling out shifts in means or variances as in the classical Shewhart control charts.

Figure 1.2: T²-chart for the 6 variables from Case Study 1 (ungrouped). Horizontal axis: Observations.

In some cases, the analyzed data are presented in batches and the quality control procedure should take this feature into consideration. Methods of analysis for this situation are presented in Chapter 9 and applied to the data of Case Study 3.

Another approach to multivariate quality control is to transform a p-dimensional set of data into a lower dimensional set of data by identifying meaningful weighted linear combinations of the p dimensions. Those new variables are called principal components. In Chapter 10 we present the principal components approach and illustrate it using Case Study 1.

Both the T² distances and the principal components lend themselves to graphical displays. In Chapter 11 we compare two additional graphical displays that isolate changes in the different individual dimensions. Modern statistical software provides many options for displaying multivariate data. Most computerized graphical techniques are appropriate for exploratory data analysis and can greatly contribute to process capability studies. However, ongoing quality control requires graphical displays that possess several properties such as those presented and illustrated in Chapter 11 with case study data. For completeness we also review in Chapter 11 modern dynamic graphical displays and illustrate them with case study data.

We conclude the text with some guidelines for implementing multivariate quality control. Chapter 12 provides specific milestones that can be used to generate practical implementation plans. The Appendices contain a description of the case studies and their data and references (Appendix 2), a brief review of matrix algebra with applications to multivariate quality control, using the MINITAB computer package (Appendix 3), and a set of 30 MINITAB macros for the computation of statistics defined throughout the book (Appendix 1). The macros provide an extensive set of tools for numerical computations and graphical analysis in multivariate quality control. These macros complement the analytical tools presented in the book and support their practical implementation.


2
The Multivariate Normal Distribution in Quality Control

Objectives:
The chapter reviews assumptions and inferential methods in the multivariate normal case. Results that provide the theoretical foundations for multivariate quality control procedures are stated and demonstrated. Simulations are used to illustrate the key theoretical properties of the bivariate normal distribution.

Key Concepts
• The multivariate normal distribution
• Maximum Likelihood Estimators
• Distributions of multivariate statistics
• Base sample
• Data transformations


A multivariate observation consists of simultaneous measurement of several attributes. For example, a part coming out of a numerically controlled metal cutting lathe can be measured on several critical dimensions. Multivariate observations are conveniently represented by vectors with p entries corresponding to the p variables measured on each observation unit. This chapter discusses inferential methods applicable to multivariate normal data. In most of the chapter we assume multivariate normality of the data and present methods which enable us to produce estimates of population parameters and construct tests of hypotheses about multivariate population parameters using a simple random sample of multivariate observations.

The analysis of multivariate data poses a challenge beyond the naive analysis of each dimension separately. One aspect of this challenge stems from the simultaneous consideration of several probability statements, such as p tests of hypothesis. This creates a problem of multiple comparisons that requires adjustments of univariate significance levels so as to attain a meaningful overall significance level. A second component of the multivariate data analysis challenge is to account for the internal correlation structure among the p dimensions. This again affects overall significance levels.

We begin by describing the multivariate normal distribution, proceed to cover the distribution of various test statistics, and then briefly mention approaches to induce normality into non-normal data. The chapter provides a notation and a theoretical foundation for the following chapters.

The Multivariate Normal Distribution.

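A p-dimensional vector of random variables $X = (X^{(1)}, \ldots, X^{(p)})'$, $-\infty < X^{(\ell)} < \infty$, $\ell = 1, \ldots, p$, is said to have a multivariate normal distribution if its density function $f(x)$ is of the form:

$$f(x) = f(x^{(1)}, \ldots, x^{(p)}) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right]$$

where $\mu = (\mu^{(1)}, \ldots, \mu^{(p)})'$ is the vector of expected values $\mu^{(\ell)} = E(X^{(\ell)})$, $\ell = 1, \ldots, p$, and $\Sigma = [\sigma_{\ell\ell'}]$, $\ell, \ell' = 1, \ldots, p$, is the variance-covariance matrix of $(X^{(1)}, \ldots, X^{(p)})$, with $\sigma_{\ell\ell'} = \mathrm{cov}(X^{(\ell)}, X^{(\ell')})$ and $\sigma_{\ell\ell} = \sigma_\ell^2$.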
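
The density formula can be checked numerically with a short sketch. The Python code below assumes NumPy; the helper names `mvn_density` and `normal_density` and the parameter values are hypothetical and chosen only for illustration. It evaluates the multivariate normal density directly from the definition and compares it, for a diagonal covariance matrix, with the product of the corresponding univariate normal densities.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Multivariate normal density evaluated from its definition."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.size
    d = x - mu
    quad = d @ np.linalg.solve(sigma, d)            # (x - mu)' Sigma^{-1} (x - mu)
    const = (2 * np.pi) ** (-p / 2) * np.linalg.det(sigma) ** (-0.5)
    return float(const * np.exp(-0.5 * quad))

def normal_density(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return float(np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var))

# Assumed illustrative parameters: two uncorrelated variables.
mu = np.array([0.0, 1.0])
sigma_indep = np.diag([4.0, 0.25])
x = np.array([1.0, 1.5])

joint = mvn_density(x, mu, sigma_indep)
product = (normal_density(x[0], mu[0], 4.0)
           * normal_density(x[1], mu[1], 0.25))
print(joint, product)  # the two values agree up to rounding error
```

With a non-diagonal covariance matrix the same function applies unchanged; only the factorization into univariate densities is lost.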

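We shall indicate that the density of $X$ is of this form by

$$X \sim N_p(\mu, \Sigma)$$

where $N_p(\cdot, \cdot)$ denotes a p-dimensional normal distribution with the respective parameters of location and dispersion.

When $p = 1$, the one dimensional vector $X = (X^{(1)})'$ has a normal distribution with mean $\mu^{(1)}$ and variance $\sigma_1^2$, i.e.

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(x - \mu^{(1)})^2}{2\sigma_1^2}\right]$$

or

$$X \sim N_1(\mu^{(1)}, \sigma_1^2).$$

When $p = 2$, $X = (X^{(1)}, X^{(2)})'$ has a bivariate normal distribution with a two-dimensional vector of means $\mu = (\mu^{(1)}, \mu^{(2)})'$ and a covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}$$

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $X^{(1)}$ and $X^{(2)}$, respectively, and $\sigma_{12} = \sigma_{21}$ is the covariance between $X^{(1)}$ and $X^{(2)}$. Let $\rho = \sigma_{12}/(\sigma_1\sigma_2)$ be the correlation between $X^{(1)}$ and $X^{(2)}$.

If $X^{(1)}$ and $X^{(2)}$ are independent ($\rho = 0$), their joint bivariate normal distribution is the product of two univariate normal distributions, i.e.

$$f(x^{(1)}, x^{(2)}) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{1}{2}\left(\frac{(x^{(1)} - \mu^{(1)})^2}{\sigma_1^2} + \frac{(x^{(2)} - \mu^{(2)})^2}{\sigma_2^2}\right)\right]$$

In the general case ($-1 < \rho < 1$), the bivariate normal distribution is given by

$$f(x^{(1)}, x^{(2)}) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{(x^{(1)} - \mu^{(1)})^2}{\sigma_1^2} - 2\rho\,\frac{(x^{(1)} - \mu^{(1)})(x^{(2)} - \mu^{(2)})}{\sigma_1\sigma_2} + \frac{(x^{(2)} - \mu^{(2)})^2}{\sigma_2^2}\right)\right]$$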

We can define a set of values which includes a specific proportion of a multivariate distribution. Such a set of values forms a natural process region. The values contained in the natural process region are characterized by the fact that their distances from the mean vector $\mu$ do not exceed the critical values which define the specific proportion of the population. The classical 3-sigma rule is equivalent, in this context, to testing whether an observation is contained in the central 99.73% of the reference population ($P = 0.9973$). In the univariate case, the respective critical values are called natural process limits (e.g. ASQC, 1983).

Figures 2.1 and 2.2 present a bivariate normal distribution whose means, variances and correlation coefficient are equal to those found empirically in the bivariate distribution of Length1 and Length2 in the first 30 observations of Case Study 1, where $\mu^{(1)} = 49.91$, $\mu^{(2)} = 60.05$, $\sigma_1 = 0.037$, $\sigma_2 = 0.037$, $\rho = 0.723$. Figure 2.1 is a 3-D representation while Figure 2.2 displays contour plots for several values of $f$(Length1, Length2).

Figure 2.1a: Bivariate normal density with parameters of Length 1 and Length 2 from Case Study 1. Axes: Simulated Length 1, Simulated Length 2.

Figure 2.1b: Bivariate normal density (rotated).

Whenever the parameters of the distribution are unknown and have to be estimated from the sample, as is usually the case, we cannot stipulate with certainty that the natural process region will contain the stated proportion of the population. Since we now have an additional source of uncertainty, any statement concerning the proportion of the population in a particular region can only be made with a certain level of confidence. Each combination of a stated proportion and a given confidence level defines a region. The newly formed statistical tolerance region is used in this case instead of the natural process region. In the univariate case the natural process limits are replaced by statistical tolerance limits. The statistical tolerance regions approach to multivariate quality control is illustrated in Chapter 8.

The shape of the multivariate distribution depends on the mean vector $\mu$ and covariance matrix $\Sigma$. When one or both of these parameters are unknown, as is usually the case, they have to be estimated empirically from the data. Let $X_1, \ldots, X_n$ be $n$ p-dimensional vectors of observations sampled independently from $N_p(\mu, \Sigma)$ and $p \le n - 1$. The observed mean vector $\bar{X}$ and the sample covariance matrix $S$ are given by

$$\bar{X} = \sum_{i=1}^{n} X_i / n, \qquad S = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})' / (n - 1).$$

These are unbiased estimates of $\mu$ and $\Sigma$, respectively.

Figure 2.2: Contour plot of bivariate normal density with parameters of Length 1 and Length 2 from Case Study 1.

We note that under the normal distribution, the observed mean vector is also the maximum likelihood estimator (MLE) of $\mu$, while the MLE for $\Sigma$ is $\frac{n-1}{n} S$.

The $(\ell, \ell')$-th element of the S-matrix is the estimated covariance between the variables $\ell$ and $\ell'$, $\ell, \ell' = 1, 2, \ldots, p$, i.e.

$$s_{\ell\ell'} = \sum_{i=1}^{n} (X_i^{(\ell)} - \bar{X}^{(\ell)})(X_i^{(\ell')} - \bar{X}^{(\ell')}) / (n - 1).$$

The diagonal elements of S are the corresponding sample variances. When a covariance is standardized by the appropriate standard deviations, the resulting estimated value of the correlation coefficient assesses the association between the two variables.

If the sample is composed of k subgroups, each of size n, and if the mean and the covariance matrix in the j-th subgroup are $\bar{X}_j$ and $S_j$, respectively, $j = 1, 2, \ldots, k$, then we can define the grand mean and the pooled covariance matrix as follows:

$$\bar{\bar{X}} = \sum_{j=1}^{k} \bar{X}_j / k, \qquad S_p = \sum_{j=1}^{k} (n - 1) S_j / [k(n - 1)].$$
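
The grand mean and the pooled covariance matrix of grouped data can be sketched in a few lines. The Python code below assumes NumPy and at least two variables per observation; the function name `grand_mean_and_pooled_cov` and the toy data are hypothetical, and the code is not a translation of the book's MINITAB macro. It forms the subgroup means and covariance matrices and then combines them with equal weights, which is what the pooling formula reduces to when all subgroups have the same size n.

```python
import numpy as np

def grand_mean_and_pooled_cov(data, n):
    """Grand mean and pooled covariance for consecutive subgroups of size n.

    data: array of shape (k * n, p) with p >= 2; rows are observations
    stored subgroup after subgroup.
    """
    data = np.asarray(data, dtype=float)
    k = data.shape[0] // n                      # number of complete subgroups
    groups = data[:k * n].reshape(k, n, -1)     # shape (k, n, p)

    subgroup_means = groups.mean(axis=1)        # one mean vector per subgroup
    grand_mean = subgroup_means.mean(axis=0)

    # np.cov with rowvar=False divides by (n - 1), matching the S-matrix.
    subgroup_covs = np.array([np.cov(g, rowvar=False) for g in groups])
    pooled = ((n - 1) * subgroup_covs).sum(axis=0) / (k * (n - 1))
    return grand_mean, pooled

# Toy data: two subgroups of size two, two variables (made-up numbers).
toy = np.array([[1.0, 2.0], [3.0, 6.0],
                [5.0, 1.0], [7.0, 3.0]])
gm, sp = grand_mean_and_pooled_cov(toy, n=2)
print(gm)
print(sp)
```

Because the pooled matrix uses only within-subgroup deviations, a shift in level between subgroups does not inflate it, in contrast to the covariance matrix computed from all observations treated as one sample.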

We illustrate the computations with two simulated data sets. The first sample was generated under a bivariate normal distribution with ungrouped observations. The observations in the second data set have four variables and are grouped in pairs, i.e. subgroups of size two. The advantage of the use of simulated data for illustrating the theory is that the underlying distribution with its parameters is known and we can thus observe the performance of test statistics under controlled situations. In the first data set, the parameters of the first 50 observations were identical with the values calculated empirically for Length 1 and Length 2 in the first 30 observations of Case Study 1, i.e.

$$\mu = \begin{bmatrix} 49.91 \\ 60.05 \end{bmatrix}$$

and $\sigma_1 = 0.037$, $\sigma_2 = 0.037$, $\rho = 0.723$. The population covariance matrix $10{,}000\,\Sigma$ is thus

$$10{,}000\,\Sigma = \begin{bmatrix} 13.69 & 9.90 \\ 9.90 & 13.69 \end{bmatrix}.$$
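
The scaled covariance matrix follows directly from the stated standard deviations and correlation: the diagonal entries are 10,000 × 0.037² = 13.69 and the off-diagonal entries are 10,000 × 0.723 × 0.037² ≈ 9.90. The Python sketch below assumes NumPy; the random seed is arbitrary, and the simulated sample is only a stand-in for the book's simulated data, not a reproduction of it. It rebuilds the matrix and draws a base sample of 50 observations.

```python
import numpy as np

# Parameters stated in the text for Length 1 and Length 2.
mu = np.array([49.91, 60.05])
sd1, sd2, rho = 0.037, 0.037, 0.723

# Population covariance matrix built from standard deviations and correlation.
sigma = np.array([[sd1**2, rho * sd1 * sd2],
                  [rho * sd1 * sd2, sd2**2]])
scaled = np.round(10_000 * sigma, 2)
print(scaled)

# Simulate 50 in-control observations (seed chosen arbitrarily).
rng = np.random.default_rng(seed=1)
sample = rng.multivariate_normal(mu, sigma, size=50)

xbar = sample.mean(axis=0)          # observed mean vector
S = np.cov(sample, rowvar=False)    # sample covariance, divisor n - 1
print(xbar)
print(10_000 * S)
```

A different seed gives a different sample, so the printed mean vector and S-matrix will not match the values reported in the text; they should only be close to the population parameters.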

Those first 50 observations simulate an in-control "base sample" (or in-control process capability study). The sample mean for those observations was

$$\bar{X} = \begin{bmatrix} 49.9026 \\ 60.0441 \end{bmatrix}$$

and the elements of the S-matrix are given by

$$s_{11} = \sum_{i=1}^{50} (X_i^{(1)} - 49.9026)^2 / 49 = .001220$$

$$s_{12} = s_{21} = \sum_{i=1}^{50} (X_i^{(1)} - 49.9026)(X_i^{(2)} - 60.0441) / 49 = .001140$$

$$s_{22} = \sum_{i=1}^{50} (X_i^{(2)} - 60.0441)^2 / 49 = .001367,$$

i.e.

$$S = \begin{bmatrix} .001204 & .001123 \\ .001123 & .001355 \end{bmatrix}.$$

Appendix 3 contains a brief review of matrix algebra with applications to multivariate quality control, as well as a review of the computations which can be performed using the MINITAB computer package. If the data are in columns 1 and 2 (or, in MINITAB notation, in C1 and C2), the MINITAB command which leads to the computation of the S-matrix is simply COVARIANCE C1-C2, and we obtain the S-matrix presented above.

We mention that throughout the book the computations can be performed both with MINITAB as well as with other statistical software. Since versions of MINITAB before version 11.0 store the data with only six significant digits, the precision is lower than that achieved by software which uses double precision. The results of the computations presented in the book are those obtained by MINITAB version 11.0.

From the entries in the S-matrix the sample correlation coefficient between the variables is given by

$$r = \frac{.001140}{\sqrt{.001220 \times .001367}} = .883.$$
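
The standardization of a covariance by the two standard deviations can be written once for a whole matrix. The Python sketch below assumes NumPy; the helper name `cov_to_corr` is hypothetical. It converts a covariance matrix into a correlation matrix and applies it to the three element values $s_{11}$, $s_{12}$ and $s_{22}$ quoted in the text.

```python
import numpy as np

def cov_to_corr(S):
    """Divide each covariance by the product of the two standard deviations."""
    S = np.asarray(S, dtype=float)
    sd = np.sqrt(np.diag(S))
    return S / np.outer(sd, sd)

# Element values s11, s12 = s21 and s22 as quoted in the text.
S_elements = np.array([[0.001220, 0.001140],
                       [0.001140, 0.001367]])

r = cov_to_corr(S_elements)[0, 1]
print(round(r, 3))  # 0.883
```

The diagonal of the resulting matrix is identically 1, and the off-diagonal entry reproduces the reported value of r.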

For the second data set, the parameters for the first 50 subgroups were identical with the values calculated empirically for the four diameters, i.e. Diameter 1 to Diameter 4, in the first 30 observations of Case Study 1. The mean vector $\mu$ was set to the empirical means of those diameters, and the covariance matrix $10{,}000\,\Sigma$ is

$$10{,}000\,\Sigma = \begin{bmatrix} 10{,}000\,\sigma_1^2 & 1.7080 & 1.8195 & 1.8264 \\ 1.7080 & 1.8437 & 1.8529 & 1.8460 \\ 1.8195 & 1.8529 & 2.1161 & 1.9575 \\ 1.8264 & 1.8460 & 1.9575 & 2.3092 \end{bmatrix}.$$

The underlying distribution of the simulated data was multivariate normal with the above mentioned parameters. The data were grouped in 50 subgroups of size 2. Thus, the first group had the observations

$$X_1' = [\,9.9976 \quad 9.9830 \quad 9.9804 \quad 14.9848\,]$$

$$X_2' = [\,9.9553 \quad 9.9574 \quad 9.9543 \quad 14.9492\,]$$

The mean vector of the first subgroup is

$$\bar{X}_1' = [\,9.9765 \quad 9.9702 \quad 9.9673 \quad 14.9670\,]$$

and the covariance matrix within the subgroup ($\times 10{,}000$) is

$$S_1 = \begin{bmatrix} 8.9558 & 5.4181 & 5.5307 & 7.5527 \\ 5.4181 & 3.2779 & 3.3460 & 4.5692 \\ 5.5307 & 3.3460 & 3.4156 & 4.6642 \\ 7.5527 & 4.5692 & 4.6642 & 6.3693 \end{bmatrix}.$$

The computations are performed similarly for the 50 subgroups and we obtained

$$\bar{\bar{X}} = \sum_{j=1}^{50} \bar{X}_j / 50 = [\,9.9864,\ 9.9793,\ 9.9752,\ 14.9768\,]'$$
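
The first subgroup can be used as a small worked check. The Python sketch below assumes NumPy and takes the two observations as read from the text, with leading entries 9.9976 and 9.9553 (an assumption where the transcript is unclear). For a subgroup of size two the covariance matrix equals d d'/2, where d is the difference between the two observations, so it has rank one.

```python
import numpy as np

# The two observations of the first subgroup, as read from the text.
x1 = np.array([9.9976, 9.9830, 9.9804, 14.9848])
x2 = np.array([9.9553, 9.9574, 9.9543, 14.9492])
subgroup = np.vstack([x1, x2])

mean_1 = subgroup.mean(axis=0)            # subgroup mean vector
S_1 = np.cov(subgroup, rowvar=False)      # within-subgroup covariance, n - 1 = 1

# With n = 2 the covariance matrix reduces to d d' / 2.
d = x1 - x2
rank_one_form = np.outer(d, d) / 2

print(mean_1)
print(np.round(10_000 * S_1, 4))
```

The scaled entries computed this way agree with the matrix in the text to about two significant digits; the small differences are presumably due to the observations being listed with only four decimals.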

and

$$10{,}000\,S_p = 10{,}000 \sum_{j=1}^{50} (2 - 1) S_j / [50 \times (2 - 1)] = \begin{bmatrix} 1.2151 & 1.1428 & 1.1600 & 1.2669 \\ 1.1428 & 1.3559 & 1.3288 & 1.3033 \\ 1.1600 & 1.2854 & 1.5728 & 1.4183 \\ 1.2669 & 1.2792 & 1.4183 & 1.6854 \end{bmatrix}.$$

A MINITAB program which computes the $S_p$-matrix can be easily designed if we define a macro which computes for each group its $(n - 1)S_j$-matrix and sums them up. In the macro presented in Appendix 1.1, the number of parameters (p) is stored in K2 (in our case K2 = 2). The group size (n) is stored in K3 and the number of groups (k) is stored in K4. The indices of the observations in the current group are from K5 to K6. The macro (named POOLED.MTB) is given in Appendix 1.1.

The program POOLED.MTB can be run by typing

MTB> EXEC 'POOLED.MTB' 1.

    The theoretical properties of the estimates $\bar{X}$, $\bar{\bar{X}}$, $S$ and $S_p$ have been extensively studied and were determined to be optimal under a variety of criteria (sufficiency, consistency, and completeness).

    The distributional properties of those estimates for a random normal sample $X_1, \ldots, X_n$ are given by the following theorems (e.g. Seber, 1984):

    (i) $\bar{X} \sim N_p(\mu, \Sigma/n)$.

    (ii) If $\bar{X}$ is distributed as in (i) above, then

    $$n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu) \sim \chi^2_p$$

    where $\chi^2_p$ is the Chi-square distribution with $p$ degrees of freedom.

    (iii) $(n-1)S \sim W_p(n-1, \Sigma)$

    where $W_p(\cdot,\cdot)$ stands for the Wishart distribution, which is considered to be the multivariate analogue of the Chi-square distribution.

    (iv) If $Z$ and $D$ are independent random variables distributed respectively as

    $$Z \sim N_p(0, \Sigma_Z), \qquad fD \sim W_p(f, \Sigma_Z),$$

    then the quadratic form

    $$T^2 = Z'D^{-1}Z$$

    is distributed as

    $$T^2 \sim \frac{fp}{f-p+1}\,F_{p,\,f-p+1}$$

    where $F_{p,\,f-p+1}$ is the Fisher F-distribution with $(p, f-p+1)$ degrees of freedom. If the expected value of $Z$, denoted by $\mu_Z$, differs from zero, then $T^2$ (multiplied by the appropriate constant) has a non-central F-distribution with parameter of non-centrality $\mu_Z'\Sigma_Z^{-1}\mu_Z$.

    (v) Let $Z \sim N_p(0, \Sigma_Z)$ and $fD \sim W_p(f, \Sigma_Z)$, $f > p$, where $fD$ can be decomposed as $fD = (f-1)D_1 + ZZ'$, where $(f-1)D_1 \sim W_p(f-1, \Sigma_Z)$, and $Z$ is independent of $D_1$. Then the quadratic form

    $$T^2 = Z'D^{-1}Z$$

    is distributed as

    $$T^2 \sim f\,B(p, f-p)$$

    where $B(p, f-p)$ is the central Beta distribution with $(p, f-p)$ degrees of freedom, respectively. If the expected value of $Z$, $\mu_Z$, differs from zero, then $T^2$ has a non-central Beta distribution with parameter of non-centrality $\mu_Z'\Sigma_Z^{-1}\mu_Z$.

    Note that unlike (iv) above, we do not assume that $Z$ is independent of $D$, but rather that it is independent of $D_1$.

    (vi) If the sample is composed of $k$ subgroups of size $n$, all originated from the distribution defined above, with subgroup means $\bar{X}_j$, $j = 1, \ldots, k$ and grand mean denoted by $\bar{\bar{X}}$, i.e. $\bar{\bar{X}} = \sum_{j=1}^{k} \bar{X}_j / k$, then

    $$\sqrt{\frac{kn}{k-1}}\,(\bar{X}_j - \bar{\bar{X}}) \sim N_p(0, \Sigma).$$

    (vii) Under the conditions of (vi) above, if $Y_1, \ldots, Y_n$ are an extra subgroup from the same distribution, then

    $$\sqrt{\frac{kn}{k+1}}\,(\bar{Y} - \bar{\bar{X}}) \sim N_p(0, \Sigma).$$

    (viii) If the sample is composed of $k$ subgroups of $n$ identically distributed multivariate normal observations, and if $S_j$ is the sample covariance matrix from the $j$-th subgroup, $j = 1, \ldots, k$, then

    $$\sum_{j=1}^{k} (n-1)S_j \sim W_p(k(n-1), \Sigma).$$

    Note that this additive property of the sum of independent Wishart variables is an extension of the property which holds for univariate Chi-square variables.

    The distributional properties of $\bar{X}$, $S$ and $T^2$ provide the theoretical basis for deriving the distributions of the statistics used in the multivariate quality control procedures discussed in subsequent chapters.

    Those statistics assess the overall distance of a $p$-dimensional vector of observed means from the target values $m = (m^{(1)}, m^{(2)}, \ldots, m^{(p)})$. Let $Y_i^{(\ell)}$, $i = 1, \ldots, n$; $\ell = 1, \ldots, p$ be $n$ multivariate measurements and let $\bar{Y} = (\bar{Y}^{(1)}, \ldots, \bar{Y}^{(p)})$ be the observed means.

    Up to a constant, the appropriate Hotelling $T^2$ statistic is computed by multiplying $(\bar{Y} - m)'$, i.e. the transposed vector of deviations between $\bar{Y}$ and $m$, by the inverse of a certain empirical covariance matrix ($S$) and then by the vector of deviations $(\bar{Y} - m)$. The Hotelling $T^2$-statistic has already been mentioned in the previous chapter, where we presented the chart of the values obtained from the initial observations of Case Study 1.

    For the case when a sample $Y_1, \ldots, Y_n$ of $n$ independent normal $p$-dimensional observations is used for assessing the distance between the mean $\bar{Y}$ and the expected values $\mu$, the Hotelling $T^2$-statistic, denoted by $T^2_M$, is given by

    $$T^2_M = n(\bar{Y} - \mu)'S^{-1}(\bar{Y} - \mu).$$

    The $T^2_M$ statistic can be expressed in terms of the means of the measurements $Y_i^{(\ell)}$ as

    $$T^2_M = n \sum_{\ell=1}^{p} \sum_{\ell'=1}^{p} (\bar{Y}^{(\ell)} - \mu^{(\ell)})\, s^{(\ell,\ell')}\, (\bar{Y}^{(\ell')} - \mu^{(\ell')})$$

    where $s^{(\ell,\ell')}$ is the $(\ell,\ell')$-th element of the matrix $S^{-1}$. A MINITAB program which computes the $T^2$-statistics has to invert the S-matrix computed above and to compute the quadratic form by multiplying the appropriate matrices. In Appendices 1.2-1.3 we present several macros for computing $T^2$-statistics for the various cases detailed in the next chapters.

    Under the null hypothesis that the data are independent and normally distributed with the postulated mean vector $\mu$, it follows immediately from (iv) above that the $T^2_M$-statistic just defined is distributed as

    $$T^2_M \sim \frac{(n-1)p}{n-p}\,F_{p,\,n-p}.$$

    When the covariance matrix was calculated from $n$ observations and we want to assess the distance of a single observation $Y$ from the expected value $\mu$, the $T^2_M$ statistic is obtained as

    $$T^2_M = (Y - \mu)'S^{-1}(Y - \mu).$$

    Another important special case is when $p = 1$. The $T^2_M$ statistic used for assessing the deviation of the mean of $n$ observations from the expected value reduces in this case to the square of the t-statistic $t = \sqrt{n}(\bar{Y} - \mu)/s$, which is the statistic used for testing hypotheses on the mean of a univariate normal population. If the expected value of $Y$ is $\mu$, then the t-statistic has a Student t-distribution with $n - 1$ degrees of freedom, and since the coefficient of the F-distribution reduces to 1, $t^2 \sim F_{1,\,n-1}$ as expected.
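The quadratic form behind the statistic, and its reduction to the squared t-statistic when $p = 1$, can be illustrated with a short sketch. The covariance matrix, target vector and data below are hypothetical values chosen so the arithmetic is easy to follow by hand; the helper names are illustrative, and the 2x2 inverse is written out explicitly to keep the example free of external libraries.

```python
import math
import statistics


def inverse_2x2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(d, A):
    """Quadratic form d' A d."""
    k = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(k) for j in range(k))


def t2_statistic(y, mu, S, n=1):
    """n * (y - mu)' S^{-1} (y - mu); n = 1 for a single observation."""
    dev = [yi - mi for yi, mi in zip(y, mu)]
    return n * quad_form(dev, inverse_2x2(S))


# Bivariate case with hypothetical numbers
S = [[4.0, 2.0], [2.0, 3.0]]
mu = [10.0, 20.0]
y = [12.0, 21.0]
t2 = t2_statistic(y, mu, S)
print(t2)

# Univariate case: the statistic for the mean equals the squared t-statistic
sample = [1.0, 2.0, 3.0, 4.0]
mu_1 = 2.0
n = len(sample)
ybar = statistics.mean(sample)
s2 = statistics.variance(sample)            # divisor n - 1
t = math.sqrt(n) * (ybar - mu_1) / math.sqrt(s2)
t2_univariate = n * (ybar - mu_1) ** 2 / s2
print(t ** 2, t2_univariate)
```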


    We illustrate the computations and the hypothesis testing with the first simulated data set presented above. We mentioned that the first 50 observations in the data set simulate data from an in-control process capability study. The underlying distribution was normal with

    $$\mu = \mu_0 = [49.91 \;\; 60.05]'.$$

    We computed the $T^2_M$ values for those 50 observations and compared them with the appropriate critical values from the corresponding multiple of the $F_{p,\,n-p}$ distribution, which in this case (for $p = 2$, $n = 50$) are 13.97, 12.35 and 6.64 for $\alpha = .0027$, $\alpha = .005$ and $\alpha = .05$, respectively. The results are presented in the last column of Table 2.1. We mention that none of the $T^2_M$-values in the sample exceeded even the critical value at $\alpha = .05$. This result could have been expected, since we tested the distances of the empirical data from the location parameters of the distribution from which the data were generated. The only source of observed differences is thus random error. In the next sections, we shall return to this example, this time using an extended version of the data set, which simulates both a base sample and some tested samples.

    When a sample of $kn$ independent normal observations is grouped into $k$ rational subgroups of size $n$, then the distance between the mean $\bar{Y}_j$ of the $j$-th subgroup and the expected values $\mu$ is computed by

    $$T^2_M = n(\bar{Y}_j - \mu)'S_p^{-1}(\bar{Y}_j - \mu).$$

    Note that unlike the ungrouped case, the estimated covariance matrix is now pooled from all the $k$ subgroups. Again, under the null hypothesis the data are independent and normally distributed with the postulated mean vector $\mu$. It follows from (iv) and (viii) above that

    $$T^2_M \sim \frac{k(n-1)p}{k(n-1)-p+1}\,F_{p,\,k(n-1)-p+1}.$$

    In the case of the second set of simulated data with $p = 4$, $k = 50$, $n = 2$, the critical values for $T^2_M$ are 10.94, 18.17 and 20.17 for $\alpha = .05$, $\alpha = .005$ and $\alpha = .0027$, respectively. As we can see from Table 2.2, among the 50 subgroups, four of the $T^2_M$-values exceeded the critical


    TABLE 2.1 Means with external values (49.91, 60.05)', the parameters used to generate the data. The S-matrix from the base sample (50 observations). The data are the base sample:

    VARl VAR2 T2#49.8585 60.0008.607349.8768 59.9865.248849.8706 60.0055 1.090449.91 17 60.0126.376749.8470 60.0165.500949.8883 60.0216.441949.91 58 60.0517.220549.91 52 60.0673.562949.9055 60.0726.204649.8969 60.0208.095149.9137 60.0928.986149.8586 59.9823.997249.9514 60.0866.988449.8988 60.0402.012249.8894 60.0720.501649.9403 60.0681.584849.9132 60.0350.286649.8546 60.0145.655449.8815 59.9982.631549.831 1 59.9963.380349.8816 60.0457 l 857949.8501 59.9860.527749.9778 60.0875.025249.8690 60.01 59 0.959649.8779 60.0055 1.295749.8680 60.0088.008849.9388 60.071l 1.228449.9133 60.0634.378549.9120 60.0560 0.105349.9250 60.0749.733749.9442 60.1 100.844649.8386 59.9725.81449.9492 60.1014.412149.9204 60.0803 S39249.8994 60.0625.524649.8703 60.0219.067849.8846 60.0271.265849.9580 60.0878.729449.8985 60.0329.193949.9397 60.0826.177849.8741 60.0061.09149.9140 60.0401.820649.9501 60.081 0 2.033049.8865 60.0169.699749.8912 60.0406.276649.9252 60.0532.898849.9326 60.0741.751349.9680 60.1219.431449.9289 80.0709.585649.9233 60.0632.3528


    TABLE 2.2 Means with external values (9.9863, 9.9787, 9.9743 and 14.9763)', the parameters used to generate the data. The Spooled-matrix from the base sample

    12345678S10111213141516l ?19202122232425262728293031323334353837383940414243444546474950

    l a

    4a

    (SO groups of 2 observations). The data are the base sample:V A R l V A R 2 VA R 3 V A R 4 T 2 M9.9765 9.9702 9.9673 14.9670 1.84959.9936 9.9851. .. ~ ~ ~~9.98709.99629.97749.97109.98139.98179.98879.98619,9997

    10.00769.97259,99029.97869.99039,99759.98789.96799.99289.98199,98949.98919.988410.01129.97769.96939.98339.99759.984310.01079.968910.00619.9755gig9289.97899,964510.00079,98449.99249.99099.97709.99239.96969.98619.97749.97579.96969,9809

    9.97199.98669.97189.96839.97339.97749.98429.97899.99809.99699.96489,98149,97209.98939.98709.98819.98409.98399.97429.98139.98109.978310.00549.96869.97939.97619.98779.976010.00349.96199.99729.96739.98699.97359.95949.99349.97299.98309.98589.96849.98659.96519.97719.97149.97089.9621

    9.98939.96939.97779.96519.95859.97299.97319.98109,981210.00019.99399.95539.97709,96699.97889.98729.98419.97749.98019.96689.97419.97519.976510.00159.96659.97939.96889.98169.979610.00139.95829.99309.96139.98329.96519.95039.98809.96949.97889.98009.96599,98519.95879.97739.96789.96909.9602

    ~ ~~ ~ 14.981314.966114.985614.971614.963814.970614.972314.978714.976114.993514.998514.959014.978114.963014.990014.986714.981 314.975014.982614.973714.977214.978214.972215.011714.968314.960414.970514.988314.982615.005714.945214.991914.968614.987014.973914.952514.987614.973314.982214.974914.961214.988814.965014.980214.974914.970114.9630

    8.503110.25663.52003.78107.571ll 984l l 581.541 03.738210.97357.73744.86190.61393.307212.42773.81605.36702.41 780.76810.93840.77970.31784.089516.30261 a3311.49950.82882.82016.230910.846119.64938.50644.20521.73185.20749.00344.73341.29920.76264.91354.41763.00877.95591 g1935.58543.44305.91669.9781 14.9683.9667 ~~~~ 3.32829.9886 9.9809 9.97284.9796.4853


    value at $\alpha = .05$ (for the 11th, 16th, 25th and 32nd subgroups). The $T^2_M$ for the 32nd subgroup exceeded the critical value at $\alpha = .005$ as well.

    The methods of analysis described throughout this text tacitly assume that the distribution of the analyzed data is fairly well represented by the multivariate normal distribution. Procedures for testing the hypothesis of multivariate normality include both analytical methods and graphical techniques (e.g., Mardia, 1975; Koziol, 1993). When the data present clear evidence of lack of normality, we may try to transform the data using, for example, a multivariate generalization of the Box and Cox (1964) transformation. In the univariate case, the Box and Cox transformation improves the adherence to normality of a variable $X$ by finding an appropriate constant $\lambda$ and transforming $X$ into $X^{(\lambda)}$, where

    $$X^{(\lambda)} = \begin{cases} (X^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \ln X, & \lambda = 0. \end{cases}$$

    On the method for finding $\lambda$ see Box and Cox (1964) or Box, Hunter and Hunter (1978). Andrews (1971) generalized this approach to the multivariate case by introducing a vector $\lambda = (\lambda_1, \ldots, \lambda_p)$ and the corresponding transformed data vector

    $$X^{(\lambda)} = (X_1^{(\lambda_1)}, \ldots, X_p^{(\lambda_p)})'$$

    which is constructed so that $X^{(\lambda)}$ is approximately distributed as $N_p(\mu, \Sigma)$. In the univariate case we can find a transformation $g(X)$ that will transform the data to approximate normality up to any level of precision. One should note, however, that in the multivariate case an appropriate transformation might not exist [Holland (1973)].

    We thus continue to assume in the following chapters that either the original data are approximately normally distributed, or that the data have been preliminarily transformed and, on the new scale, the observations have, approximately, a multivariate normal distribution.
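The univariate Box and Cox transformation mentioned above has a simple closed form, sketched below under the usual assumption that the data are strictly positive. The function name `box_cox` is illustrative, and the sketch covers only the transformation itself, not the estimation of $\lambda$ discussed in the cited references.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Square-root-type transformation (lambda = 0.5) and log transformation (lambda = 0)
print(box_cox(4.0, 0.5))
print(box_cox(math.e, 0))

# For lambda close to zero the power form approaches the logarithm
print(box_cox(5.0, 1e-8), math.log(5.0))
```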


    3
    Quality Control with Externally Assigned Targets

    Objectives:

    The chapter provides methods of analysis for the case in which the targets are assigned externally and are not based on data analysis. Situations in which external targets are appropriate are discussed and the computations involved are presented in detail using simulated data.

    Key Concepts
    • Target values
    • Multivariate process control
    • Multivariate hypothesis testing
    • External values
    • Tested sample
    • Data grouping
    • Decomposition of the Hotelling T²-distance


    We mentioned earlier that the statistics involved in multivariate process control depend on the origins of the target values $m$. The methodology and the applications presented in this chapter are for the case when the target values are assigned externally, i.e. from an external requirement or when a prespecified standard $m_0$ has to be met. It should be noted that comparisons with externally specified targets are not usually considered in statistical process control, since such targets do not account properly for valid process characteristics. However, whenever multivariate prespecified standards are set, multivariate quality control procedures can be used to assess whether the multivariate means of the products equal the external targets $m_0$. The next two chapters deal with cases in which target values are derived and computed internally. We shall distinguish between targets derived from the tested sample itself, and targets which are derived from a reference or base sample. If we denote by $\mu$ the multivariate mean of the products $Y$, the statistical analysis in quality control with external targets is equivalent to testing hypotheses of the form

    $$H_0: \mu = m_0$$

    against

    $$H_a: \mu \neq m_0.$$

    Given a sample of size $n_1$ from the population, we can compute $T^2_M = n_1(\bar{Y} - m_0)'S^{-1}(\bar{Y} - m_0)$. We have seen in the previous chapter that for normally distributed data, if $m_0$ is the expected value of $\bar{Y}$, then the statistic $T^2_M$ has a distribution given by

    $$F_0 = \frac{n_1 - p}{(n_1 - 1)p}\,T^2_M \sim F_{p,\,n_1-p}.$$

    Thus, under the null hypothesis mentioned above, $F_0 \sim F_{p,\,n_1-p}$. Under $H_a$, $F_0$ has a non-central F-distribution. The critical value for $T^2_M$ is thus

    $$UCL = \frac{(n_1 - 1)p}{n_1 - p}\,F^{\alpha}_{p,\,n_1-p}$$

    where $F^{\alpha}_{p,\,n_1-p}$ is the upper $100\alpha\%$ percentile of the central F-distribution with $(p, n_1 - p)$ degrees of freedom. If the value of $T^2_M$ exceeds the critical value, we conclude that the deviations between $\bar{Y}$ and the external targets $m_0$ cannot be explained by random error. In such cases an assignable cause affecting the process is to be suspected.


    As a first example we shall use an extended version of the first simulated data set presented in the previous chapter. Now, in addition to the base sample of 50 observations presented in Chapter 2, we generated 25 bivariate observations which simulate a series of tested samples, whose parameters are as follows: the parameters of observations 51-55 are identical to those of the base sample (i.e. the data are in control); the population mean for the first component in observations 56-65 has been shifted upward by two standard deviations, i.e.

    $$\mu_2 = \mu_0 + [2\sigma_1 \;\; 0]';$$

    finally, in observations 66-75, the population mean of the first component was shifted downward and that of the second upward, both by one standard deviation, i.e.

    $$\mu_3 = \mu_0 + [-\sigma_1 \;\; \sigma_2]'.$$

    We recall that the first 50 bivariate observations were generated from the normal distribution with the mean vector

    $$\mu_0 = [49.91 \;\; 60.05]'.$$

    We shall consider two sets of target values: the first set satisfies $m_0^{(1)} = \mu_0$, i.e. the means used for generating the data are taken as target values; in the second set, the target values are

    $$m_0^{(2)} = [49.9 \;\; 60.0]'.$$

    We start by testing the hypothesis that the generated data originate from the population with the prespecified targets, i.e. $m = m_0^{(1)}$. If we further assume that the covariance matrix is known (as is indeed the case in the simulated data), we can compute for each observation in the tested sample the statistic

    $$(Y_i - m_0^{(1)})'\,\Sigma^{-1}\,(Y_i - m_0^{(1)}), \quad i = 51, \ldots, 75$$

    and compare those values with the critical values from the chi-squared distribution with two degrees of freedom (see property (ii) in Chapter 2).

    Alternatively, the covariance matrix $\Sigma$ can be estimated from the "base" sample by the empirical covariance matrix $S$. The resulting $T^2_M$ statistics then have to be compared with the critical values based on the F-distribution which are presented in this chapter. We now perform, for each of the 25 observations from the tested sample (i.e. single observations $Y$), both the multivariate $T^2$-test based on the $T^2_M$-statistic and the univariate t-test for each of the two components. We recall that in Chapter 2 we performed the corresponding multivariate tests for the first 50 observations from the "base" sample.

    In the multivariate test, the null hypothesis is that the mean of the specific bivariate observation being tested satisfies

    $$m = m_0^{(1)} = [49.91 \;\; 60.05]',$$

    while in the univariate test, the hypotheses are stated for each component separately. The critical values for $T^2_M$ from the distribution of $\frac{(n-1)p}{n-p}F_{2,48}$ are 13.693, 12.103 and 6.515 for $\alpha = .0027$, $\alpha = .005$ and $\alpha = .05$, respectively. The corresponding critical values for the two-sided t-test are $|t| = 3.16$, 2.94 and 2.01 for the values of $\alpha$ as above. The results are presented in Table 3.1. For each of the three subsets of the tested sample (of sizes 5, 10 and 10, respectively), we summarize in Table 3.2 the number of observations for which the null hypothesis has been rejected by the three statistics.
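For the bivariate case used in this example, both kinds of critical values can be computed without statistical tables, because the chi-squared distribution with two degrees of freedom and the F-distribution with 2 numerator degrees of freedom have closed-form upper tail probabilities. The sketch below relies on these standard identities, $P(\chi^2_2 > x) = e^{-x/2}$ and $P(F_{2,m} > x) = (1 + 2x/m)^{-m/2}$; the function names are illustrative and the approach does not extend to $p > 2$.

```python
import math


def chi2_upper_2df(alpha):
    """Upper-alpha critical value of the chi-squared distribution with 2 d.f."""
    return -2.0 * math.log(alpha)


def f_upper_2_m(alpha, m):
    """Upper-alpha critical value of the F-distribution with (2, m) d.f."""
    return (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)


def t2_ucl_bivariate(alpha, n):
    """Critical value (n - 1)p/(n - p) * F for p = 2 and a base sample of size n."""
    p = 2
    return (n - 1) * p / (n - p) * f_upper_2_m(alpha, n - p)


for alpha in (0.0027, 0.005, 0.05):
    print(alpha, round(chi2_upper_2df(alpha), 3), round(t2_ucl_bivariate(alpha, 50), 3))
```

With $n = 50$ the second column of output agrees, up to rounding, with the critical values 13.693, 12.103 and 6.515 quoted above.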

    The first subset of five observations, which are in control, behaves as expected, and for none of them did the $T^2_M$ exceed the appropriate critical values. In the second subset of 10 observations, only the mean of the first component has been shifted (by two standard deviations). We see that the empirical power of the multivariate $T^2_M$ test exceeded that of the univariate test performed specifically on the affected component. The single observation for which the second component exceeded the critical value at $\alpha = .05$ is to be considered a type I error, since its mean has not been shifted.

    Let us now consider the second set of target values (which are defined as the means used to generate the data, but this time rounded to one decimal place), i.e.


    TABLE 3.1 Means with external values (49.91, 60.05)', which are the parameters used to generate the data. The S-matrix from the base sample (50 observations). The data are the tested samples:

    VARl VAR2 t - VARl t - VAR2 TZM51 49.87980.0417 -0.8654 -0.2258.609252 153145556575859606162636465666768697071727274

    49.920849.960649.949849.839049.928449.964849.978050.021850.060650.036549.975649.984050.002849.977049.857949.899749.915649.925849.838449.893749.863149.940649.9046

    60.029260.117260.054359.966560.007960.048260.018660.085460.139960.100560.038760.085760.048260.027860.058860.082060.141560.113260.044960.089360.075760.129860.0739

    0.31001.44901.l 88-2.03180152721S704l g4653.20244.31273.6221l a7812.1 1862.65841 g174-1.491 1-0.29380.16000.4538-2.0487-0.4657-1.34170.8750-0.1554

    -0.56351.a1640.1 153-2.2584-1.1400-0.0483-0.85060.95622.43141.36510.9646-0.3067-0.0488-0.60020.23840.86532.47501.70821.06190.69602.1 5890.6463

    -0.1381

    3.65683.93865.63594.407412.620512.857335.1 90527.977629.389030.364422.34019,374634.828828.917812.20435.462724.23807.870115.60469.533316.83269.55222.594675 I 49.8718 60.0676 -1.0931 0.4761.831 0

    $$m_0^{(2)} = [49.9 \;\; 60.0]'.$$

    For the first 50 observations from the base sample as well as for the first five observations in the tested sample, the targets deviate from the presumably unknown means of the bivariate distribution by $.27\sigma_1$ and $-1.35\sigma_2$, respectively. The deviations are more substantial for the other tested observations.

    The results of the testing for the 50 observations in the base sample, with the covariance matrix being estimated by $S$, are presented


    [TABLE 3.2: number of observations in each subset of the tested sample for which the null hypothesis was rejected by each of the three statistics; table body not recoverable]


    TABLE 3.3 Means with external values (49.9, 60.0)'. The S-matrix from the base sample (50 observations). The data are the base sample:

    VAR1 VAR2 t-VAR1 t-VAR2 T2M

    49.85850.00081 .l81.0221 1.607349.876849.870649.91 1749.847049.888349.915849.915249.905549.896949.913749.858649.951449.898849.889449.940349.913249.854649.881549.831149.881649.850149.977849.869049.877949.868049.938849.913349.912049.925049.944249.838649.949249.920449.899449.870349.884649.958049.898549.939749.874149.914049.950149.886549.891249.925249.932649.968049.9289

    59.986560.005560.012660.016560.021660.051760.067360.072660.020860.092859.982360.086660.040260.072060.068160.035060.014559.998259.996360.045759.986060.087560.015960.005560.008860.071 160.063460.056060.074960.110059.972560.1 01 460.080360.062560.021960.027160.087860.032960.082660.006160.040160.081060.016960.040660.053260.074160.121960.0709

    -0.6646-0.841l0.3349-1 S16 9-0.33390.45360.43540.1576-0.08970.3913-1.18611.4729-0.0342-0.30221.15500.3788-1,2992-0.5303-1.9721-0.5279-1.42972.2265-0.8881-0.6324-0.91561.11040.38210.34320.71511.26591.40900.5831-1.7595

    -0.0177-0.8496-0.44141B5981.l65-0.74280.40201.4337-0.3857-0.25150.72210.93351 .g4760.8285

    -0.0428

    -0.36390.14820.34000.44730.58561.39741.8202lg6250.56212.5109-0.47782.34341.08661.94791.84340.94790.3922-0.0485-0.10071.23552.36570.42940.14930.23811 Q2241.71441.51502.02512.9759-0.74272.74362.17321.68980.59140.73232.37420.88962.23420.16511.08512.19070.45811.0989l43922.00373.29711.9185

    -0.3777

    4.24881.09045.37674.50090.44190.22050.56292.20461.09514.98612.99721 g8840.01225.5016lS8401.28662.65542.631 55.38031.85792.52777.02520.95961.29571.00881.22840.37850.1 0530.73373.84463.81 542.41211.53921.52461.061 80.26582.72940.19391.l781.09170.82062.03300.69970.27660.89880.75134.43140.585649.92330.0632 1.7102 0.3528~


    TABLE 3.4 Means with external values (49.9, 60.0)'. The S-matrix from the base sample (50 observations). The data are the tested samples:

          VAR1      VAR2     t-VAR1   t-VAR2    T2M
    51   49.8798   60.0417  -0.5791   1.1267    1.6092
    52   49.9208   60.0292   0.5962   0.7889    3.6568
    53   49.9606   60.1172   1.7352   3.1689    3.9386
    54   49.9498   60.0543   1.4250   1.4677    5.6359
    55   49.8390   59.9665  -1.7455  -0.9059    4.4074
    56   49.9284   60.0079   0.8135   0.2125   12.6205
    57   49.9648   60.0482   1.8567   1.3041   12.8573
    58   49.9780   60.0186   2.2327   0.5018   35.1905
    59   50.0218   60.0854   3.4887   2.3086   27.9776
    60   50.0606   60.1399   4.5989   3.7838   29.3890
    61   50.0365   60.1005   3.9083   2.7175   30.3644
    62   49.9756   60.0387   2.1644   1.0458   22.3401
    63   49.9840   60.0857   2.4049   2.3170    9.3746
    64   50.0028   60.0482   2.9447   1.3036   34.8288
    65   49.9770   60.0278   2.2037   0.7522   28.9178
    66   49.8579   60.0588  -1.2048   1.5908   12.2043
    67   49.8997   60.0820  -0.0075   2.2177    5.4627
    68   49.9156   60.1415   0.4463   3.8274   24.2380
    69   49.9258   60.1132   0.7401   3.0606    7.8701
    70   49.8384   60.0449  -1.7624   1.2144   15.6046
    71   49.8937   60.0893  -0.1795   2.4143    9.5333
    72   49.8631   60.0757  -1.0554   2.0484   16.8326
    73   49.9406   60.1298   1.1613   3.5113    9.5522
    74   49.9046   60.0739   0.1308   1.9988    2.5946
    75   49.8718   60.0676  -0.8068   1.8285    9.8310

    in Table 3.3. The corresponding results for the tested sample can be found in Table 3.4. The computations were performed by the macros presented in Appendix 1.2. Those macros compute the relevant statistics for the analyses of ungrouped data presented in Chapters 3-5. The program with the macros that compute the statistics for ungrouped data can be run by typing MTB> EXEC 'UU.MTB' 1.

    The summary Table 3.5 presents the number of observations for which the null hypothesis has been rejected, both for the base sample and for the various tested samples.

    We observe again that the success of the univariate tests in detecting deviations is much smaller than that of the multivariate test.


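    [TABLE 3.5: number of observations in the base sample and in the tested samples for which the null hypothesis was rejected; table body not recoverable]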


    We mentioned in the introduction that the $T^2_M$ statistic does not provide the important information on the variable(s) which caused the out-of-control signal. On the other hand, we have seen in the example above that the power of the univariate t-test is lower than that of the multivariate test. Multiple univariate testings have a compounded problem when the significance level is assessed, due to the multiple comparisons issue. In Chapter 7 we present other methods for detecting the outlying variables. These methods are extensions of the multivariate testing and use the information from all the variables and their covariance structure.

    As a second example of external targets we use the real data from Case Study 3, where the variables are dimensions of several lots of ceramic substrates. Raw materials used in the manufacturing of hybrid microcircuits consist of components, dyes, pastes, and ceramic substrates. The ceramic substrate plates undergo a process of printing and firing through which layers of conductors, dielectric, resistors, and platinum or gold are added to the plates. Subsequent production steps consist of laser trimming, component mounting and reflow soldering, or chip and wire bonding. The last manufacturing stage is the packaging and sealing of the completed modules. The ceramic substrates are produced in lots of varying sizes. The first production batch (labeled Reference) proved to be of extremely good quality, yielding an overall smooth production with no scrap and repairs. This first lot was therefore considered a 'standard' to be met by all following lots. Five dimensions are considered in Case Study 3, with labels (a, b, c, W, L). The first three are determined by the laser inscribing process; the last two are outer physical dimensions. In our next example we consider only the first set of three dimensions from the table in Appendix 2.

    The Reference sample has average dimensions, in mm, for (a, b, c) = (199, 550.615, 550.923). The engineering nominal specifications are (200, 550, 550). By redefining the (a, b, c) measurements as the deviations from the nominal specification target values, a natural null hypothesis to test on the three-dimensional population mean is

    $$H_0: \mu = [0 \;\; 0 \;\; 0]'$$

    against

    $$H_a: \mu \neq [0 \;\; 0 \;\; 0]'.$$

    The means (with respect to the center of the nominal specifications) and the $S^{-1}$ matrix of the Reference sample are as shown in Table 3.6.

    TABLE 3.6 Measurements in reference sample (Case Study 3)

                              a        b        c
    Means ($\bar{Y}$)      -1.000    0.615    0.923
    $S^{-1}$-matrix         1.081    0.100   -0.432
                            0.100    1.336    0.063
                           -0.432    0.063    2.620

    Since the Reference sample has 13 units we have that $n = 13$ and

    $$T^2_M = 13\,(\bar{Y} - 0)'S^{-1}(\bar{Y} - 0) = 13\,[-1.000 \;\; 0.615 \;\; 0.923] \begin{bmatrix} 1.081 & 0.100 & -0.432 \\ 0.100 & 1.336 & 0.063 \\ -0.432 & 0.063 & 2.620 \end{bmatrix} \begin{bmatrix} -1.000 \\ 0.615 \\ 0.923 \end{bmatrix}$$

    $$= 13\,[-1.427 \;\; 0.793 \;\; 2.890] \begin{bmatrix} -1.000 \\ 0.615 \\ 0.923 \end{bmatrix} = 13 \times 4.58 = 59.54.$$

    Now since the critical value at the 1% level is 27.23, we have to reject $H_0$ at a 1% level of significance. Therefore we conclude that, on the average, although the Reference sample was of extremely good quality, it did not meet the required nominal specifications.

    Kenett and Halevy (1984) investigated the multivariate aspects of military specifications documents published by the US Department of Defense. They show that a multivariate approach is indeed required for setting the inspection criteria of such products, since most product characteristics tend to be correlated. Again, such standards yield externally assigned targets that are used to monitor actual production.
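The arithmetic of the ceramic substrate example can be retraced with the rounded values printed in Table 3.6. This is a minimal sketch; the variable names are illustrative. Because the printed means and matrix entries are rounded to three decimals, the result comes out slightly below the 4.58 and 59.54 reported in the text, a difference that presumably reflects the unrounded values used in the original computation.

```python
# Values as printed in Table 3.6 (deviations from the nominal specifications)
ybar = [-1.000, 0.615, 0.923]
S_inv = [
    [1.081, 0.100, -0.432],
    [0.100, 1.336, 0.063],
    [-0.432, 0.063, 2.620],
]
n = 13  # number of units in the Reference sample

# Row vector ybar' * S^{-1}
row = [sum(ybar[i] * S_inv[i][j] for i in range(3)) for j in range(3)]

# Quadratic form ybar' * S^{-1} * ybar and the resulting statistic
quad = sum(row[j] * ybar[j] for j in range(3))
t2_m = n * quad

print([round(v, 3) for v in row])
print(round(quad, 3), round(t2_m, 2))
```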


    Another source of externally assigned targets is the development process of a product or process. In such a development process one typically accounts for customer requirements, internal technical capabilities, and what the competition offers. The Quality Function Deployment (QFD) matrices mentioned in a later chapter are used to integrate these different sources of information, yielding agreed upon targets that are set so as to create "selling points" for the new product or process (Juran, 1988). Such targets are set in several dimensions, and $T^2_M$ can be used to determine whether these targets are met using the techniques presented here.

    Grouping the Data.

    If the tested sample of size $n_1$ is grouped in $k$ rational subgroups of size $n_j$, $n_1 = \sum_{j=1}^{k} n_j$, the empirical covariance matrix is a pooled estimate from the $k$ sample covariance matrices calculated from each subgroup,

    $$S_p = \sum_{j=1}^{k} (n_j - 1)S_j \Big/ \sum_{j=1}^{k} (n_j - 1).$$

    Furthermore, if the subgroup sizes are all equal, i.e. $n_1 = kn$, then the pooled covariance matrix $S_p$ is the average of the $k$ individual matrices,

    $$S_p = \sum_{j=1}^{k} S_j / k.$$

    When we now test the mean of a subgroup of $n$ observations, the test statistic is defined as

    $$T^2_M = n(\bar{Y}_j - m_0)'S_p^{-1}(\bar{Y}_j - m_0)$$

    where $\bar{Y}_j$ is the mean of the $n$ observations in the subgroup. The critical value for testing the hypothesis that the $j$-th subgroup does not deviate from the targets $m_0$ by more than random variation is given by

    $$UCL = \frac{k(n-1)p}{k(n-1)-p+1}\,F^{\alpha}_{p,\,k(n-1)-p+1}.$$

    The grouping of the data also enables us to compute a measure of internal variability within the subgroup. For the $j$-th subgroup the measure of variability is given by

    $$T^2_D = \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)'S_p^{-1}(Y_{ij} - \bar{Y}_j)$$

    where $Y_{ij}$ is the $i$-th observation in the $j$-th subgroup. When considering all the observations of the $j$-th subgroup relative to the targets $m_0$, one obtains a measure of overall variability $T^2_O$ defined as

    $$T^2_O = \sum_{i=1}^{n} (Y_{ij} - m_0)'S_p^{-1}(Y_{ij} - m_0).$$

    From basic algebra we have that

    $$T^2_O = T^2_M + T^2_D.$$

    The critical values for $T^2_D$ can be approximated by

    $$UCL = (n-1)\,\chi^2_p(\alpha)$$

    where $\chi^2_p(\alpha)$ is the upper $100\alpha$-th percentile of the chi-squared distribution with $p$ degrees of freedom (see, e.g., Johnson, 1985). The approximation is based on the substitution of $S_p$ by $\Sigma$ in the formula for the computation of $T^2_D$.

    To illustrate the use of the test statistics we use an extended version of the second simulated data set with grouped data presented in Chapter 2. In addition to the first 100 observations (50 subgroups of size two), we generated 90 additional observations grouped in 45 subgroups of size two whose distributions were as follows: the parameters of observations 101-110 are identical to those of the base sample; the population mean for the first component of observations 111-130 has been shifted upward by two standard deviations, i.e.

    $$\mu_2 = \mu_0 + [2\sigma_1 \;\; 0 \;\; 0 \;\; 0]' = [10.0133 \;\; 9.9787 \;\; 9.9743 \;\; 14.9763]';$$

    in observations 131-150, the population mean of the first component was shifted downward and that of the second component upward, both by one standard deviation, i.e.

    $$\mu_3 = \mu_0 + [-\sigma_1 \;\; \sigma_2 \;\; 0 \;\; 0]' = [9.9728 \;\; 9.9923 \;\; 9.9743 \;\; 14.9763]'.$$

    In the fourth tested sample (observations 151-170) the shift was by one standard deviation in the first component, with the sign of the deviation alternating in consecutive observations as follows: in the observations whose case number is odd, the mean of the first component was shifted downward, while in the observations whose case number is even, the shift was upward. Thus when the data are grouped in pairs, the last 10 groups formed from observations 151-170 have average means whose expected values are as in the base sample, but the within group deviations are expected to be large. Finally, the population means for each of the four components in observations 171-190 were shifted upward by one standard deviation, i.e.

    $$\mu_5 = \mu_0 + [\sigma_1 \;\; \sigma_2 \;\; \sigma_3 \;\; \sigma_4]' = [9.9998 \;\; 9.9923 \;\; 9.9888 \;\; 14.9915]'.$$

    The target values were set at

    $$m_0 = [9.98 \;\; 9.98 \;\; 9.98 \;\; 14.98]'$$

    which deviate from the population means by $-0.47\sigma_1$ in the first component and by less than $0.5\sigma$ in each of the other components.


    We performed the tests for the 95 groups and compared the resulting $T_M^2$'s and $T_D^2$'s with the critical values at α = .05, which are 10.94 and 9.49 respectively. The results are presented in Tables 3.7 and 3.8. The computations were performed by the macros presented in Appendix 1.3. Those macros compute the relevant statistics for the analyses of grouped data presented in Chapters 3-5. The program with the macros which compute the statistics for grouped data can be run by typing MTB>EXEC GG.MTB 1.

    We summarize in Table 3.9 the number of groups for which the appropriate $T_M^2$- and $T_D^2$-values exceeded the critical values in the base sample as well as in the five tested samples.

    We can see from the table that even for the relatively small deviations between the targets and the actual means (by less than 0.5σ for each variable), the power of the multivariate test is about 60% for the groups in which the population means were not shifted. The detection is 100% in the second and the third tested samples and the power is again about 60%-70% in the last two tested samples. Note that in the last tested sample all the means were shifted by one standard deviation, but they were all shifted in the same direction. The probability of detection of such a shift is considerably smaller than in the case in which only two out of the four components are shifted by one standard deviation, but the shifts are in opposite directions. Obviously, a shift by one standard deviation of all four components in opposite directions (say in pairs) would have yielded even larger $T_M^2$-values.

    The results in the column which summarizes the results for the $T_D^2$-tests illustrate the power of the test to detect within-group variation in the fourth tested sample.
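The chi-squared approximation for the $T_D^2$ limit is easy to check numerically. The sketch below is not the book's MINITAB macro; it is a small standard-library Python illustration that assumes p = 4 variables and subgroups of size n = 2, as in the simulated example, and uses the closed-form upper tail of a chi-squared variable with an even number of degrees of freedom. The function names are made up for this example.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def chi2_upper_point(alpha, df, lo=0.0, hi=200.0, tol=1e-10):
    """Solve P(X > x) = alpha by bisection (the upper tail decreases in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even(mid, df) > alpha:
            lo = mid          # tail still too heavy: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

def ucl_td(n, p, alpha):
    """Approximate limit (n - 1) * chi2_p(alpha) for the within-group statistic."""
    return (n - 1) * chi2_upper_point(alpha, p)

print(round(ucl_td(n=2, p=4, alpha=0.05), 2))
```

With these inputs the result rounds to the 9.49 quoted in the text for α = .05.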


    TABLE 3.7
    Means with external values (9.98, 9.98, 9.98 and 14.98)'. The S_pooled-matrix from the base sample (50 groups of 2 observations). The data are the base sample:

    Group   VAR1      VAR2      VAR3      VAR4      T²M       T²D       T²O
     1      9.9765    9.9702    9.9673    14.9670    6.9126   11.5787   18.4917
     2      9.9936    9.9651    9.9893    14.9813   18.4056    2.6110   21.0175
     3      9.9870    9.9719    9.9693    14.9661   35.1165    2.3787   37.4961
     4      9.9962    9.9666    9.9777    14.9856   18.8737    7.6355   26.5084
     5      9.9774    9.9718    9.9651    14.9718    6.7657    3.1150    9.8803
     6      9.9710    9.9663    9.9585    14.9636    9.1575    5.1778   14.3352
     7      9.9613    9.9733    9.9729    14.9706    9.3300    1.8340   11.1650
     8      9.9817    9.9774    9.9731    14.9723    5.9162    4.2786   10.1941
     9      9.9887    9.9642    9.9810    14.9787    7.8017    0.2000    8.0003
    10      9.9861    9.9789    9.9812    14.9761    9.8200    1.6535   11.4742
    11      9.9997    9.9980   10.0001    14.9935   11.3981    4.6827   16.0805
    12     10.0076    9.9969    9.9939    14.9985   22.6423    6.6440   29.2664
    13      9.9725    9.9648    9.9553    14.9590   16.5223    0.9796   17.5028
    14      9.9902    9.9814    9.9770    14.9781   13.5629    5.4059   18.9667
    15      9.9786    9.9720    9.9669    14.9630   14.9679    2.7758   17.7426
    16      9.9903    9.9893    9.9768    14.9900   10.5040    6.1168   16.6199
    17      9.9975    9.9870    9.9672    14.9667   16.6040    2.4932   19.0972
    18      9.9878    9.9681    9.9841    14.9813    4.3620    0.1646    4.5464
    19      9.9879    9.9640    9.9774    14.9750   11.8774    5.1691   17.0457
    20      9.9928    9.9839    9.9801    14.9826   12.4014    3.6190   16.0202
    21      9.9819    9.9742    9.9688    14.9737    8.4692    2.2632   10.7336
    22      9.9894    9.9813    9.9741    14.9772   14.1076    0.4522   14.5589
    23      9.9891    9.9610    9.9751    14.9762   12.0774    7.6379   19.7158
    24      9.9884    9.9783    9.9765    14.9722   20.1141    4.6382   24.7498
    25     10.0112   10.0054   10.0015    15.0117   19.5601    3.5733   23.1334
    26      9.9776    9.9686    9.9665    14.9663    9.1897    2.2299   11.4188
    27      9.9893    9.9793    9.9793    14.9804   10.8675    1.8949   12.7836
    28      9.9833    9.9761    9.9688    14.9705   12.7171    2.4218   15.1385
    29      9.9975    9.9877    9.9816    14.9883   16.2634    3.5760   19.8582
    30      9.9643    9.9760    9.9796    14.9826    5.5541    0.9283    6.4625
    31     10.0107   10.0034   10.0013    15.0057   19.4523    1.0866   20.5401
    32      9.9669    9.9619    9.9582    14.9452   36.0729    1.1480   37.2211
    33     10.0061    9.9972    9.9930    14.9819   26.2016    3.3385   29.5403
    34      9.9755    9.9673    9.9613    14.9686    9.8866    5.2634   15.1706
    35      9.9928    9.9669    9.9832    14.9870    6.8393    4.0766   10.9169
    36      9.9789    9.9735    9.9651    14.9739    8.4442    7.7542   16.1987
    37      9.9645    9.9594    9.9503    14.9525   15.3261    3.6448   19.1714
    38     10.0007    9.9934    9.9880    14.9876   19.4869   14.6015   34.0904
    39      9.9844    9.9729    9.9694    14.9733   14.3155    0.8168   15.1326
    40      9.9924    9.9630    9.9788    14.9822   12.9165    7.6439   20.5611
    41      9.9909    9.9858    9.9800    14.9749   18.3518    3.0003   21.3509
    42      9.9770    9.9684    9.9659    14.9612   16.8827    8.5347   25.4177
    43      9.9923    9.9665    9.9851    14.9868    5.0976    7.0051   12.1022
    44      9.9696    9.9651    9.9587    14.9650    7.3916    1.1271    8.5188
    45      9.9661    9.9771    9.9773    14.9802    7.1717    2.1927    9.3647
    46      9.9774    9.9714    9.9678    14.9749    5.0016    7.9634   12.9645
    47      9.9757    9.9706    9.9690    14.9701    3.4050    4.1985    7.6042
    48      9.9696    9.9621    9.9602    14.9630    8.0880    5.0988   13.1861
    49      9.9809    9.9761    9.9667    14.9683   11.2887    0.7487   12.0375
    50      9.9886    9.9809    9.9728    14.9796   11.9922    2.4071   14.3999


    TABLE 3.8
    Means with external values (9.98, 9.98, 9.98 and 14.98)'. The S_pooled-matrix from the base sample (50 groups of 2 observations). The data are the tested samples:

    Group   VAR1      VAR2      VAR3      VAR4      T²M        T²D       T²O
    51      9.9752    9.9631    9.9629    14.9669    13.4041    4.3124    17.7152
    52      9.9808    9.9748    9.9664    14.9717     9.4387    9.0740    18.5131
    53      9.9948    9.9782    9.9728    14.9831    29.1965    1.1444    30.3390
    54      9.9825    9.9780    9.9706    14.9844    10.2608    5.1232    15.3835
    55      9.9986    9.9819    9.9791    14.9747    49.8481    1.3336    51.1806
    56     10.0136    9.9778    9.9720    14.9673   194.9893    2.4845   197.4676
    57     10.0282   10.0023    9.9959    14.9947   130.0325    6.7462   138.7787
    58     10.0068    9.9729    9.9661    14.9723   134.1977   12.4554   146.6554
    59     10.0189    9.9765    9.9786    14.9802   198.7654   15.6996   214.4651
    60     10.0184    9.9872    9.9854    14.9924   110.2353    1.0674   111.3039
    61     10.0124    9.9818    9.9746    14.9797   121.0134    2.5841   123.5975
    62     10.0094    9.9789    9.9695    14.9769   118.8845    7.5874   126.4702
    63      9.9974    9.9655    9.9548    14.9616   126.4785    5.2893   131.7633
    64     10.0127    9.9794    9.9784    14.9758   145.0159    2.4281   147.4440
    65     10.0042    9.9657    9.9550    14.9619   180.6132    3.7681   184.3891
    66      9.9817    9.9977    9.9808    14.9881    34.8933    5.4240    40.3167
    67      9.9727    9.9880    9.9719    14.9796    33.2976    5.5188    38.8147
    68      9.9650    9.9865    9.9723    14.9711    43.3239    5.9636    49.2889
    69      9.9839    9.9832    9.9709    14.9621    36.0596    9.1947    45.2506
    70      9.9768    9.9922    9.9809    14.9819    23.3029    8.5072    29.6100
    71      9.9711    9.9946    9.9759    14.9797    57.9523    7.0179    64.9679
    72      9.9722    9.9902    9.9738    14.9736    34.7953    3.0990    37.8982
    73      9.9784    9.9907    9.9781    14.9767    20.5829    3.6764    24.2601
    74      9.9583    9.9834    9.9624    14.9646    69.4106    3.3428    72.7517
    75      9.9877   10.0065    9.9825    14.9964    63.4783    3.8257    67.3056
    76      9.9769    9.9687    9.9704    14.9653    11.8044   36.0720    47.8761
    77      9.9789    9.9682    9.9590    14.9656    19.2991   30.5845    49.8822
    78      9.9901    9.9798    9.9719    14.9830    15.7072   43.8885    59.3950
    79      9.9720    9.9648    9.9620    14.9607    10.3572   22.2830    32.6401
    80      9.9736    9.9625    9.9596    14.9666    12.9881   14.8953    27.8834
    81      9.9839    9.9759    9.9705    14.9793     8.4083   10.2363    18.6437
    82      9.9819    9.9740    9.9705    14.9731     7.8902   18.0869    25.9764
    83      9.9916    9.9845    9.9804    14.9792    12.9792   24.5291    37.5092
    84      9.9851    9.9818    9.9718    14.9713     8.5208   28.8474    37.3676
    85      9.9715    9.9642    9.9610    14.9553    17.5724   21.9927    39.5664
    86      9.9962    9.9875    9.9802    14.9804    22.3014    3.9807    26.2821
    87     10.0015    9.9960    9.9925    14.9925    13.7984    5.4804    19.2793
    88     10.0056   10.0059    9.9992    15.0061    14.0653    5.7841    19.8486
    89     10.0085   10.0020   10.0011    15.0012    18.2087    9.0957    27.3036
    90      9.9945    9.9838    9.9811    14.9775    24.4965    1.2341    25.7311
    91      9.9881    9.9830    9.9783    14.9786     7.5604    5.6353    13.1958
    92     10.0039    9.9915    9.9921    14.9909    26.0642    2.8166    28.8802
    93     10.0003   10.0019    9.9945    14.9986    10.5187    1.3998    11.9183
    94      9.9878    9.9830    9.9778    14.9756    10.7594    0.1619    10.9214
    95      9.9972    9.9861    9.9894    14.9893    15.1882   17.4943    32.6826


    TABLE 3.9
    Number of times that $T_M^2$ and $T_D^2$ exceeded the critical values

                                               T²M    T²D    Number of groups
    Groups 1-50  - "Base" sample                29      2     50
    Groups 51-55 - First "tested" sample         3      0      5
    Groups 56-65 - Second "tested" sample       10      2     10
    Groups 66-75 - Third "tested" sample        10      1     10
    Groups 76-85 - Fourth "tested" sample        6     10     10
    Groups 86-95 - Fifth "tested" sample         7      1     10
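The percentages quoted in the discussion of Table 3.9 are simple ratios of the counts in the table. As a small illustration, assuming the counts are read from the table exactly as printed, the following sketch turns them into exceedance rates. The dictionary keys and function name are labels invented for this example.

```python
# Counts copied from Table 3.9: (number of groups, T2M exceedances, T2D exceedances).
TABLE_3_9 = {
    "base":     (50, 29, 2),
    "tested 1": (5, 3, 0),
    "tested 2": (10, 10, 2),
    "tested 3": (10, 10, 1),
    "tested 4": (10, 6, 10),
    "tested 5": (10, 7, 1),
}

def exceedance_rates(table):
    """Return {sample: (share of groups over the T2M limit, share over the T2D limit)}."""
    return {name: (tm / n, td / n) for name, (n, tm, td) in table.items()}

rates = exceedance_rates(TABLE_3_9)
for name, (rate_m, rate_d) in rates.items():
    print(f"{name:9s} T2M: {rate_m:4.0%}   T2D: {rate_d:4.0%}")
```

The base sample and the first tested sample, whose means were not shifted, give $T_M^2$ rates of 58% and 60%, and the fourth tested sample is the only one in which every group exceeds the $T_D^2$ limit.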


    4
    Quality Control with Internal Targets - Multivariate Process Capability Studies

    Objectives:
    The chapter provides examples of process capability studies carried out on multivariate data. The different steps in performing such studies are illustrated. A formula for setting control limits as well as multivariate capability indices computed from the analyzed data are presented. The chapter also presents guidelines for interpreting $T^2$-charts and capability indices.

    Key Concepts
    - Reference sample
    - Process capability study
    - Statistical Control Charts
    - Multivariate process capability indices
    - Internal targets with the "Leave one out" approach
    - Data grouping
    - Base Sample


    During an ongoing industrial process, quality control is typically performed with target values derived from a standard reference sample whose units have been determined to be of acceptable quality on all the analyzed variables. However, at the initial stage of a new or changed process, a thorough investigation of its characteristics and capabilities is required. In a process capability study, there typically are no preexisting data on the process characteristics. In particular, no target values based on prior information from the production of the component are available, and the target values therefore have to be calculated internally. It is important to distinguish between product specification limits, which are derived from customer needs, and process quality characteristics, which depend upon the processes involved in the production and delivery of the product. Univariate process characteristics are typically given in terms of an average, estimating the process mean, and a standard deviation, estimating the process variability. In the multivariate case the vector of means replaces the univariate process mean, and the correlations between variables are added to the variables' standard deviations or variances.

    The Western Electric Statistical Quality Control Handbook (1956) defines the term "process capability study" as "...the systematic study of a process by means of statistical control charts in order to discover whether it is behaving naturally or unnaturally; plus investigation of any unnatural behavior to determine its cause, plus action to eliminate any of the unnatural disturbance."

    Process capability studies are therefore much more than a simple exercise of just estimating the means and the covariance matrix. Some of the steps in a multivariate process capability study include:

    1. Determination of the boundaries of the process selected to be studied and of the variables characterizing the process outputs. This step is typically the responsibility of management.
    2. Determination of representative time frames for data collection to define the sampling process and the rational subgroups. The design of the data collection system has to be tailored to the variables characterizing the process outputs. This step requires in-depth knowledge of the process as it is actually being operated and of future needs and plans that might affect the process.


    3. Performance of a cause and effect analysis linking process output characteristics to internal parameters and control factors. The tools used in this phase can include the simple and efficient fishbone diagram or the more comprehensive Quality Function Deployment (QFD) matrices. The analysis may be further refined and confirmed with statistically designed experiments. At this stage one can conduct experiments using concepts of robust design in order to make the best use of control factors as countermeasures to noise factors affecting the process. For more details on robust designs see, for example, Phadke (1989), and Kenett and Zacks (1997).
    4. Data collection and analysis using univariate control charts on the individual variables, multivariate control charts on a combination of variables, and various statistical and graphical methods to investigate the structure displayed by the data.
    5. Elimination of special causes of variability. This step requires evaluation of the stability of the process and an identification of sources of variability such as measurement devices, work shifts, operators, batches or individual items.
    6. Evaluation of the underlying probability model for the process, including checking for multivariate normality. When needed, transformations may be used to induce normality.
    7. Computation of process performance indices or process capability indices. Process performance indices represent a measure of historical performance and are calculated using all of the data without considering stability over time. They are used to summarize past performance. Process capability indices are computed when data is collected over a short time frame (say 30 observations) and are used as predictors of process capability. Such indices require process stability, representative samples, normality of the process distribution and independence between the collected observations. Recently several authors suggested using multivariate process capability indices (e.g. Chan et al., 1991; Wierda, 1993).

    The multivariate process capability study is a classical situation in which quality control is performed with internal targets. The vector


    of target values m is computed from the data (after the exclusion of outlying subgroups), and each observation or mean of subgroup in the process is compared with those target values.
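To make the idea of internal targets concrete, here is a minimal standard-library sketch for the bivariate case. It assumes ungrouped data, takes the target vector m to be the sample mean, uses the empirical covariance matrix with divisor n - 1, and computes the quadratic form (Y_i - m)' S^{-1} (Y_i - m) for every observation. The observations are made up for illustration and are not taken from the book's data sets; the helper names are hypothetical.

```python
def mean_vector(rows):
    """Internal targets: the componentwise mean of the bivariate observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def covariance(rows, m):
    """Empirical 2x2 covariance matrix with divisor n - 1, returned as (s11, s12, s22)."""
    n = len(rows)
    s11 = sum((r[0] - m[0]) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows) / (n - 1)
    s22 = sum((r[1] - m[1]) ** 2 for r in rows) / (n - 1)
    return s11, s12, s22

def t2(y, m, s):
    """Quadratic form (y - m)' S^{-1} (y - m), using the explicit 2x2 inverse."""
    s11, s12, s22 = s
    d0, d1 = y[0] - m[0], y[1] - m[1]
    det = s11 * s22 - s12 ** 2
    return (s22 * d0 ** 2 - 2 * s12 * d0 * d1 + s11 * d1 ** 2) / det

# Made-up bivariate observations, for illustration only.
data = [(49.91, 60.05), (49.89, 60.02), (49.93, 60.07), (49.85, 60.00),
        (49.92, 60.03), (49.88, 60.06), (49.95, 60.10), (49.90, 60.04)]

m = mean_vector(data)        # targets computed from the data themselves
S = covariance(data, m)
t2_values = [t2(y, m, S) for y in data]
for i, value in enumerate(t2_values, start=1):
    print(i, round(value, 3))
```

Because the same observations define both the targets and S, the values are tied together: they always sum to p(n - 1), here 14. This is one way to see that each observation influences the yardstick it is measured against.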

    When the data are ungrouped and the empirical covariance matrix S is based on the entire sample of $n_1$ observations, in the $T_M^2$ statistic for the i-th observation given by

    $$T_M^2 = (Y_i - \bar{Y})'\, S^{-1}\, (Y_i - \bar{Y})$$

    the statistics $Y_i - \bar{Y}$ and S are not independently distributed. However, it can be shown that $(n_1-1)S$ can be decomposed as $(n_1-1)S = (n_1-2)S_1 + (Y_i-\bar{Y})(Y_i-\bar{Y})'$ such that $(n_1-2)S_1$ has a Wishart distribution and $S_1$ is independent of $(Y_i - \bar{Y})$. See e.g. Wierda (1994). Therefore, the distribution of $T_M^2$ no longer follows (up to a constant) a Fisher F distribution but rather a beta distribution (see property (v) in Chapter 2). The appropriate UCL is given by

    $$UCL = \frac{(n_1-1)^2}{n_1}\, B_\alpha\!\left(\frac{p}{2},\ \frac{n_1-p-1}{2}\right)$$

    where $B_\alpha(\cdot,\cdot)$ is the upper 100α-th percentile of the beta distribution with the appropriate number of degrees of freedom.

    If we ignore the dependence between $(Y_i - \bar{Y})$ and S, we obtain (as suggested for example by Jackson (1959), Ryan (1988), Tracy et al. (1992)) an approximated limit, UCL*.

    However, lately Wierda (1994) compared the values of UCL and UCL* for α = .005; p = 2, 5, 10 and for various values of $n_1$. He found that the differences between UCL and UCL* are very substantial and the use of the approximated critical value is clearly unwarranted.

    It is however possible to define for the capability study $T^2$-type statistics for which the statistics which measure the distance of the tested observation from the mean are independent of the estimated covariance


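    matrix and whose critical values are based on the percentiles of the F-distribution. Two main alternatives were suggested in the literature. The first approach was suggested by Wierda (1994), who considers

    $$T_M'^{\,2} = (Y_i - \bar{Y})'\, S_{(-i)}^{-1}\, (Y_i - \bar{Y})$$

    where only the covariance matrix is based on the $n_1 - 1$ observations but $\bar{Y}$ is based on the entire sample of size $n_1$. The critical value for this statistic is

    $$UCL' = \frac{p\,(n_1-1)(n_1-2)}{n_1\,(n_1-p-1)}\, F^{\alpha}_{p,\,n_1-p-1}$$

    The second approach was suggested by Bruyns (1992), who considers the "Leave One Out" method and defines the statistic

    $$T_M''^{\,2} = (Y_i - \bar{Y}_{(-i)})'\, S_{(-i)}^{-1}\, (Y_i - \bar{Y}_{(-i)})$$

    where $\bar{Y}_{(-i)}$, $S_{(-i)}$ are the vector of the means and the covariance matrix, respectively, calculated from all but the i-th observation.

    The critical value for that statistic is

    $$UCL'' = \frac{p\,n_1\,(n_1-2)}{(n_1-1)(n_1-p-1)}\, F^{\alpha}_{p,\,n_1-p-1}$$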
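For p = 2 the percentiles involved have closed forms, so the three critical values can be checked without statistical tables. The sketch below is an illustration under stated assumptions, not the book's program: it assumes the beta-based limit is ((n1-1)^2/n1) times the upper percentile of Beta(p/2, (n1-p-1)/2), and that the limits for Wierda's statistic and for the Leave One Out statistic are p(n1-1)(n1-2)/(n1(n1-p-1)) and p n1(n1-2)/((n1-1)(n1-p-1)) times the upper percentile of F(p, n1-p-1). The function names are invented for this example.

```python
def f_upper_point_p2(alpha, m):
    """Upper 100*alpha percent point of F(2, m), from P(F > x) = (1 + 2x/m) ** (-m/2)."""
    return (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)

def beta_upper_point_a1(alpha, b):
    """Upper 100*alpha percent point of Beta(1, b), from P(B > x) = (1 - x) ** b."""
    return 1.0 - alpha ** (1.0 / b)

def limits_p2(n1, alpha):
    """Return (beta-based limit, Wierda-type limit, leave-one-out limit) for p = 2."""
    p = 2
    f = f_upper_point_p2(alpha, n1 - p - 1)
    ucl = (n1 - 1) ** 2 / n1 * beta_upper_point_a1(alpha, (n1 - p - 1) / 2.0)
    ucl_wierda = p * (n1 - 1) * (n1 - 2) / (n1 * (n1 - p - 1)) * f
    ucl_loo = p * n1 * (n1 - 2) / ((n1 - 1) * (n1 - p - 1)) * f
    return ucl, ucl_wierda, ucl_loo

for alpha in (0.05, 0.005):
    print(alpha, [round(v, 2) for v in limits_p2(50, alpha)])
```

For n1 = 50 the results agree, to within roughly 0.02, with the critical values the chapter quotes for the 50-observation example (5.76, 6.40, 6.66 at α = .05 and 9.69, 11.90, 12.39 at α = .005).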

    It can be shown that for each observation i, $T_M''^{\,2}$ can be obtained as a function of $T_M^2$ as follows:

    $$T_M''^{\,2} = \frac{n_1^2\,(n_1-2)\,T_M^2}{(n_1-1)\left[(n_1-1)^2 - n_1 T_M^2\right]}$$

    This relationship can be very useful since it enables us to compute $T_M''^{\,2}$ without recomputing the covariance matrix for each observation separately. The critical values for $T_M''^{\,2}$ are based on the F-distribution, which is more readily available than the beta distribution. Moreover, since the function which relates $T_M''^{\,2}$ with $T_M^2$ is monotonic, the critical values can be obtained directly from the relationship.
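The shortcut can be verified numerically. The sketch below (standard library only, bivariate case, made-up data) computes the Leave One Out statistic twice: directly, by recomputing the mean vector and covariance matrix without observation i, and indirectly, from the full-sample $T_M^2$. The mapping used is n1^2 (n1-2) T / ((n1-1) ((n1-1)^2 - n1 T)), which follows from a rank-one update of the covariance matrix; treat the exact form as an assumption of this example rather than a quotation of the book's formula. The helper names are hypothetical.

```python
def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def cov2(rows, m):
    """2x2 sample covariance (divisor n - 1), returned as (s11, s12, s22)."""
    n = len(rows)
    s11 = sum((r[0] - m[0]) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows) / (n - 1)
    s22 = sum((r[1] - m[1]) ** 2 for r in rows) / (n - 1)
    return s11, s12, s22

def quad(y, m, s):
    """(y - m)' S^{-1} (y - m) for a 2x2 covariance matrix S."""
    s11, s12, s22 = s
    d0, d1 = y[0] - m[0], y[1] - m[1]
    return (s22 * d0 * d0 - 2 * s12 * d0 * d1 + s11 * d1 * d1) / (s11 * s22 - s12 * s12)

def loo_from_t2m(t2m, n1):
    """Leave-one-out statistic obtained from the full-sample T2M."""
    return n1 ** 2 * (n1 - 2) * t2m / ((n1 - 1) * ((n1 - 1) ** 2 - n1 * t2m))

def t2m_from_loo(t2_loo, n1):
    """Inverse map, usable to carry a critical value from one scale to the other."""
    return (n1 - 1) ** 3 * t2_loo / (n1 * (n1 * (n1 - 2) + (n1 - 1) * t2_loo))

# Made-up bivariate observations, for illustration only.
data = [(49.91, 60.05), (49.89, 60.02), (49.93, 60.07), (49.85, 60.00),
        (49.92, 60.03), (49.88, 60.06), (49.95, 60.10), (49.90, 60.04)]
n1 = len(data)
m_all = mean2(data)
s_all = cov2(data, m_all)

for i, y in enumerate(data):
    rest = data[:i] + data[i + 1:]
    m_rest = mean2(rest)
    direct = quad(y, m_rest, cov2(rest, m_rest))         # recomputes mean and covariance
    shortcut = loo_from_t2m(quad(y, m_all, s_all), n1)   # reuses full-sample quantities
    print(i + 1, round(direct, 4), round(shortcut, 4))
```

The two columns coincide up to rounding error, and because the map is increasing in $T_M^2$, `t2m_from_loo` can be applied to an F-based limit to express it on the $T_M^2$ scale.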


    Wierda (1994) presents the three statistics and mentions that he finds the regular $T_M^2$ preferable, since on one hand it resembles more closely the $T^2$-statistics computed in other situations and on the other hand it avoids the need to compute several times either the covariance matrix (for $T_M'^{\,2}$) or both the covariance matrix and the mean vector (for $T_M''^{\,2}$). The disadvantage is obviously the need to appeal to the critical values from the beta distribution.

    For methodological and practical reasons we recommend a different approach than that suggested by Wierda. Since we use $T_M^2$ with internal targets in the process capability study, in our opinion it is advisable to compare each observation with a statistic on which that observation had no effect, i.e. the "Leave One Out" approach. The relatively cumbersome computation of $\bar{Y}_{(-i)}$ and $S_{(-i)}$ for each i can easily be implemented in this age of fast computers. The use of the critical values based on the F-distribution can be considered in this case an extra bonus but certainly not a crucial consideration.

    While, as mentioned, the "Leave One Out" approach is preferable from the methodological point of view, in practical conditions the differences between the two methods are seldom noticeable. Unless the number of observations is very small, the effect of a single observation on the computed statistics is rarely such that the outlier is detected by the "Leave One Out" approach and not detected by $T_M^2$.

    For an illustration of the methods, let us now return to the first simulated data set with ungrouped data presented in Chapter 2. We consider only the first 50 observations as simulating data derived from a process capability study with unknown parameters. We recall that the first 55 observations were generated from the distribution with the same $(\mu_0, \Sigma)$ while the means of the next 20 observations were shifted. We of course assume that at this testing stage, the underlying parameters are unknown to the investigator, and proceed to test each observation separately versus the empirical overall mean of the first 50 observations. The covariance matrix is also estimated from the base sample.

    We present in Table 4.1 the values of the statistics $T_M^2$, $T_M'^{\,2}$ and $T_M''^{\,2}$ respectively. The statistics were computed by the program whose main macro is UU.MTB from Appendix 1.2. Note that we have in fact only two distinct methods since there is a one-to-one correspondence between $T_M^2$ and $T_M''^{\,2}$. The critical values for α = .05 are 5.76, 6.40 and 6.66 and for α = .005 they are 9.69, 11.90 and 12.39, respectively. The critical values for $T_M^2$ are based on the beta distribution, while those for $T_M'^{\,2}$ and $T_M''^{\,2}$ are based on percentiles of the F-distribution. We observe that only one out of the 50 $T_M^2$-statistics exceeded the critical values at α = .05 (observation 23) and that none of them exceeded the critical values for smaller significance levels. The $T^2$-value for observation 23 exceeded the appropriate critical values at α = .05 for all the methods. If this was indeed a process capability study, it is doubtful if this observation would have been labeled as an outlier. It is more likely that the entire sample would have been accepted and used as a reference sample for future testings. Observation 23 would have been considered in this case a random error (one out of 50, at α = .05). From the data generation process, we know that this was indeed a random error.

    Grouping the Data.

    We mentioned that in the process capability stage, the estimation of the internal variability within the subgroups may be particularly important. The data has of course to be divided at this stage into rational subgroups, much the same as it is likely to happen in future stages of the production process. Thus, we test each of the k subgroup means $\bar{Y}_j$, j = 1, ..., k against the overall mean $\bar{\bar{Y}}$. The statistic

    $$T_M^2 = n\,(\bar{Y}_j - \bar{\bar{Y}})'\, S_p^{-1}\, (\bar{Y}_j - \bar{\bar{Y}})$$

    is compared to the UCL of

    $$\frac{p\,(k-1)(n-1)}{kn-k-p+1}\, F^{\alpha}_{p,\,kn-k-p+1}$$


    TABLE 4.1
    Means from the base sample (49.9026, 60.0441)'. The S-matrix from the base sample (50 observations). The data are the base sample:

            VAR1      VAR2      T²M      T'²M     T''²M
     1      49.8585   60.0008   1.6073   1.6559   1.6962
     2      49.8768   59.9865   4.2488   4.9924   4.7536
     3      49.8706   60.0055   1.0904   1.0968   1.1379
     4      49.9117   60.0126   5.3767   6.7211   6.1752
     5      49.8470   60.0165   4.5009   5.3606   5.0657
     6      49.8883   60.0216   0.4419   0.4318   0.4549
     7      49.9158   60.0517   0.2205   0.2134   0.2259
     8      49.9152   60.0673   0.5629   0.5529   0.5813
     9      49.9055   60.0726   2.2046   2.3363   2.3570
    10      49.8969   60.0208   1.0951   1.1018   1.1426
    11      49.9137   60.0928   4.9861   6.0980   5.6755
    12      49.8586   59.9823   2.9972   3.3018   3.2602
    13      49.9514   60.0866   1.9884   2.0856   2.1156
    14      49.8988   60.0402   0.0122   0.0117   0.0124
    15      49.8894   60.0720   5.5016   6.9263   6.3380
    16      49.9403   60.0681   1.5848   1.6310   1.6716
    17      49.9132   60.0350   1.2866   1.3059   1.3483
    18      49.8546   60.0145   2.6554   2.8763   2.8674
    19      49.8815   59.9982   2.6315   2.8470   2.8393
    20      49.8311   59.9963   5.3803   6.7271   6.1804
    21      49.8816   60.0457   1.8579   1.9366   1.9710
    22      49.8501   59.9860   2.5277   2.7209   2.7213
    23      49.9778   60.0875   7.0252   9.6859   8.3933
    24      49.8690   60.0159   0.9596   0.9596   0.9988
    25      49.8779   60.0055   1.2957   1.3157   1.3581
    26      49.8680   60.0088   1.0088   1.0110   1.0510
    27      49.9388   60.0711   1.2284   1.2436   1.2859
    28      49.9133   60.0634   0.3785   0.3688   0.3894
    29      49.9120   60.0560   0.1053   0.1014   0.1077
    30      49.9250   60.0749   0.7337   0.7262   0.7600
    31      49.9442   60.1100   3.8446   4.4224   4.2628
    32      49.8386   59.9725   3.8154   4.3822   4.2274
    33      49.9492   60.1014   2.4121   2.5819   2.5905
    34      49.9204   60.0803   1.5392   1.5806   1.6220
    35      49.8994   60.0625   1.5246   1.5647   1.6064
    36      49.8703   60.0219   1.0618   1.0667   1.1078
    37      49.8846   60.0271   0.2658   0.2577   0.2726
    38      49.9580   60.0878   2.7294   2.9671   2.9514
    39      49.8985   60.0329   0.1939   0.1874   0.1986
    40      49.9397   60.0826   1.1778   1.1895   1.2315
    41      49.8741   60.0061   1.0917   1.0983   1.1393
    42      49.9140   60.0401   0.8206   0.8155   0.8514
    43      49.9501   60.0810   2.0330   2.1368   2.1652
    44      49.8865   60.0169   0.6997   0.6915   0.7241
    45      49.8912   60.0406   0.2766   0.2683   0.2839
    46      49.9252   60.0532   0.8988   0.8963   0.9341
    47      49.9326   60.0741   0.7513   0.7443   0.7786
    48      49.9680   60.1219   4.4314   5.2581   4.9798
    49      49.9289   60.0709   0.5856   0.5758   0.6047
    50      49.9233   60.0632


    in order to detect outlying subgroups (e.g., Ryan, 1988). This test procedure can also be used when a base sample is being calibrated. If we identify l subgroups with $T_M^2 > UCL$ and determine an assignable cause that justifies their removal from the data set, we can recompute $\bar{\bar{Y}}$ and $S_p$ using the remaining k - l subgroups. The Upper Control Limit (UCL) for a future subgroup of n observations is then set to be:

    $$UCL = \frac{p\,(k-l+1)(n-1)}{(k-l)n-k+l-p+1}\, F^{\alpha}_{p,\,(k-l)n-k+l-p+1}$$

    However, even after removing obvious outliers, the testing of each of the subgroup means against the overall mean should be performed with caution since possible remaining outlying observations or trends in the sample may affect both the overall mean and the empirical covariance matrix. The subsequent tests for the individual subgroups could thus be biased.

    It is noteworthy that the fact that the data are grouped does not necessarily imply that the estimated covariance matrix has to be pooled over all the groups. Specifically, both in this chapter as well as in the other chapters in this book, we focus on two main cases with respect to the grouping issue: either (a) the data are ungrouped, and then in the computation of $T_M^2$ we consider the distance of each observation from $\bar{Y}$, i.e. $(Y_i - \bar{Y})$, and use the empirical covariance matrix S, or (b) the data are grouped and for the j-th group we consider $(\bar{Y}_j - \bar{\bar{Y}})$ with the pooled covariance matrix $S_p$.

    However, those are not the only possible methods of analysis. Indeed, even if the data are grouped in groups of size n, if we believe that all the data obtained during the process capability study originate from the same distribution, the covariance matrix S (with kn - 1 degrees of freedom) is a more efficient estimator of $\Sigma$ than $S_p$ with k(n - 1) degrees of freedom (e.g. Alt (1982), Wierda (1994)). Some authors therefore suggest the use of the statistic

    $$T_M^2 = n\,(\bar{Y}_j - \bar{\bar{Y}})'\, S^{-1}\, (\bar{Y}_j - \bar{\bar{Y}})$$


    The distribution of this statistic is similar to that presented for the ungrouped case.

    In this case, we can also compute an alternative statistic which has, under the null hypothesis, an F-type distribution.

    The definition of $\tilde{S}_{(-j)}$, the covariance matrix on which that statistic is based, is more complicated than in the ungrouped case. There, $\tilde{S}_{(-i)} = S_{(-i)}$, i.e. the covariance matrix could have been obtained directly from all but the i-th observation. In the grouped case we have

    $$\tilde{S}_{(-j)} = \left[(n-1)k\,S_p + n(k-2)\,S^{*}_{(-j)}\right] / (kn-2)$$

    where $S^{*}_{(-j)}$ is the covariance matrix computed from the k - 1 group m