Measurement Uncertainties in Science and Technology
Michael Grabe
With 47 Figures
Springer
Dr. rer. nat. Michael Grabe, Am Hasselteich 5, 38104 Braunschweig, Germany. E-mail: [email protected]
Library of Congress Control Number: 2005921098
ISBN 3-540-20944-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media.
springeronline.com
© Springer-Verlag Berlin Heidelberg 2005 Printed in The Netherlands
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Data prepared by the author using a Springer TeX macro package. Cover design: design & production GmbH, Heidelberg
Printed on acid-free paper SPIN 10977521 57/3141/AD 5 4 3 2 1 0
To the memory of my parents
in love and gratitude
Preface
In his treatises on the Method of Least Squares, C.F. Gauß [1] distinguished irregular or random errors on the one hand and regular or constant errors on the other. As is well known, Gauß excluded the latter from his considerations on the ground that the observer should either eliminate their ultimate causes or rid every single measured value of their influence.
Today, these regular or constant errors are called unknown systematic errors or, more simply, systematic errors. Such errors turned out, however, to be eliminable neither by appropriately adjusting the experimental set-ups nor by any other means. Moreover, considering the present state of measurement technique, they are of an order of magnitude comparable to that of random errors.
The papers by C. Eisenhart [9] and S. Wagner [10] in particular have entrusted the high-ranking problem of unknown systematic errors to the metrological community. But it was not until the late 1970s that it took center stage, apparently in the wake of a seminar held at the Physikalisch-Technische Bundesanstalt in Braunschweig [19]. At that time, two ways of formalizing unknown systematic errors were discussed. One of these suggested including them smoothly, via a probabilistic artifice, into the classical Gaussian calculus; the other, conversely, proposed generalizing that formalism in order to better bring their particular features to bear.
As the author prefers to see it, systematic errors introduce biases, and this situation would compel the experimenter to differentiate between expectations on the one hand and true values on the other – a distinction that does not exist in the conventional error calculus. This perspective, and another reason which will be explained below, have induced the author to propose a generalization of the classical error calculus concepts. Admittedly, his considerations differ substantially from those recommended by the official metrology institutions. Today, the latter call for international validity under the heading Guide to the Expression of Uncertainty in Measurement [34–36].
Meanwhile, both formal and experimental considerations have raised numerous questions: The Guide does not make a distinction between true values and expectations; in particular, uncertainty intervals are not required to localize the true values of the measurands. Nonetheless, physical laws in general, as well as interrelations between physical constants in particular, are to be expressed in terms of true values. Therefore, a metrology which is not aimed at accounting for a system of true values is scarcely conceivable.
As the Guide treats random and systematic errors alike on a probabilistic basis, hypothesis testing and analysis of variance should remain valid; in least squares adjustments, the minimized sum of squared residuals should approximate the number of degrees of freedom of the linear system. However, none of these assumptions withstands a detailed analysis.
In contrast to this, the alternative error model to be discussed here suggests answers and procedures that appear apt to overcome the said difficulties.
In addition to this, the proposal of the author differs from the recommendations of the Guide in another respect, inasmuch as it provides for the introduction of what may be called "well-defined measurement conditions". This means that each of the measurands to be linked within a joint error propagation should be subjected to the same number of repeat measurements. As obvious as this might seem, the author wishes to point out that just this procedure would return the error calculus to the womb of statistics, which it had left on its way through the course of time. Well-defined measurement conditions allow complete empirical variance–covariance matrices to be assigned to the input data, and this, in fact, offers the possibility of expressing that part of the overall uncertainty which is due to random errors by means of Student's statistics.
Though this idea is inconsistent with the traditional notions of experimenters, which have at all times referred to incomplete sets of data, the advantages attainable when reformulating the tools of data evaluation in terms of the classical laws of statistics appear convincing.
There is another point worth mentioning: the approach of always assessing the true values of the measurands in general, and of the physical constants in particular, eliminates, quasi by itself, the allegedly most fundamental term of metrology, namely the so-called "traceability". Actually, the gist of this definition implies nothing else but just what has been stated above, namely the localization of true values. While the Guide cannot guarantee "traceability", this is the basic property of the alternative error model referred to here.
Last but not least, I would like to express my appreciation of my colleagues' experience, which I drew on when preparing the manuscript, as well as their criticism, which inspired me to clarify the text. For technical support I am grateful to Dr. Michael Weyrauch and to Dipl.-Übers. Hannelore Mewes.
Braunschweig, January 2005 Michael Grabe
Contents
Part I Characterization, Combination and Propagation of Errors
1 Basic Ideas of Measurement . . . 3
1.1 Sources of Measurement Uncertainty . . . 3
1.2 Quotation of Numerical Values and Rounding . . . 8
1.2.1 Rounding of Estimators . . . 8
1.2.2 Rounding of Uncertainties . . . 8

2 Formalization of Measuring Processes . . . 11
2.1 Basic Error Equation . . . 11
2.2 Random Variables, Probability Densities and Parent Distributions . . . 18
2.3 Elements of Evaluation . . . 21

3 Densities of Normal Parent Distributions . . . 25
3.1 One-Dimensional Normal Density . . . 25
3.2 Multidimensional Normal Density . . . 29
3.3 Chi-Square Density and F Density . . . 33
3.4 Student's (Gosset's) Density . . . 36
3.5 Fisher's Density . . . 41
3.6 Hotelling's Density . . . 43

4 Estimators and Their Expectations . . . 51
4.1 Statistical Ensembles . . . 51
4.2 Variances and Covariances . . . 54
4.3 Elementary Model of Analysis of Variance . . . 58

5 Combination of Measurement Errors . . . 61
5.1 Expectation of the Arithmetic Mean . . . 61
5.2 Uncertainty of the Arithmetic Mean . . . 63
5.3 Uncertainty of a Function of One Variable . . . 66
5.4 Systematic Errors of the Measurands . . . 70

6 Propagation of Measurement Errors . . . 71
6.1 Taylor Series Expansion . . . 71
6.2 Expectation of the Arithmetic Mean . . . 74
6.3 Uncertainty of the Arithmetic Mean . . . 77
6.4 Uncertainties of Elementary Functions . . . 80
6.5 Error Propagation over Several Stages . . . 87
Part II Least Squares Adjustment
7 Least Squares Formalism . . . 93
7.1 Geometry of Adjustment . . . 93
7.2 Unconstrained Adjustment . . . 97
7.3 Constrained Adjustment . . . 100

8 Consequences of Systematic Errors . . . 101
8.1 Structure of the Solution Vector . . . 101
8.2 Minimized Sum of Squared Residuals . . . 103
8.3 Gauss–Markoff Theorem . . . 105
8.4 Choice of Weights . . . 107
8.5 Consistency of the Input Data . . . 109

9 Uncertainties of Least Squares Estimators . . . 113
9.1 Empirical Variance–Covariance Matrix . . . 113
9.2 Propagation of Systematic Errors . . . 117
9.3 Overall Uncertainties of the Estimators . . . 119
9.4 Uncertainty of a Function of Estimators . . . 128
9.5 Uncertainty Spaces . . . 130
9.5.1 Confidence Ellipsoids . . . 131
9.5.2 The New Solids . . . 132
9.5.3 EP Boundaries and EPC Hulls . . . 141
Part III Special Linear and Linearized Systems
10 Systems with Two Parameters . . . 151
10.1 Straight Lines . . . 151
10.1.1 Error-free Abscissas, Erroneous Ordinates . . . 154
10.1.2 Erroneous Abscissas, Erroneous Ordinates . . . 165
10.2 Linearization . . . 172
10.3 Transformation . . . 173

11 Systems with Three Parameters . . . 175
11.1 Planes . . . 175
11.2 Parabolas . . . 181
11.2.1 Error-free Abscissas, Erroneous Ordinates . . . 181
11.2.2 Erroneous Abscissas, Erroneous Ordinates . . . 188
11.3 Piecewise Smooth Approximation . . . 192
11.4 Circles . . . 195

12 Special Metrology Systems . . . 203
12.1 Dissemination of Units . . . 203
12.1.1 Realization of Working Standards . . . 203
12.1.2 Key Comparison . . . 205
12.2 Mass Decades . . . 210
12.3 Pairwise Comparisons . . . 215
12.4 Fundamental Constants of Physics . . . 215
12.5 Trigonometric Polynomials . . . 224
12.5.1 Unknown Model Function . . . 225
12.5.2 Known Model Function . . . 227
Part IV Appendices
A On the Rank of Some Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
B Variance–Covariance Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
C Linear Functions of Normal Variables . . . . . . . . . . . . . . . . . . . . . 239
D Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
E Least Squares Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
F Expansion of Solution Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
G Student’s Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
H Quantiles of Hotelling’s Density . . . . . . . . . . . . . . . . . . . . . . . . . . 261
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Part I
Characterization, Combination and Propagation of Errors
1 Basic Ideas of Measurement
1.1 Sources of Measurement Uncertainty
To measure is to compare. To this end, as we know, a well-defined system of physical units is needed. As to the latter, we have to distinguish between the definitions and the realizations of units. By its very nature, a defined unit is a faultless, theoretical quantity. In contrast, a realized unit has been produced by a measuring device and is therefore affected by imperfections. Moreover, further errors occur when a realized unit is used in practice, as the comparison of a physical quantity to be measured with the appropriate unit is a source of measurement errors.
Let us assign a true value to every quantity to be measured, or measurand. True values will be indexed by 0 and measured values by natural numbers 1, 2, . . . .¹ If x designates the physical quantity considered, then the symbols x0, x1, x2, . . . denote the true value and the measured values.
To reduce the influence of measurement errors, it suggests itself to repeat the measurement, say n times, under the same conditions and subsequently to condense the set of repeat measurements thus gained into an estimator of the unknown (and unknowable) true value. Spontaneously, one might wish n to be as large as possible, hoping to achieve high measurement accuracy. However, there are limits to this wishful thinking.

A large number of repeat measurements is meaningful only if the resolution of the measuring device is sufficiently high. If the resolution is low, it makes little sense to pick up a huge number of repeat measurements, as a coarse device will respond to one and the same quantity, repeatedly input, with a rather limited number of different output values.
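The limiting effect of a coarse resolution can be illustrated with a small simulation. The following Python sketch is purely illustrative; the function name and all numerical values are assumptions, not taken from the text:

```python
import random

def read_device(true_value, sigma, resolution):
    """One simulated reading: a normally distributed random error is added
    to the true value, then the result is quantized to the resolution of
    the device, as a coarse display would do."""
    noisy = random.gauss(true_value, sigma)
    return round(noisy / resolution) * resolution

random.seed(1)
# Same quantity and same random errors (sigma = 0.01), two resolutions:
fine = {read_device(10.0, 0.01, 0.001) for _ in range(1000)}
coarse = {read_device(10.0, 0.01, 0.05) for _ in range(1000)}
print(len(fine), len(coarse))  # the coarse device yields only a few distinct values
```

However many readings are taken, the coarse device responds with only a handful of distinct output values, so enlarging n adds little information.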
To be more specific, the accuracy of measuring instruments suffers from random errors, which we are all familiar with, as well as from so-called unknown systematic errors, knowledge of which appeared to be confined, at least until the late 1970s, to, say, esoteric circles. Unknown systematic errors remain constant in time and are unknown with respect to magnitude and sign. Traditionally, they are not considered on the part of the classical Gaussian error calculus. As there is no chance of ignoring or eliminating such perturbations, experimenters rather have to lock them within intervals of fixed extensions, the actual lengths of which have to be taken from the knowledge of the measuring devices and the conditions under which they operate.

¹ As a zeroth measurement does not exist, the index 0 in x0 clearly distinguishes the true value from the measured data x1, x2, . . . .
Given that the unknown systematic errors as well as the statistical properties of the random process which produces the random measurement errors remain constant in time, the experimental set-up corresponds, put in statistical terms, to a stationary process. For the experimenter, this would be the best he can hope for; i.e., this would be, by definition, a perfect measuring device. Over longer periods, however, measuring devices may change their properties; in particular, unknown systematic errors may be subject to drifts.
If the unknown systematic errors change following some trend, the measurements constitute something like a time series, which would have to be treated and analyzed by different methods that are of no concern here.
As drifts might always occur, the experimenter should try to control the measurement conditions such that the unknown systematic errors remain constant at least during the time it takes to pick up a series of repeat measurements, say x1, x2, . . . , xn, and to specify the extensions of the intervals confining the unknown systematic errors such that their actual values are always embraced.
After all, smaller samples of measured values may appear more meaningful than larger ones. In particular, averaging arbitrarily many repeat measurements, affected by time-constant unknown systematic errors, would not reduce the latter.
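This behaviour is easy to demonstrate numerically. In the following Python sketch the true value, the systematic error and the standard deviation are illustrative assumptions, not values from the text:

```python
import random

random.seed(0)
x0 = 5.0      # true value (illustrative)
f = 0.2       # time-constant unknown systematic error (illustrative)
sigma = 0.5   # standard deviation of the random errors (illustrative)

def mean_of_n(n):
    """Arithmetic mean of n repeat measurements x_l = x0 + f + eps_l."""
    return sum(x0 + f + random.gauss(0.0, sigma) for _ in range(n)) / n

# The random part of the deviation shrinks as n grows; the bias f does not.
for n in (10, 1000, 100000):
    print(n, mean_of_n(n) - x0)
```

The mean converges to x0 + f, not to x0: no number of repeat measurements removes a time-constant systematic error.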
The true value of the physical quantity in question is estimated by means of a suitably constructed estimator. The experimenter should declare how much the two quantities, the true value and the estimator, might deviate from each other. The quantity accounting for the difference is called the measurement uncertainty. This basic quantity grants us access to the possible locations of the true value of the measurand relative to its estimator, Fig. 1.1.
As we do not know whether the estimator is larger or smaller than the true value – if we knew this, we would have applied the corresponding correction – we shall have to resort to symmetrical uncertainty intervals. For formal reasons, the numerical values of uncertainties are always positive quantities, which we afterwards provide with a ± sign. We state:
The interval
estimator ± measurement uncertainty
defines the result of a measurement. This range is required to localize the true value of the measurand (or quantity to be measured).
As a rule, measurement uncertainties will be composed of two parts: one is due to random errors and the other is due to unknown systematic errors.
In this context, the terms accuracy and reproducibility turn out to be crucial; see also Sect. 4.1. Accuracy refers to the localization of the true value of the measurand: the smaller the uncertainty, the higher the accuracy. In contrast to this, reproducibility merely aims at the statistical scattering of the measured data: the closer the scattering, the higher the reproducibility.

Fig. 1.1. The interval (estimator ± measurement uncertainty) is required to localize the true value of the measurand
Obviously, it is neither necessary nor advisable always to strive for the highest attainable accuracy. But in any case, it is indispensable to know the actual accuracy of the measurement and, in particular, that the latter reliably localizes the true value. The national metrology institutes ensure the realization of units by means of hierarchies of standards. The respective uncertainties are clearly and reliably documented so that operators can count on them at any time and at any place.
In the natural sciences, a key role falls to measurement uncertainties, as principles of nature are verified or rejected by comparing theoretical predictions with experimental results. Here, measurement uncertainties have to decide whether new ideas should be accepted or discarded.
In the engineering sciences, measurement uncertainties enter an almost unlimited diversity of intertwined dependencies which, in the end, are crucial for the functioning, quality, service life and competitiveness of industrial products.
Trade and industry categorically call for mutual fairness: both the buyer and the seller expect the measuring processes to subdivide merchandise and goods such that neither side will be overcharged.²
Figure 1.2 sketches the basics of a measuring process. The construction of a measuring device becomes possible only after the physical model has found a mathematical formulation. After this, procedures for the evaluation of the measurements have to be devised. As a rule, the latter require additional assumptions, which are either independent of the measuring process, such as the method of least squares, or depend on it, such as, for example, the statistical distribution of random errors and the time-constancy of unknown systematic errors. The monograph at hand confines itself to estimators whose structures are those of unweighted or weighted arithmetic means. Whether or not the procedures estimating their measurement uncertainties may be transferred to estimators satisfying other criteria will not be discussed here.

² One might wish to multiply the relative uncertainties (defined in (1.2) below) of the counters for gas, oil, petrol and electricity by their annual throughput and the respective prices.

Fig. 1.2. Diagram of a measuring process
Let any given arithmetic mean x and its uncertainty ux > 0 have, after all, been estimated. Then, the mathematical notation will be
x ± ux . (1.1)
Example
Let
m ± um = (10.31 ± 0.05) g

be the result of a weighing. Then the true value m0 of the mass m is required to lie somewhere within the interval 10.26 g . . . 10.36 g.
The uncertainty ux may be used to derive other, occasionally useful, statements. In this sense, the quotient

    ux / x    (1.2)

denotes the relative uncertainty. Herein, we might wish to substitute x0 for x. This, however, is prohibitive, as the true value is not known. Relative uncertainties vividly illustrate the accuracy of a measurement, independent of the order of magnitude of the respective estimators. Let U1 and U2 designate any physical units and let

    x1 = 10^-6 U1 ,   x2 = 10^6 U2

be estimators with the uncertainties

    ux1 = 10^-8 U1 ,   ux2 = 10^4 U2 .

The two estimators have, obviously, been determined with equal accuracy, though their uncertainties differ by not less than 12 decimal powers.

There are uncertainty statements which are still more specific. The frequently used ppm uncertainty (ppm means "parts per million") makes reference to the decimal power 6. The quotient

    ux / x = uppm(x) / 10^6

thus leads to

    uppm(x) = (ux / x) · 10^6 .    (1.3)
The idea is to achieve a more comfortable notation of numbers by getting rid of decimal powers which are known a priori. As an example, let uppm = 1; this implies ux/x = 10^-6.
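The quotients (1.2) and (1.3) may be sketched as two small Python helpers; the function names are hypothetical:

```python
def relative_uncertainty(x, ux):
    """u_x / x, Eq. (1.2); the estimator x is used, never the unknown
    true value x0."""
    return ux / abs(x)

def ppm_uncertainty(x, ux):
    """u_ppm(x) = (u_x / x) * 10**6, Eq. (1.3)."""
    return relative_uncertainty(x, ux) * 1e6

# The two estimators discussed above: equal relative accuracy, although
# their absolute uncertainties differ by 12 decimal powers.
print(relative_uncertainty(1e-6, 1e-8))  # 10^-2
print(relative_uncertainty(1e6, 1e4))    # 10^-2
print(ppm_uncertainty(1.0, 1e-6))        # uppm = 1 corresponds to ux/x = 10^-6
```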
Example
The frequency ν of the inverse ac Josephson effect involves dc voltage steps U = (h/2e)ν; e designates the elementary charge and h the Planck constant. Let the quotient 2e/h be quantified as

    4.8359767(14) × 10^14 Hz V^-1 .

How is this statement to be understood? It is an abbreviation and means

    2e/h ± u_2e/h = (4.8359767 ± 0.0000014) × 10^14 Hz V^-1 .

Converting to ppm yields

    uppm = (0.0000014 · 10^14 Hz V^-1) / (4.8359767 · 10^14 Hz V^-1) · 10^6 = 0.289 . . .

or, rounding up,

    uppm(2e/h) = 0.30 .
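The arithmetic of this example is easily verified, for instance in Python:

```python
# Estimator and uncertainty of 2e/h from the example, in Hz/V.
value = 4.8359767e14
u = 0.0000014e14

# Eq. (1.3): the common power of ten cancels in the quotient.
ppm = u / value * 1e6
print(ppm)  # 0.2894...
```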
1.2 Quotation of Numerical Values and Rounding
Evaluation procedures nearly always endow estimators with dispensable, insignificant decimal digits. To get rid of them, the measurement uncertainty should tell us where and how we have to round and, in particular, how to quote the final result [37].
1.2.1 Rounding of Estimators
First of all, we determine the decimal place in which the estimator should be rounded. Given that the first decimal digit of the measurement uncertainty differing from 0 turns out to be one of the digits

    1 or 2:     the estimator should be rounded in the place to the right of this place;
    3 up to 9:  the estimator should be rounded in just this place.

Having determined the decimal place of the estimator in which it should get rounded, we round it

    down, if the decimal place to the right of this place is one of the digits 0 up to 4;
    up,   if the decimal place to the right of this place is one of the digits 5 up to 9.
1.2.2 Rounding of Uncertainties
The uncertainty should be rounded up in the decimal place in which the estimator has been rounded.
Sometimes the numerical value of a physical constant is fixed with regard to some necessity. It goes without saying that fixing a physical constant does not produce a more accurate quantity. A possibility to emphasize a fixed digit would be to print it boldly, e.g. 273.16 K.
Table 1.1. Rounding of estimators and uncertainties
raw results                          rounded results
5.755018 . . . ± 0.000194 . . .      5.75502 ± 0.00020
1.9134 . . . ± 0.0048 . . .          1.913 ± 0.005
119748.8 . . . ± 123.7 . . .         119750 ± 130
81191.21 . . . ± 51.7 . . .          81190 ± 60
Examples

Let

    (7.238143 ± 0.000185) g

be the result of a weighing. The notation

    7 . 2 3 8 1 4 3
                ↑
    0 . 0 0 0 1 8 5

facilitates the rounding procedure. The arrow indicates the decimal place in which the estimator should be rounded down. In just this decimal place the uncertainty should be rounded up, i.e.

    (7.23814 ± 0.00019) g .

For simplicity, the units are suppressed in the examples presented in Table 1.1.
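The rules of Sects. 1.2.1 and 1.2.2 can be cast into a short Python sketch; round_result is a hypothetical helper, checked below against the entries of Table 1.1:

```python
import math

def round_result(estimator, uncertainty):
    """Locate the first non-zero decimal digit of the uncertainty; if it is
    1 or 2, move one place to the right.  Round the estimator half-up in
    that place (down at 0-4, up at 5-9) and round the uncertainty up in
    the same place."""
    exp = math.floor(math.log10(uncertainty))  # place of the first non-zero digit
    if int(uncertainty / 10 ** exp) in (1, 2):
        exp -= 1                               # one place to the right
    scale = 10.0 ** exp
    est = math.floor(estimator / scale + 0.5) * scale    # round half-up
    unc = math.ceil(uncertainty / scale - 1e-9) * scale  # always round up
    return est, unc

# The four rows of Table 1.1:
for raw in ((5.755018, 0.000194), (1.9134, 0.0048),
            (119748.8, 123.7), (81191.21, 51.7)):
    print(round_result(*raw))
```

The small epsilon guards against binary floating-point artefacts when the scaled uncertainty is already an integer.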
2 Formalization of Measuring Processes
2.1 Basic Error Equation
Measuring instruments are known not to operate perfectly. Nevertheless, for the experimenter it suffices to register a sequence of repeat measurements of, say, n outcomes
x1, x2, . . . , xn , (2.1)
which are free from drift. Obviously, under these circumstances, the data will scatter around a horizontal line, Fig. 2.1. As Eisenhart [9] has vividly formulated, the measuring instrument then operates in a state of statistical control. Then, as we shall assume, the measurement errors may be subdivided into two classes, namely into

– random errors and
– unknown systematic errors,

the latter being constant in time, at least during the period it takes to carry out the measurements.
The reservation "during the period of measurement" is essential, as it may well happen that, when the measurements are repeated at a later time, the systematic errors assume other values, giving rise to a different stationary state. To that effect, the boundaries of the unknown systematic errors should be fixed such that they encompass, once and for all, all potential realizations of the systematic errors.
Any drift, however, would model the data according to some trend. The measuring process would then no longer be stationary and, instead of a "horizontally" scattering series of repeat measurements, the experimenter would have registered something similar to a time series.
There will certainly be no drifts if the so-called influence quantities of the measuring process remain constant in time. These quantities are related to environmental and boundary conditions, to the thermal and mechanical stability of the experimental set-up, to the stability of its detectors, its power supplies, etc.
Generally speaking, a non-drifting measuring device will not be free from systematic errors, as the absence of drifts merely means that the influences of systematic perturbations are invisible – just on account of their constancy.
Fig. 2.1. Non-drifting (above) and drifting (below) sequences of repeated measurements
While random errors occur instantaneously, in the course of the measurements, the actual values of systematic errors are predetermined before the measurements get started.
In theoretical terms, systematic errors arise when physical models are conceptualized, as, in general, this calls for approximations and hypotheses. At an experimental level, systematic errors may be due to inexact adjustments and pre-settings, to environmental and boundary conditions, which are known to be controllable only with finite accuracy, etc. As an example, let us consider the density of air. Though its values appear to be predictable with remarkable accuracy, there still remains a non-negligible uncertainty component which is due to unknown systematic errors. In weighings of high accuracy, the greater the differences between the densities of the masses to be compared (e.g. platinum–iridium 21 500 kg/m3 and steel 8 000 kg/m3), the greater the uncertainty of the correction due to air buoyancy. As operators complain, this latter uncertainty, unfortunately, makes a significant contribution to the overall uncertainty of the weighing [26].
Furthermore, the switching-on of an electrical measuring device will not always lead to one and the same stationary state, and the same applies to the use of a mechanical device. Last but not least, inherently different procedures for measuring one and the same physical quantity will certainly bring about different systematic errors [14].
Random errors have to be attributed to microscopic fluctuations of events which affect the measuring process and prove to be uncontrollable on a macroscopic scale. According to experience, the random errors of stationary experimental set-ups may be considered, at least in reasonable approximation, to follow a normal probability distribution. Though we are not in a position to understand these phenomena in detail, we find guidance in the central limit theorem of statistics [43]. According to this, loosely stated, the superposition of sufficiently many similar random perturbations tends to a normal distribution.
Actually, for practical reasons, the statistical properties of measured data are rarely scrutinized. Nevertheless, we have to declare: should the assumption of normally distributed data fail, the procedures for the estimation of the random part of the overall measurement uncertainties, outlined step by step in the following, will not apply.
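The central limit argument can be illustrated numerically. The following Python sketch superposes many uniform perturbations; the choice of the uniform density and all numbers are illustrative assumptions:

```python
import random
import statistics

random.seed(2)

def superposed_error(k=50):
    """Superpose k similar, independent perturbations, uniform on [-1, 1],
    as a stand-in for the many microscopic disturbances mentioned above."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(k))

errors = [superposed_error() for _ in range(20000)]

# The sum is approximately normal with mean 0 and variance k/3, since each
# uniform perturbation has variance 1/3.
sigma = (50 / 3) ** 0.5
within_one_sigma = sum(abs(e) < sigma for e in errors) / len(errors)
print(statistics.mean(errors), statistics.pstdev(errors), within_one_sigma)
```

About 68% of the superposed errors fall within one standard deviation, as expected for a normal density.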
The status of unknown systematic errors is quite different. First of all, the question as to whether or not they should be treated probabilistically is still controversial [10, 19]. Though the measuring process as outlined above relates them to a probabilistic origin, this is not meant in the sense of repeatable events but with respect to the observation that the experimenter has no chance of controlling or influencing the chain of events which, in the end, leads to an operative systematic error. Again, while it is true that systematic errors have their roots in random events, on account of their constancy, and in order to arrive at reliable measurement uncertainties, we prefer to treat them non-probabilistically and to make allowance for them by introducing biases and worst-case estimations. We might say that the approach of the Guide to randomize unknown systematic errors corresponds to an a priori attitude, whereas the idea to classify them as biases reflects an a posteriori position. While the former is rooted in theoretical considerations, the latter starts from experimental facts.
Should there be just one systematic error, it should certainly not be difficult to draw indisputable conclusions. Yet, it will be more controversial to assess the influence of several or even many unknown systematic errors. We shall discuss this situation in Sects. 2.3 and 6.3.
As is common practice, we let f denote the unknown systematic error and let
f1 ≤ f ≤ f2 ; f = const.
specify the confining interval. We expect the boundaries f1 and f2 to be safe, which simply means that f shall not lie outside the limits. As banal as this might sound, just as little should we forget that a declaration of intent is one matter and its implementation another. To recognize the sources of systematic errors, to formalize their potential influence, to experimentally reduce their magnitude as far as possible – all this requires many years of experience which, indeed, constitutes the ultimate art of measurement.
Nevertheless, the mathematical procedures providing the boundaries of the confining error intervals are not necessarily complicated. All sources which are to be considered should be listed and investigated with respect to the kind of influence they exert. Some of them will act directly, others within functional relationships. Then, the overall effect to be anticipated should be estimated on the basis of a worst-case reckoning.
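As a numerical sketch of such a worst-case reckoning – the function name and all numbers below are invented for illustration – suppose each source i acts on the result through a sensitivity coefficient cᵢ and is bounded by |fᵢ| ≤ fs,ᵢ; since the signs of the fᵢ are unknown, the worst case accumulates magnitudes:

```python
# Hypothetical sketch of a worst-case reckoning: each error source
# acts through a sensitivity coefficient c_i and is bounded by
# |f_i| <= fs_i; the worst case sums the magnitudes |c_i| * fs_i.

def worst_case_bound(sensitivities, bounds):
    """Return fs = sum over |c_i| * fs_i, the combined worst-case limit."""
    return sum(abs(c) * b for c, b in zip(sensitivities, bounds))

# Three invented sources: one acting directly (c = 1), one attenuated,
# one amplified within a functional relationship
fs = worst_case_bound([1.0, -0.5, 2.0], [0.01, 0.02, 0.005])
```

The negative sensitivity illustrates why only magnitudes may be accumulated: with the signs of the fᵢ unknown, no cancellation can be credited.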
In the following we will show that it is always feasible to confine f through an interval symmetrical to zero: if there are experimental findings definitely speaking against such an interval, we should use this information and subtract the term
(f1 + f2)/2

from both the asymmetrical boundaries and the measured data. Thus, we arrive at an interval of the form

−fs ≤ f ≤ fs ;  fs = (f2 − f1)/2 , (2.2)

and a new set of data x′l = xl − (f1 + f2)/2 ; l = 1, . . . , n, where, for convenience, we should abstain from explicitly priming the data. Rather, we should tacitly assume that the corrections have already been made.
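The shift to a symmetrical interval may be sketched as follows; the function name and the numeric values are invented, while the arithmetic is exactly that of (2.2):

```python
def symmetrize(f1, f2, data):
    """Turn f1 <= f <= f2 into -fs <= f' <= fs with fs = (f2 - f1)/2,
    subtracting the midpoint (f1 + f2)/2 from boundaries and data alike."""
    mid = (f1 + f2) / 2.0
    fs = (f2 - f1) / 2.0
    return fs, [x - mid for x in data]

# Invented asymmetric bounds 0.002 <= f <= 0.010 and invented readings
fs, shifted = symmetrize(0.002, 0.010, [1.005, 1.007, 1.006])
```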
Thus, our error model assumes normally distributed random errors and unknown systematic errors of property (2.2). The latter are understood to bias the measuring process. We observe:
The basic error equation of measurement for non-drifting measuring devices (operating in a state of statistical control) is given by

xl = x0 + εl + f ;  l = 1, . . . , n ;  −fs ≤ f ≤ fs . (2.3)

While the random errors εl are assumed to be normally distributed, the unknown systematic error f is considered a time-constant perturbation.
There may be situations in which the effects of perturbations are known to one group of persons but not to another. Then, the first group might wish to speak of systematic deviations while the second group has no other choice than to continue to refer to systematic errors. Should such a distinction turn out to be relevant, it is proposed to agree on the definition that
– a systematic deviation denotes a time-constant quantity identified in magnitude and sign, while
– an unknown systematic error designates a time-constant quantity unidentified in magnitude and sign.
In the following, we shall exclusively address the latter, i.e. unknown systematic errors, and, to be brief, we shall simply call them systematic errors, as by their very nature, errors are always unknown.
Finally, Fig. 2.2 particularizes the measuring process depicted in Sect. 1.1. It may happen that the mathematical representation of the physical model under consideration, for numerical or physical reasons, calls for approximations. Even hypotheses may enter, e.g. when a physical model is to be tested by experiment. The design of the measuring device complies with the mathematical formulation of the physical model.

Fig. 2.2. Diagrammed measuring process
Let us now discuss the still controversial question of whether or not systematic errors should be randomized by means of postulated probability densities. We are sure the experimenter has no chance of influencing the actual values of systematic errors entering his measurements. Doubtless, in this respect he is confronted with fortuitous events. On the other hand, the decisive question seems to be whether or not he is in a position to alter those systematic errors. In general, the answer will be negative.
Disregarding the efforts to be made, the experimenter could think of disassembling and reassembling the measuring device many a time, hoping to vary its systematic errors. Yet, he might not be in a position to do that, either for fear of damaging or even destroying some of the components or for reasons of principle.
Furthermore, the experimenter could try to vary the environmental conditions. This, however, might be difficult to achieve, and if he could do this, on account of the complexity of the interactions the new state might offer no more information than the previous one.
After all, even if the experimenter had the intention to vary his systematic errors, the prospects of success will be limited, so that a complete repetition of the measurements under “new conditions” need not necessarily be helpful.
Instead, we ask what properties a formalism which abstains from assigning probability densities to systematic errors should have. From the outset it is clear that such an approach will be incompatible with the essential features of the classical Gaussian formalism, thus making it necessary to start the revision from scratch. This might be seen as a disadvantage. On the other hand, the time-constancy of unknown systematic errors doubtlessly manifests physical reality, while feeding randomized unknown systematic errors into the classical Gaussian formalism produces quite a series of formal as well as experimental contradictions.
In statistical terms, estimators affected by unknown systematic errors are biased. At the same time, postulated densities would formally cancel biases, thus contradicting reality.
As has been mentioned, the treatment of several random variables asks for so-called covariances. The nature of the latter is either theoretical or empirical, depending on whether the covariances are taken from parent distributions or from samples of finite size. Obviously, randomized unknown systematic errors only possess theoretical expectations while physical experiments, on the other hand, will never produce expectations but just estimators. Now, even if stationary measuring processes will certainly not exhibit couplings between random and unknown systematic errors, as long as the latter are treated probabilistically, on account of purely formal reasons we are entitled to ask for covariances between the randomized and the truly random errors. – Up to now, a convincing answer to the question of how this might be accomplished does not seem to be at hand.
The analysis of variance attempts to find out whether or not different sets of empirical data of varying origins, relating to one and the same physical quantity, might have suffered from systematic effects. However, as we presently know, empirical data sequences are almost surely charged with systematic errors, so that any effort to refer to classical test criteria would be superfluous from the outset. In particular, there seems to be no chance to generalize the analysis of variance to include unknown systematic errors by means of postulated densities, Sect. 4.3.
In least squares adjustments, the minimized (and appropriately weighted) sum of squared residuals should approximate the number of degrees of freedom of the linear system. However, as empirical observations repeatedly show, this relationship scarcely applies. Rather, as a rule, the aforesaid sum exceeds the number of degrees of freedom appreciably, and it suggests itself to attribute this discrepancy to randomized systematic errors: though the latter have been randomized by postulated densities, the experiment invariably keeps them constant, thus revealing the untenability of the randomization concept.
Disregarding all conflicting points, i.e. when systematic errors are randomized, one could, nevertheless, set about defining uncertainty intervals. However, the associated formalism necessarily brings about the geometrical combination of random and systematic errors and thus bans clear and concise decisions. In particular, the experimenter would stay in doubt whether or not his uncertainty localized the true value of his measurand. This, unfortunately, implies the failure of traceability within the system of physical quantities and so deprives metrology of its objectivity.
Ostensibly, postulated densities for systematic errors would save the Gaussian error calculus. At the same time, this procedure abolishes the consistency between mathematics and physics.
The only way out seems to be to revise the Gaussian formalism from scratch. That Gauß¹ himself strictly refused to include systematic errors in his considerations should be understood as a clear warning sign – otherwise, as he knew, his formalism would not have been operative.
Indeed, the situation becomes different when systematic errors enter in the form of time-constant perturbations. Then, as explained above, random and systematic errors remain strictly separated and combine arithmetically. Also, when properly done, this approach allows the influence of random errors to be treated by means of Student's statistics and the influence of systematic errors by worst-case estimations. The associated formalism is clear and concise. In particular, so-defined measurement uncertainties localize the true values of the measurands “quasi-safely”, thus ensuring traceability from the start.
However, the announced reference to Student's statistics calls for a paradigm shift inasmuch as we shall have to introduce what might be called well-defined measuring conditions. These ask for equal numbers of repeated measurements for each of the variables entering the error propagation. For apparent reasons, well-defined measuring conditions allow the complete empirical variance–covariance matrices of the input data to be established and, as will become visible, just these matrices lead us back to Student's statistics.

¹ Though this is the proper spelling, in what follows we shall defer to the more common notation Gauss.
Nonetheless, to require equal numbers of repeat measurements conflicts with common practice. Possibly, unequal numbers of repeat measurements reflect the historical state of communication which, in the past, experimenters and working groups had to cope with. In other words, they virtually had no other choice. However, in the age of desk computers and the Internet, working groups can easily agree on common numbers of repeat measurements and, by the same token, they can exchange their raw data.
In the end it seems more advantageous to forego excessive numbers than to apply questionable, or at least clumsy, procedures of data evaluation. Firstly, a physical result should not significantly depend on the numbers of repeat measurements and, secondly, questionable uncertainties will tend to blur the primary goal of safely localizing the true values of the measurands.
In view of the high costs Big Physics involves, we should not hesitate to call for well-defined measuring conditions, all the more as the experimental efforts remain limited while the attainable benefits promise to be considerable.
2.2 Random Variables, Probability Densitiesand Parent Distributions
It seems appropriate to formalize the functioning of measuring devices by means of random variables. From a pragmatic point of view, on the one hand we have

– the set of all measured values which the measuring instruments can produce, and on the other hand
– the mathematically abstract definition of random processes and random variables, which implies that any number generated by a random process may be understood as the realization of a random variable.

Matching the two means assuming that any measured value is a realization of a random variable and any realization of a random variable a measured value.
Let X denote a random variable and let us attempt to identify its realizations x1, x2, . . . , xn with the measured data. Conceptually, we might expect the experimental set-up to produce an infinite number of different values. However, as the instrument's resolution is finite, we must be content with a random variable the realizations of which are discrete numbers with relatively few decimal places. In particular, we will in no way encounter an infinite number of numerically different readings. Nevertheless, for simplicity and in a formal sense, we shall stick to the notion of continuous random variables, assuming that their realizations map the measuring process.
The error model proposed requires that there be but one systematic error fx superposed on each of the realizations xl; l = 1, . . . , n. As a result, the actual (unknown) value of the systematic error defines the “center of gravity” of the statistical scattering of the random errors.

Fig. 2.3. One-dimensional empirical distribution density
When formalizing random errors by means of distribution densities, we shall have to distinguish between theoretical and empirical densities. From the features of empirical densities we may infer the structure of the underlying theoretical densities. To find the empirical distribution density of a sequence of measured data, we assign the data to the horizontal axis and the frequencies of their occurrences to the vertical axis of a rectangular coordinate system. Then, we decompose the horizontal axis into an adequate number r of intervals or “classes” of width ∆xi; i = 1, . . . , r. For any given i, we count the numbers ∆ni of measured values falling within the respective intervals

xi, . . . , xi + ∆xi ;  i = 1, . . . , r .
The empirical distribution density is defined by

p(xi) = ∆ni/(n ∆xi) ;  i = 1, . . . , r . (2.4)

Here, n denotes the total number of measured data. Obviously, r should have a value which is neither too large nor too small. In the first case, the numbers p(xi); i = 1, . . . , r would fluctuate unreasonably strongly, and in the second, they would bring about too coarse a resolution. The p(xi); i = 1, . . . , r define a sequence of rectangular columns, Fig. 2.3. The quotient ∆ni/n is called the frequency ratio. After all, the shape of an empirical probability density depends not only on the properties of the underlying random process but also on the width of the intervals ∆xi and the number n of data entering the sorting procedure.
From a statistical point of view, the set of measured values xl; l = 1, . . . , n defines a sample of size n. Increasing n to any size suggests postulating a so-called parent distribution, encompassing an infinite number of data. Certainly, from an experimental point of view, this is a fiction. Nevertheless, a parent distribution proves a useful abstraction. In particular, it allows for a probability density, called theoretical or exact, the shape of which is unique.

The theoretical density of the random variable X will be denoted as pX(x); the symbols x and X express that x is considered to be any realization of the random variable X.
In general, metrology addresses several series of measurements, e.g.
x1, x2, . . . , xn ; y1, y2, . . . , yn ; z1, z2, . . . , zn (2.5)
stemming from different measuring devices. To each of the sequences we assign a random variable, say X, Y, Z, . . . . Clearly, their joint distribution density is multidimensional. Given two random variables, the implied two-dimensional density is constructed as follows: we cover the x, y-plane with a line screen exhibiting increments ∆xi and ∆yj. Then, we count the numbers ∆nij of pairs xi, yj falling within the rectangles

xi, . . . , xi + ∆xi ;  yj, . . . , yj + ∆yj ;  i, j = 1, . . . , r .
The numbers ∆nij provide the two-dimensional empirical density via

p(xi, yj) = ∆nij/(n ∆xi ∆yj) ;  i, j = 1, . . . , r . (2.6)

Figure 2.4 depicts the basic idea. So-called covariances will assist us in allowing for dependencies between the series of data (2.5); they will be discussed in a later section.
Let us revert to (2.4). Another term for the frequency ratio ∆ni/n is empirical probability. Obviously,

∆Pi = ∆ni/n = p(xi) ∆xi (2.7)

denotes the empirical probability of finding ∆ni out of n measured values in an interval ranging from xi to xi + ∆xi. Accordingly, statement (2.6) yields

∆Pij = ∆nij/n = p(xi, yj) ∆xi ∆yj , (2.8)

denoting the empirical probability of finding ∆nij measured values out of n in a rectangle xi, . . . , xi + ∆xi; yj, . . . , yj + ∆yj. We confine ourselves to this rather technical definition of probability, which relies on the repeatability of physical experiments.
For larger intervals and rectangles, (2.7) and (2.8) turn into sums and double sums respectively. If, instead, we refer to theoretical or exact distribution densities and integral formulations,

P(a < X ≤ b) = ∫_a^b pX(x) dx (2.9)

denotes the probability of finding any realization of the random variable X within an interval a, . . . , b and

P(a < X ≤ b , c < Y ≤ d) = ∫_a^b ∫_c^d pXY(x, y) dx dy (2.10)

the probability of finding any realizations of the random variables X, Y within a rectangle a, . . . , b; c, . . . , d. Chapter 3 summarizes the distribution densities which will be of relevance here. Subject to their structure, they refer either to one or to more than one normal parent distribution.

Fig. 2.4. Two-dimensional empirical density
As to the interplay between random errors and unknown systematic errors, for stationary measuring processes we shall assume throughout that there are no dependencies or couplings of any kind.
2.3 Elements of Evaluation
In order to find suitable estimators, measured data xl; l = 1, . . . , n are usually subjected to arithmetic averaging. In view of the influence of unknown systematic errors, the arithmetic mean

x̄ = (1/n) Σ_{l=1}^n xl (2.11)

can, however, only be a biased estimator of the true value x0 of the measurand. As has repeatedly been stated, even if we know that x̄ carries a systematic error, we are not in a position to appropriately correct x̄. Indeed, all we can do is to design a meaningful measurement uncertainty. According to (2.3), each measured value xl may formally be written as
xl = x0 + εl + f ;  l = 1, . . . , n . (2.12)

Insertion into (2.11) yields

x̄ = x0 + (1/n) Σ_{l=1}^n εl + f . (2.13)
Even if we assume an arbitrary number of repeat measurements and consider the sum over the random errors to become negligibly small, the arithmetic mean still differs by the actual (unknown) value of the systematic error f from the true value x0 of the measurand. Hence, x̄ is biased with respect to the true value x0.
Averaging notionally over the infinitely many data xl; l = 1, 2, . . . of a normal parent distribution, loosely formulated, the arithmetic mean x̄ changes into a parameter

µ = x0 + f , (2.14)

where the symmetry of the implied distribution cancels the sum over the random errors. The parameter µ has the property of what will be called an expected value. To be realistic, so-defined quantities remain fictitious, experimentally inaccessible parameters. Inserting (2.14) in (2.13) yields
x̄ = µ + (1/n) Σ_{l=1}^n εl , (2.15)

which reveals that though the average x̄ is biased with respect to the true value x0 of the measurand, it is unbiased with respect to its expectation µ.
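A simulation with invented numbers illustrates (2.13)–(2.15): however many repeat measurements are averaged, the mean settles at µ = x0 + f, not at the true value x0.

```python
import random
import statistics

random.seed(42)                              # reproducible sketch
x0, f, sigma, n = 10.0, 0.5, 0.2, 10000      # invented true value, bias, scatter
data = [x0 + random.gauss(0.0, sigma) + f for _ in range(n)]
xbar = statistics.mean(data)
# xbar approaches mu = x0 + f = 10.5; averaging cannot remove the bias f
```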
Remarkably enough, C. Eisenhart [9] clarified these ostensibly plain relationships in the early 1960s. To the author's knowledge, earlier attempts to clear up the foundations of the Gaussian error calculus have not been made.
To better visualize the relationship between the true value x0 of the measurand, the expected value µ of the arithmetic mean x̄ and the systematic error f, we introduce the empirical variance

s² = (1/(n − 1)) Σ_{l=1}^n (xl − x̄)² , (2.16)

which, obviously, measures the scattering of the random errors, Fig. 2.5. From (2.12) and (2.13) we take that the difference xl − x̄ has cancelled the systematic error f. In (2.16) we could have inserted a factor 1/n instead of 1/(n − 1). However, a so-defined empirical variance would be biased, Sect. 4.2. Let σ² denote the expected value of the empirical variance s², and, further, s the empirical and σ the theoretical standard deviation. Again, σ² is an abstract definition remaining experimentally inaccessible. Nevertheless, in a formal sense, the theoretical variance and the theoretical standard deviation will prove useful many a time.
Fig. 2.5. True value x0, expected value µ, arithmetic mean x̄, systematic error f and empirical standard deviation s
The uncertainty of the arithmetic mean should account for the random fluctuations of the xl as well as for the systematic error f, the latter being bounded by an interval of the form (2.2). As we ask for safe uncertainties, we shall bring the boundaries ±fs of that interval to bear. This, in fact, means nothing other than to resort to a worst-case estimation.
Equation (2.12) suggests combining random and systematic errors arithmetically. Hence u = s + fs appears to be a first reasonable assessment of the uncertainty of the arithmetic mean. For completeness, let us provide the uncertainty, the empirical standard deviation and the worst-case estimate of the unknown systematic error with indices; then

ux̄ = sx̄ + fs,x . (2.17)
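As a first numerical reading of this assessment – data and bound invented, the first term taken as the plain empirical standard deviation of (2.16), leaving aside the statistical refinement announced in the text:

```python
import statistics

def mean_and_uncertainty(data, fs):
    """First assessment u = s + fs: the empirical standard deviation s
    (1/(n-1) form of (2.16)) combined arithmetically with the
    worst-case systematic bound fs."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return xbar, s + fs

# Five invented readings and an invented bound fs = 0.05
xbar, u = mean_and_uncertainty([9.98, 10.02, 10.00, 10.01, 9.99], 0.05)
```

The arithmetic combination keeps the two error kinds strictly separated: the statistical part may shrink with further data, while the worst-case part fs stays fixed.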
Later, we shall refine the weight given to random errors by modifying the first term sx̄ through a factor based on sophisticated statistical considerations. In most cases there will be several series of measurements, so that the means x̄, ȳ, . . . have to be inserted into a given function φ(x̄, ȳ, . . .). Thus, we shall have to assess the uncertainty uφ of a given estimator φ(x̄, ȳ, . . .) and to quote something like

φ(x̄, ȳ, . . .) ± uφ . (2.18)

Fortunately, the considerations presented hitherto may be generalized so that, in correspondence with (2.17), we shall arrive at a similar result, namely

uφ = sφ + fs,φ . (2.19)
Fig. 2.6. Error propagation, symbolic
The quantities sφ and fs,φ will be developed in Chap. 6. Also, the first term on the right-hand side may be modified by means of more sophisticated statistical concepts.
In respect of Fig. 2.6, it seems worth adding a remark about the mathematical operations underlying (2.19). The procedures in question will combine n statistically equivalent data sets

x1, y1, . . .
x2, y2, . . .
. . . , . . . , . . .
xn, yn, . . .

where we shall have to brood over the question of how to pair or group the data.
In contrast to the fluctuations of the random errors, the respective systematic errors of the sequences will, of course, not alter, i.e. new information comes in on the part of statistical errors, but not on the part of systematic errors – all we know is that they have assumed (unknown) fixed values within intervals of given lengths.
Apparently, the course of time had shrouded Gauss's regular or constant systematic errors with the veil of oblivion. To the author's knowledge, they were rediscovered, or possibly even newly discovered, by Eisenhart [9] and Wagner [10] inside the national metrology institutes.
Remarkably enough, in certain cases, legal metrology exclusively confines itself to systems of inequalities as given in (2.2), treating and propagating unknown systematic errors in the form of consecutive worst-case estimations, thus meeting basic requirements to operate efficiently and safely [26].
3 Densities of Normal Parent Distributions
3.1 One-Dimensional Normal Density
Let the realizations of a random variable X constitute what has been called a parent distribution. If the probability of a realization of X falling within an interval x, . . . , x + dx is given by

pX(x) dx = 1/(σx √(2π)) exp(−(x − µx)²/(2σx²)) dx , (3.1)

the pool of data is said to be normally distributed and pX(x) itself is called the normal distribution density. While the parameter µx marks the center of the density, the parameter σx quantifies the broadness of the scattering of the data. To abbreviate, we also say X is N(µx, σx)-distributed.
Figure 3.1 depicts the progression of the density. Following the central limit theorem of statistics, we shall consider the stochastic perturbations of measuring processes to be at least approximately normally distributed – though, of course, there will be exceptions. But even if the statistical behavior of the perturbations is known on that score, metrological applications of (3.1) confront the experimenter with the difficulty that he is not familiar with the parameters µx and σx. Though this might be a matter of fact, the author wishes to point to the conflict between theoretically defined parameters of a distribution density and the experimenter's chances of finding appropriate approximations for them. Strictly speaking, by their very nature, theoretically defined parameters prove to be experimentally inaccessible.

Fig. 3.1. Normal probability density
Just to get accustomed to the practical aspect of probability densities, let us try to approximate the numbers ∆nj of the empirical density (2.4) by means of the density pX(x) implied in (3.1). Assuming normally distributed measured data xl; l = 1, . . . , n, the number of values falling within any of the r intervals xj, . . . , xj + ∆xj; j = 1, . . . , r would be

∆nj ≈ n/(σx √(2π)) exp(−(xj − µx)²/(2σx²)) ∆xj . (3.2)
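Evaluating (3.1) and (3.2) numerically, with invented parameters standing in for the unknown µx and σx:

```python
import math

def normal_density(x, mu, sigma):
    """The density p_X(x) of (3.1)."""
    return (math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi)))

# Expected class occupancy (3.2) for a class of width dx at the center
n, mu, sigma, dx = 1000, 0.0, 1.0, 0.1    # invented values
dn_center = n * normal_density(mu, mu, sigma) * dx
```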
Using (3.2), we might wish to design a mean of the data,

x̄ ≈ Σ_{j=1}^r xj ∆nj/n , (3.3)

and also something like a quadratic spread with respect to the so-defined mean,

sx² ≈ Σ_{j=1}^r (xj − x̄)² ∆nj/n . (3.4)

However intriguing these attempts may seem, they are meaningless, as they are based upon the unknown parameters µx and σx.
Let us now refer to the parent distribution and not just to a sample of size n. Then, (3.3) changes into the theoretical mean

µx = ∫_{−∞}^{∞} x pX(x) dx (3.5)

and (3.4) into the theoretical variance

σx² = ∫_{−∞}^{∞} (x − µx)² pX(x) dx . (3.6)

As is seen, (3.5) and (3.6) simply reproduce the parameters of the density pX(x). Again, there is no benefit. Nevertheless, the parameters µx and σx², though unknown and inaccessible, are of basic importance as they represent the expectations of the empirical estimators x̄ and sx².

In the following, we shall resort to expectations many a time, so that it suggests itself to introduce a convenient symbolism. Instead of (3.5) and (3.6), we shall write

µx = E{X}  and  σx² = E{(X − µx)²}
respectively. Here, E{·} symbolizes the integration while X stands for the integration variable. As has already been done, random variables will be denoted by upper-case letters and their realizations by lower-case letters.¹
The smaller the scattering of the measured values, the smaller σx and the narrower the density pX(x). Conversely, the wider the scattering, the larger σx and the broader the density pX(x). In particular, given µx = 0 and σx = 1, the density (3.1) turns into the so-called standardized normal density N(0, 1). Figure 3.1 underscores the rapid convergence of pX(x) to zero, which accords well with the experimental behavior of the data – the latter scatter rather to a lesser extent: outside µx ± 2σx, only relatively few values are to be observed. After all, the tails of the normal density outside, say, µx ± 3σx, are scarcely significant. The actual situation depends, however, on the intrinsic properties of the particular measuring device.
Nevertheless, confidence intervals and testing of hypotheses just rely upon the tails of distribution densities, which, in pragmatic terms, means that so-defined criteria are based on statistically robust and powerful assertions. After all, the tails of the densities rather play a conceptual role and scarcely appear to be verifiable – at least as long as measuring devices operate in statistically stationary states and do not produce short-time operant systematic errors, so-called outliers, which leave the experimenters in doubt as to whether or not a particular datum might still pertain to one of the tails of the density.
The result of a measurement should not depend significantly on the actual number of repeat measurements, which implies that outliers should not be allowed to play a critical role with regard to the positioning of estimators – though, in individual cases, it might well be recommendable to rely on sophisticated decision criteria. On the other hand, judging from an experimental point of view, it could be more advantageous to identify the causes of outliers.
When we assume that the consecutive values x1, x2, . . . , xn of a sequence of repeat measurements are independent, we impute that the measuring device relaxes before the next measured value is read. Given the measuring device complies with this requirement, we may derive the probability density of the random variable X̄ corresponding to the arithmetic mean x̄ as follows. To begin with, we generalize our previous notion of just one random variable X producing the data x1, x2, . . . , xn and conceive, instead, of an ensemble of statistically equivalent series of measurements. To the first “vertical sequence”

x1^(1), x1^(2), . . . , x1^(k), . . .

of this ensemble

x1^(1), x2^(1), x3^(1), . . . , xn^(1)   first series
x1^(2), x2^(2), x3^(2), . . . , xn^(2)   second series
. . . . . . . . . . . . . . . . . .
x1^(k), x2^(k), x3^(k), . . . , xn^(k)   k-th series
. . . . . . . . . . . . . . . . . .

¹ Using Greek letters, however, we shall not explicitly distinguish random variables and their realizations.
we assign a random variable X1, to the second a random variable X2, etc., and, finally, to the n-th a random variable Xn. Then, considering the so-defined variables to be independent and normally distributed, by means of any real constants b1, b2, . . . , bn, we may define a sum

Z = b1X1 + b2X2 + · · · + bnXn . (3.7)
As may be shown [40], see Appendix C, the random variable Z is normally distributed. Putting

µi = E{Xi}  and  σi² = E{(Xi − µi)²} ;  i = 1, . . . , n ,

its expectation and scattering are given by

µz = Σ_{l=1}^n bl µl  and  σz² = Σ_{l=1}^n bl² σl² (3.8)

respectively. Its distribution density turns out to be

pZ(z) = 1/(σz √(2π)) exp(−(z − µz)²/(2σz²)) . (3.9)
As the Xl; l = 1, . . . , n are taken from one and the same parent distribution, we have

µ1 = µ2 = · · · = µn = µx  and  σ1² = σ2² = · · · = σn² = σx² .

Hence, setting bl = 1/n; l = 1, . . . , n, the sum Z changes into X̄, µz into µx and, finally, σz² into σx²/n. Consequently, the probability of finding a realization of X̄ within an interval x̄, . . . , x̄ + dx̄ is given by
pX̄(x̄) dx̄ = 1/((σx/√n) √(2π)) exp(−(x̄ − µx)²/(2σx²/n)) dx̄ . (3.10)
In the following, we shall refer to the density pX̄(x̄) of the arithmetic mean X̄ many a time. Error propagation, to be developed in Chap. 6, will be based on linear sums of the kind (3.7), though, admittedly, we shall have to extend the presumption underlying (3.10), as the sum Z turns out to be normally distributed even if the variables Xl; l = 1, . . . , n prove to be dependent, see (3.29). Let a final remark be made on the coherence between the level of probability and the range of integration,
P = ∫_{−a}^{a} pX̄(x̄) dx̄ .

Setting a = 2σx̄ with σx̄ = σx/√n, we have

0.95 ≈ ∫_{−2σx̄}^{2σx̄} pX̄(x̄) dx̄ , (3.11)

i.e. P ≈ 95%.
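Both statements – the shrinking scatter σx̄ = σx/√n of (3.10) and the roughly 95% coverage of (3.11) – can be checked by simulation; all numbers below are invented:

```python
import random
import statistics

random.seed(7)                               # reproducible sketch
mu, sigma, n, trials = 0.0, 1.0, 25, 2000    # invented parameters
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(trials)]
s_mean = statistics.stdev(means)             # approaches sigma/sqrt(n) = 0.2
inside = sum(abs(m - mu) <= 2 * sigma / n ** 0.5 for m in means) / trials
# inside, the fraction of means within +/- 2 sigma/sqrt(n), comes out near 0.95
```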
3.2 Multidimensional Normal Density
We consider several series of repeat measurements

x1, x2, . . . , xn
y1, y2, . . . , yn
. . . . . . . . . . . .     (3.12)

each comprising the same number n of items. We assign random variables X, Y, . . . to the series and assume that they refer to different normal parent distributions.
The joint treatment of several series of measurements implies the introduction of covariances. Again, we have to distinguish between empirical and theoretical quantities. While the former pertain to series of measurements of finite lengths, the latter are defined through theoretical distribution densities.
Let us consider just two series of measurements $x_l, y_l;\ l = 1,\dots,n$. By definition,

$$s_{xy} = \frac{1}{n-1}\sum_{l=1}^{n}(x_l-\bar x)(y_l-\bar y) \qquad (3.13)$$

designates the empirical covariance and

$$r_{xy} = \frac{s_{xy}}{s_x s_y}\,;\qquad |r_{xy}| \le 1$$

the empirical correlation coefficient. The quantities $s_x$ and $s_y$ denote the empirical standard deviations, i.e. the square roots of the empirical variances

$$s_x^2 = \frac{1}{n-1}\sum_{l=1}^{n}(x_l-\bar x)^2 \quad\text{and}\quad s_y^2 = \frac{1}{n-1}\sum_{l=1}^{n}(y_l-\bar y)^2\,. \qquad (3.14)$$
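Definitions (3.13) and (3.14) translate directly into code. The sketch below uses made-up illustrative data (numpy assumed available) and cross-checks the result against numpy's own covariance routine:

```python
import math
import numpy as np

def empirical_cov(x, y):
    """Empirical covariance s_xy as in (3.13), divisor n - 1."""
    n = len(x)
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (n - 1))

# illustrative (hypothetical) series of n = 5 paired measurements
x = np.array([10.1, 9.8, 10.3, 10.0, 9.9])
y = np.array([20.3, 19.9, 20.5, 20.1, 20.0])

s_xy = empirical_cov(x, y)
s_x = math.sqrt(empirical_cov(x, x))   # empirical standard deviations, (3.14)
s_y = math.sqrt(empirical_cov(y, y))
r_xy = s_xy / (s_x * s_y)              # empirical correlation coefficient
```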
While the expectation of (3.13) leads to the theoretical covariance

$$\sigma_{xy} = E\{S_{xy}\}\,, \qquad (3.15)$$

we associate a theoretical correlation coefficient of the form

$$\varrho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\,;\qquad |\varrho_{xy}| \le 1$$

with the empirical quantity $r_{xy}$. The definitions of $\sigma_{xy}$ and $\varrho_{xy}$ are formal, as they refer to experimentally inaccessible quantities. It proves convenient to assemble the estimators $s_x^2 \equiv s_{xx}$, $s_{xy} = s_{yx}$, $s_y^2 \equiv s_{yy}$ and the parameters $\sigma_x^2 \equiv \sigma_{xx}$, $\sigma_{xy} = \sigma_{yx}$, $\sigma_y^2 \equiv \sigma_{yy}$ in variance–covariance matrices

$$s = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{yx} & s_{yy} \end{pmatrix} \quad\text{and}\quad \sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix} \qquad (3.16)$$

respectively, where it is assumed that they are non-singular. In this case, they are even positive definite, see Appendix B.
Given that the random variables $X$ and $Y$ are normally distributed, the probability of finding a pair of realizations in any rectangle $x,\dots,x+\mathrm{d}x;\ y,\dots,y+\mathrm{d}y$ follows from

$$p_{XY}(x,y)\,\mathrm{d}x\,\mathrm{d}y = \frac{1}{2\pi|\sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2|\sigma|}\left[\sigma_y^2(x-\mu_x)^2 - 2\sigma_{xy}(x-\mu_x)(y-\mu_y) + \sigma_x^2(y-\mu_y)^2\right]\right)\mathrm{d}x\,\mathrm{d}y\,, \qquad (3.17)$$

where $\mu_x$ and $\mu_y$ designate the expectations $E\{X\}$ and $E\{Y\}$ respectively, and $|\sigma|$ the determinant of the theoretical variance–covariance matrix $\sigma$. By means of the vectors

$$\zeta = \begin{pmatrix} x \\ y \end{pmatrix} \quad\text{and}\quad \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},$$

the quadratic form of the exponent in (3.17) may be written as

$$q = \frac{1}{|\sigma|}\left[\sigma_y^2(x-\mu_x)^2 - 2\sigma_{xy}(x-\mu_x)(y-\mu_y) + \sigma_x^2(y-\mu_y)^2\right] = (\zeta-\mu)^{\mathrm T}\sigma^{-1}(\zeta-\mu)\,.$$
The symbol "T", used where necessary as a superscript of vectors and matrices, denotes transposition. Evidently, the density (3.17) reproduces the theoretical covariance
$$\sigma_{xy} = E\{(X-\mu_x)(Y-\mu_y)\} = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)\,p_{XY}(x,y)\,\mathrm{d}x\,\mathrm{d}y\,. \qquad (3.18)$$

The same applies to the parameters $\sigma_x^2$ and $\sigma_y^2$. If $\sigma_{xy}$ vanishes, the series of measurements are called uncorrelated. In this situation, the conventional error calculus tacitly ignores empirical covariances. It is interesting to note that, as long as experimenters refer to unequal numbers of repeat measurements, which is common practice, empirical covariances cannot be formalized anyway. We shall resume the question of how to interpret and handle empirical covariances in due course.
Given that the series of measurements (3.12) are independent, the density $p_{XY}(x,y)$ factorizes and $\sigma_{xy}$ vanishes, which means the two data series are uncorrelated. Conversely, it is easy to give examples proving that uncorrelated series of measurements may well be dependent [43, 50]. However, with respect to normally distributed data series, a vanishing covariance automatically entails the independence of the pertaining random variables.
Considering $m$ random variables $X_1, X_2, \dots, X_m$, the theoretical variances and covariances are given by

$$\sigma_{ij} = E\{(X_i-\mu_i)(X_j-\mu_j)\}\,;\qquad i,j = 1,\dots,m\,, \qquad (3.19)$$

which, again, we assemble in a variance–covariance matrix

$$\sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2m} \\ \dots & \dots & \dots & \dots \\ \sigma_{m1} & \sigma_{m2} & \dots & \sigma_{mm} \end{pmatrix};\qquad \sigma_{ii} \equiv \sigma_i^2\,;\quad \sigma_{ij} = \sigma_{ji}\,. \qquad (3.20)$$
If rank$(\sigma) = m$, the latter is positive definite. If rank$(\sigma) < m$, it is positive semi-definite, see Appendix B.
Now, let $x_1, x_2, \dots, x_m$ denote any realizations of the random variables $X_1, X_2, \dots, X_m$ and let $\mu_1, \mu_2, \dots, \mu_m$ designate the respective expectations. Then, defining the column vectors

$$x = (x_1\ x_2\ \cdots\ x_m)^{\mathrm T} \quad\text{and}\quad \mu = (\mu_1\ \mu_2\ \cdots\ \mu_m)^{\mathrm T}\,, \qquad (3.21)$$

the associated $m$-dimensional normal probability density is given by

$$p_X(x) = \frac{1}{(2\pi)^{m/2}|\sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2}(x-\mu)^{\mathrm T}\sigma^{-1}(x-\mu)\right), \qquad (3.22)$$
where we assume rank$(\sigma) = m$.

Let us now consider the random variables $\bar X$ and $\bar Y$. We put down their theoretical variances

$$\sigma_{\bar x\bar x} \equiv \sigma_{\bar x}^2 = \frac{\sigma_x^2}{n} \quad\text{and}\quad \sigma_{\bar y\bar y} \equiv \sigma_{\bar y}^2 = \frac{\sigma_y^2}{n}\,, \qquad (3.23)$$
their theoretical covariance

$$\sigma_{\bar x\bar y} = \frac{\sigma_{xy}}{n} \qquad (3.24)$$

and, finally, their theoretical correlation coefficient

$$\varrho_{\bar x\bar y} = \frac{\sigma_{\bar x\bar y}}{\sigma_{\bar x}\sigma_{\bar y}} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \varrho_{xy}\,.$$
After all, the joint distribution density is given by

$$p_{\bar X\bar Y}(\bar x,\bar y) = \frac{1}{2\pi|\bar\sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2}(\bar\zeta-\mu)^{\mathrm T}\bar\sigma^{-1}(\bar\zeta-\mu)\right), \qquad (3.25)$$

where

$$\bar\zeta = \begin{pmatrix} \bar x \\ \bar y \end{pmatrix} \quad\text{and}\quad \bar\sigma = \begin{pmatrix} \sigma_{\bar x\bar x} & \sigma_{\bar x\bar y} \\ \sigma_{\bar y\bar x} & \sigma_{\bar y\bar y} \end{pmatrix}.$$
The quadratic form in the exponent of (3.25),

$$\bar q = (\bar\zeta-\mu)^{\mathrm T}\bar\sigma^{-1}(\bar\zeta-\mu) = \frac{1}{|\bar\sigma|}\left[\sigma_{\bar y\bar y}(\bar x-\mu_x)^2 - 2\sigma_{\bar x\bar y}(\bar x-\mu_x)(\bar y-\mu_y) + \sigma_{\bar x\bar x}(\bar y-\mu_y)^2\right]$$
$$= \frac{n}{|\sigma|}\left[\sigma_y^2(\bar x-\mu_x)^2 - 2\sigma_{xy}(\bar x-\mu_x)(\bar y-\mu_y) + \sigma_x^2(\bar y-\mu_y)^2\right] \qquad (3.26)$$

defines ellipses $\bar q = \text{const}$. Finally, we add the joint density of the variables $\bar X_1, \bar X_2, \dots, \bar X_m$,

$$p_{\bar X}(\bar x) = \frac{1}{(2\pi)^{m/2}|\bar\sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2}(\bar x-\mu)^{\mathrm T}\bar\sigma^{-1}(\bar x-\mu)\right) \qquad (3.27)$$

and liken the quadratic forms in the exponents of (3.22) and (3.27),

$$q = (x-\mu)^{\mathrm T}\sigma^{-1}(x-\mu) \quad\text{and}\quad \bar q = (\bar x-\mu)^{\mathrm T}\bar\sigma^{-1}(\bar x-\mu)\,. \qquad (3.28)$$

Remarkably enough, both are $\chi^2(m)$ distributed, see Sect. 3.3. From $\bar q$ we shall derive the variable of Hotelling's density in due course.
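The density (3.22) can be evaluated directly with numpy's linear algebra. As a sanity check, for a diagonal $\sigma$ the joint density must factorize into a product of one-dimensional normal densities; the numbers below are made up for illustration:

```python
import math
import numpy as np

def mvn_pdf(x, mu, sigma):
    """m-dimensional normal density (3.22), sigma positive definite."""
    m = len(mu)
    d = x - mu
    q = float(d @ np.linalg.inv(sigma) @ d)   # quadratic form (x-mu)^T sigma^-1 (x-mu)
    norm = (2.0 * math.pi) ** (m / 2.0) * math.sqrt(float(np.linalg.det(sigma)))
    return math.exp(-0.5 * q) / norm

def normal_pdf(x, mu, var):
    """One-dimensional normal density with expectation mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

mu = np.array([1.0, -2.0])
sigma = np.diag([0.5, 2.0])        # diagonal: the two components are independent
x = np.array([1.3, -1.5])

p_joint = mvn_pdf(x, mu, sigma)
p_prod = normal_pdf(1.3, 1.0, 0.5) * normal_pdf(-1.5, -2.0, 2.0)
```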
Finally, we generalize (3.7), admitting dependence between the variables. This time, however, we shall consider $m$ random variables instead of $n$, so that

$$Z = b_1X_1 + b_2X_2 + \cdots + b_mX_m\,. \qquad (3.29)$$

Let the $X_i$ come from the density (3.22). Then, as is shown e.g. in [40], see also Appendix C, the variable $Z$ is still normally distributed. Its expectation and its variance are given by
$$\mu_z = E\{Z\} = \sum_{i=1}^{m} b_i\,\mu_i \quad\text{and}\quad \sigma_z^2 = E\{(Z-\mu_z)^2\} = b^{\mathrm T}\sigma\,b\,, \qquad (3.30)$$
respectively, where $b = (b_1\ b_2\ \dots\ b_m)^{\mathrm T}$. In (3.11) we realized that an interval of length $4\sigma_{\bar x}$ is related to a probability level of about 95%. Yet, under relatively simple circumstances, the number of variables involved may easily be 20 or even more. Let us set $m = 20$ and ask about the probability of finding realizations $\bar x_1, \bar x_2, \dots, \bar x_m$ of the random variables $\bar X_1, \bar X_2, \dots, \bar X_m$ within a hypercube symmetrical about zero and having edges of $4\sigma_{\bar x}$ side length. For simplicity, we assume the random variables to be independent. This probability is given by

$$P \approx 0.95^{20} \approx 0.36\,. \qquad (3.31)$$

Not until we extend the side lengths of the edges of the hypercube up to $6\sigma_{\bar x}$ would we come back to $P \approx 95\%$.
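The shrinkage (3.31) and the required widening of the hypercube are easy to verify numerically; the bisection below finds the per-axis half-width that restores a joint level of 95% (a sketch using only the standard library):

```python
import math

def p_axis(k):
    """P(|Z| <= k) for one standardized normal component."""
    return math.erf(k / math.sqrt(2.0))

m = 20
P_joint = p_axis(2.0) ** m        # hypercube with 4-sigma edges: ~0.39
P_rounded = 0.95 ** m             # the rounded estimate of (3.31): ~0.36

# bisection for the half-width k with p_axis(k)**m = 0.95; k turns out near 3,
# i.e. edges of about 6 sigma, as stated in the text
lo, hi = 2.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if p_axis(mid) ** m < 0.95:
        lo = mid
    else:
        hi = mid
k95 = 0.5 * (lo + hi)
```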
3.3 Chi-Square Density and F Density
The chi-square or $\chi^2$ density covers the statistical properties of a sum of independent, normally distributed, standardized and squared random variables. Let $X$ be normally distributed with $E\{X\} = \mu_x$ and $E\{(X-\mu_x)^2\} = \sigma_x^2$. Then, obviously, the quantity

$$Y = \frac{X-\mu_x}{\sigma_x} \qquad (3.32)$$

is normal and standardized, as $E\{Y\} = 0$ and $E\{Y^2\} = 1$. Transferred to $n$ independent random variables $X_1, X_2, \dots, X_n$, the probability of finding a realization of the sum variable

$$\chi^2(n) = \sum_{l=1}^{n}\left[\frac{X_l-\mu_x}{\sigma_x}\right]^2 \qquad (3.33)$$

within any interval $\chi^2,\dots,\chi^2+\mathrm{d}\chi^2$ is given by

$$p_{\chi^2}(\chi^2, n)\,\mathrm{d}\chi^2 = \frac{1}{2^{n/2}\Gamma(n/2)}\,(\chi^2)^{n/2-1}\exp\!\left(-\frac{\chi^2}{2}\right)\mathrm{d}\chi^2\,; \qquad (3.34)$$
$p_{\chi^2}(\chi^2, n)$ is called the $\chi^2$ density. As the parameter $\mu_x$ is experimentally inaccessible, (3.33) and (3.34) should be modified. This is, in fact, possible. Substituting $\bar X$ for $\mu_x$ changes (3.33) into

$$\chi^2(n-1) = \sum_{l=1}^{n}\left[\frac{X_l-\bar X}{\sigma_x}\right]^2 = \frac{n-1}{\sigma_x^2}\,S_x^2 \qquad (3.35)$$
and (3.34) into

$$p_{\chi^2}(\chi^2, \nu)\,\mathrm{d}\chi^2 = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}\,(\chi^2)^{\nu/2-1}\exp\!\left(-\frac{\chi^2}{2}\right)\mathrm{d}\chi^2\,, \qquad (3.36)$$

letting $\nu = n-1$. The number $\nu$ specifies the degrees of freedom. The expectation of $\chi^2$ is given by

$$\nu = \int_0^{\infty}\chi^2\,p_{\chi^2}(\chi^2, \nu)\,\mathrm{d}\chi^2\,, \qquad (3.37)$$

while for the variance we find

$$2\nu = \int_0^{\infty}(\chi^2-\nu)^2\,p_{\chi^2}(\chi^2, \nu)\,\mathrm{d}\chi^2\,. \qquad (3.38)$$
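The moments (3.37) and (3.38) can be confirmed by integrating the density (3.36) on a grid; $\nu = 5$ below is an arbitrary illustrative choice:

```python
import math
import numpy as np

nu = 5                                   # arbitrary degrees of freedom
x = np.linspace(1e-9, 120.0, 600001)     # grid; the density is ~0 beyond 120
dx = x[1] - x[0]

# chi-square density (3.36)
pdf = x ** (nu / 2.0 - 1.0) * np.exp(-x / 2.0) / (2.0 ** (nu / 2.0) * math.gamma(nu / 2.0))

total = float(np.sum(pdf) * dx)                    # normalization, ~1
mean = float(np.sum(x * pdf) * dx)                 # (3.37): expectation nu
var = float(np.sum((x - mean) ** 2 * pdf) * dx)    # (3.38): variance 2*nu
```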
The $\chi^2$ density will be used to illustrate the idea of what has been called well-defined measuring conditions. To this end, let us consider the random variable defined in (3.29),

$$Z = b_1X_1 + b_2X_2 + \cdots + b_mX_m\,, \qquad (3.39)$$

and $n$ of its realizations

$$z^{(l)} = b_1x_1^{(l)} + b_2x_2^{(l)} + \cdots + b_mx_m^{(l)}\,;\qquad l = 1, 2, \dots, n\,.$$

The sequence $z^{(l)};\ l = 1, 2, \dots, n$ leads us to the mean and to the empirical variance,

$$\bar z = \frac{1}{n}\sum_{l=1}^{n} z^{(l)} \quad\text{and}\quad s_z^2 = \frac{1}{n-1}\sum_{l=1}^{n}\left(z^{(l)}-\bar z\right)^2 \qquad (3.40)$$
respectively. As the $z^{(l)}$ are independent,² we are in a position to define

$$\chi^2(n-1) = \frac{(n-1)\,S_z^2}{\sigma_z^2}\,, \qquad (3.41)$$

where $\sigma_z^2 = E\{(Z-\mu_z)^2\}$. If $m = 2$, then, as

$$\bar z = b_1\bar x_1 + b_2\bar x_2 \quad\text{and}\quad z^{(l)}-\bar z = b_1\left(x_1^{(l)}-\bar x_1\right) + b_2\left(x_2^{(l)}-\bar x_2\right),$$

we have

$$s_z^2 = b_1^2 s_1^2 + 2b_1b_2 s_{12} + b_2^2 s_2^2\,. \qquad (3.42)$$
² Even if the $X_i;\ i = 1,\dots,m$ are dependent, the realizations $z^{(l)};\ l = 1,\dots,n$ of the random variable $Z$ are independent. This is obvious, as the realizations $x_i^{(l)}$ of $X_i$, with $i$ fixed and $l$ running, are independent. The same applies to the realizations of the other variables $X_j;\ j = 1,\dots,m;\ j \neq i$.
Obviously, the $\chi^2$ as defined in (3.41) presupposes the same number of repeat measurements in each series and includes the empirical covariance $s_{12}$ – whether or not $\sigma_{12} = 0$ is of no relevance.

In error propagation, we generally consider the link-up of several independent series of measurements. Then, although there are no theoretical covariances, we shall nevertheless consider their empirical counterparts (which, in general, will not be zero). If we did not consider the latter, we would abolish basic statistical relationships, namely the beneficial properties of $\chi^2$ as exemplified through (3.41) and (3.42).
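The identity (3.42) holds exactly only when the empirical covariance $s_{12}$ is retained: the empirical variance of the $z^{(l)}$ computed directly from (3.40) equals $b_1^2s_1^2 + 2b_1b_2s_{12} + b_2^2s_2^2$, covariance term included. A quick demonstration with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)   # second series, dependent on the first
b1, b2 = 2.0, -1.0

z = b1 * x1 + b2 * x2
s2_direct = float(z.var(ddof=1))          # empirical variance of the z^(l), as in (3.40)

s11 = float(x1.var(ddof=1))
s22 = float(x2.var(ddof=1))
s12 = float(np.cov(x1, x2, ddof=1)[0, 1])

s2_with_cov = b1**2 * s11 + 2 * b1 * b2 * s12 + b2**2 * s22   # identity (3.42)
s2_without_cov = b1**2 * s11 + b2**2 * s22                    # covariance dropped
```

Dropping $s_{12}$ leaves a residual of $2b_1b_2s_{12}$, i.e. the "conventional" variance is simply wrong for this data set.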
Let us return to the quadratic forms $q$ and $\bar q$ quoted in (3.28). Assuming rank$(\sigma) = m$, the matrix $\sigma$ is positive definite, so that a decomposition

$$\sigma = c^{\mathrm T} c \qquad (3.43)$$

is defined. As $\sigma^{-1} = c^{-1}(c^{\mathrm T})^{-1}$, we find

$$q = (x-\mu)^{\mathrm T} c^{-1}(c^{\mathrm T})^{-1}(x-\mu) = y^{\mathrm T} y = \sum_{i=1}^{m} y_i^2\,, \qquad (3.44)$$

where

$$y = (c^{\mathrm T})^{-1}(x-\mu) = (y_1\ y_2\ \cdots\ y_m)^{\mathrm T}\,. \qquad (3.45)$$

Obviously, the components of the auxiliary vector $y$ are normally distributed. Moreover, because of $E\{y\} = 0$ and $E\{y\,y^{\mathrm T}\} = I$, where $I$ denotes an $(m\times m)$ identity matrix, they are standardized and independent, i.e. the quadratic form $q$ is $\chi^2(m)$ distributed. The same applies to the quadratic form $\bar q$.
The $F$ density considers two normally distributed, independent random variables $X_1$ and $X_2$. Let $S_1^2$ and $S_2^2$ denote their empirical variances, implying $n_1$ and $n_2$ repeat measurements respectively. Then, the probability of finding a realization of the random variable

$$F = \frac{S_1^2}{S_2^2} = \frac{\chi^2(\nu_1)/\nu_1}{\chi^2(\nu_2)/\nu_2}\,;\qquad \nu_1 = n_1-1\,,\quad \nu_2 = n_2-1 \qquad (3.46)$$

in any interval $f,\dots,f+\mathrm{d}f$ is given by

$$p_F(f; \nu_1, \nu_2)\,\mathrm{d}f = \left[\frac{\nu_1}{\nu_2}\right]^{\nu_1/2}\frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\;\frac{f^{\nu_1/2-1}}{\left[1+(\nu_1/\nu_2)f\right]^{(\nu_1+\nu_2)/2}}\,\mathrm{d}f\,. \qquad (3.47)$$
Obviously, the symbol $f$, denoting the realizations of the random variable $F$, conflicts with the symbol we have introduced for systematic errors. On the other hand, we shall refer to the $F$ density only once, namely when we convert its quantiles into those of Hotelling's density.
3.4 Student’s (Gosset’s) Density
The one-dimensional normal density is based on two inaccessible parameters, namely $\mu_x$ and $\sigma_x$. This inspired W.S. Gosset to derive a new density which better suits experimental purposes [2]. Gosset's variable has the form

$$T = \frac{\bar X-\mu_x}{S_x/\sqrt n}\,, \qquad (3.48)$$

where $\bar X$ denotes the arithmetic mean and $S_x$ the empirical standard deviation of $n$ independent, normally distributed repeated measurements,

$$\bar X = \frac{1}{n}\sum_{l=1}^{n} X_l \quad\text{and}\quad S_x = \sqrt{\frac{1}{n-1}\sum_{l=1}^{n}\left(X_l-\bar X\right)^2}\,.$$

Besides (3.48), we also consider the variable

$$T' = \frac{X-\mu_x}{S_x}\,. \qquad (3.49)$$
Inserting (3.35) in (3.48) and (3.49), we find

$$T = \frac{\dfrac{\bar X-\mu_x}{\sigma_x/\sqrt n}}{\dfrac{S_x/\sqrt n}{\sigma_x/\sqrt n}} = \frac{\dfrac{\bar X-\mu_x}{\sigma_x/\sqrt n}}{\sqrt{\dfrac{\chi^2(n-1)}{n-1}}} \quad\text{and}\quad T' = \frac{\dfrac{X-\mu_x}{\sigma_x}}{\dfrac{S_x}{\sigma_x}} = \frac{\dfrac{X-\mu_x}{\sigma_x}}{\sqrt{\dfrac{\chi^2(n-1)}{n-1}}} \qquad (3.50)$$

respectively. The numerators are standardized, normally distributed random variables, while the common denominator displays the square root of a $\chi^2$ variable divided by $\nu = n-1$, the number of degrees of freedom.

The realizations of the variables $T$ and $T'$ lie within $-\infty,\dots,\infty$. The probability of finding a realization of $T$ in any interval $t,\dots,t+\mathrm{d}t$ is given by
$$p_T(t; \nu)\,\mathrm{d}t = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}\mathrm{d}t\,;\qquad \nu = n-1 \ge 1\,; \qquad (3.51)$$

$p_T(t;\nu)$ is called the $t$-density or Student's density. The latter designation is due to Gosset himself, who published his considerations under the pseudonym Student. The density (3.51) also applies to the variable $T'$, see Appendix G.

On the right-hand side, we may replace $\nu$ by $(n-1)$,

$$p_T(t; \nu)\,\mathrm{d}t = \frac{\Gamma(n/2)}{\sqrt{(n-1)\pi}\,\Gamma[(n-1)/2]}\left(1+\frac{t^2}{n-1}\right)^{-n/2}\mathrm{d}t\,.$$
Fig. 3.2. Student’s density for ν = 2, 9,∞
This latter form proves useful when comparing the $t$-density with its multidimensional analogue, which is Hotelling's density.

Remarkably enough, the number of degrees of freedom, $\nu = n-1$, is the only parameter of Gosset's density. Its graphic representation is broader than, but similar to, that of the normal density. In particular, it is symmetrical about zero, so that the expectation vanishes,
$$E\{T\} = \int_{-\infty}^{\infty} t\,p_T(t;\nu)\,\mathrm{d}t = 0\,. \qquad (3.52)$$

For the variance of $T$ we have

$$E\{T^2\} = \int_{-\infty}^{\infty} t^2\,p_T(t;\nu)\,\mathrm{d}t = \frac{\nu}{\nu-2}\,. \qquad (3.53)$$
The larger $\nu$, the narrower the density, but its variance will never fall below 1. Figure 3.2 compares Student's density for some values of $\nu$ with the standardized normal density; the two coincide as $\nu \to \infty$. Student's density is broader than the standardized normal density, reflecting that the former implies less information than the latter. Moreover, we realize that the greater $\nu$, the smaller this difference becomes.
The probability of finding any realization of $T$ within an interval of finite length, say, $-t_P,\dots,t_P$, is given by

$$P\{-t_P \le T \le t_P\} = \int_{-t_P}^{t_P} p_T(t;\nu)\,\mathrm{d}t\,. \qquad (3.54)$$
Let us once more revert to the meaning of what has been called well-defined measuring conditions. To this end, we consider the random variable
$$Z = b_1X_1 + b_2X_2 + \cdots + b_mX_m\,,$$

quoted in (3.39). Assuming that for each of the variables $X_i;\ E\{X_i\} = \mu_i;\ i = 1,\dots,m$ there are $n$ repeat measurements, we have

$$\bar Z = \frac{1}{n}\sum_{l=1}^{n} Z^{(l)}\,,\qquad \mu_z = E\{Z\} = \sum_{i=1}^{m} b_i\,\mu_i\,,\qquad S_z^2 = \frac{1}{n-1}\sum_{l=1}^{n}\left(Z^{(l)}-\bar Z\right)^2.$$

Now, let us define the variable

$$T(\nu) = \frac{\bar Z-\mu_z}{S_z/\sqrt n}\,. \qquad (3.55)$$
Obviously, the quantity $S_z^2$ implies the empirical covariances between the $m$ variables $X_i;\ i = 1,\dots,m$, be they dependent or not. We have

$$z^{(l)} = b_1x_1^{(l)} + b_2x_2^{(l)} + \cdots + b_mx_m^{(l)}\,,\qquad \bar z = b_1\bar x_1 + b_2\bar x_2 + \cdots + b_m\bar x_m\,,$$

so that

$$s_z^2 = \frac{1}{n-1}\sum_{l=1}^{n}\left[b_1\left(x_1^{(l)}-\bar x_1\right) + b_2\left(x_2^{(l)}-\bar x_2\right) + \cdots + b_m\left(x_m^{(l)}-\bar x_m\right)\right]^2 = b^{\mathrm T} s\,b\,. \qquad (3.56)$$

The matrix notation on the right-hand side refers to the empirical variance–covariance matrix $s$ of the input data,

$$s = \begin{pmatrix} s_{11} & s_{12} & \dots & s_{1m} \\ s_{21} & s_{22} & \dots & s_{2m} \\ \dots & \dots & \dots & \dots \\ s_{m1} & s_{m2} & \dots & s_{mm} \end{pmatrix};\qquad s_{ij} = \frac{1}{n-1}\sum_{l=1}^{n}\left(x_i^{(l)}-\bar x_i\right)\left(x_j^{(l)}-\bar x_j\right) \qquad (3.57)$$

and to an auxiliary vector $b$ of the coefficients $b_i;\ i = 1,\dots,m$,

$$b = (b_1\ b_2\ \cdots\ b_m)^{\mathrm T}\,. \qquad (3.58)$$
As mentioned above, empirical covariances are at present treated as if they were bothersome, even if the measured data appear in pairs or triples, etc. As long as it is understood that there will be no theoretical covariances, no meaning at all is assigned to their empirical counterparts. Nevertheless, in Sect. 3.3 we have realized that normally distributed measured data are embedded in a coherent complex of statistical statements whose self-consistency should not be called into question. We shall quantitatively deploy these thoughts in the following section.

After all, the question as to whether or not empirical covariances bear information reduces to the question of whether or not the model of jointly normally distributed measurement data is acknowledged. Here, empirical covariances are considered essential, as they preserve the framework of statistics and allow concise and safe statistical inferences.
Examples
Let us ask whether or not the difference between two arithmetic means, obtained from different laboratories and relating to one and the same physical quantity, might be significant [39] (Vol. II, p. 146, The problem of two means).

Let us initially summarize the traditional arguments and refer to different numbers of repeat measurements,

$$\bar x_1 = \frac{1}{n_1}\sum_{l=1}^{n_1} x_1^{(l)}\,;\qquad \bar x_2 = \frac{1}{n_2}\sum_{l=1}^{n_2} x_2^{(l)}\,.$$

Then, inevitably, the empirical covariance is to be ignored. After this, we refer to equal numbers of repeat measurements and consider the empirical covariance.
(i) For simplicity, let us disregard systematic errors; consequently we may set $\mu_i = \mu;\ i = 1, 2$. Furthermore, we shall assume equal theoretical variances, so that $\sigma_i^2 = \sigma^2;\ i = 1, 2$. Hence, the theoretical variance of the random variable $Z = \bar X_1 - \bar X_2$ is given by

$$\sigma_z^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \sigma^2\,\frac{n_1+n_2}{n_1 n_2}\,.$$

$Z$ obviously follows an

$$N\!\left(0,\ \sigma\sqrt{\frac{n_1+n_2}{n_1 n_2}}\right)$$

density. The associated standardized variable is

$$\frac{\bar X_1-\bar X_2}{\sigma\sqrt{\dfrac{n_1+n_2}{n_1 n_2}}}\,.$$
According to (3.50) we tentatively put [38]

$$T = \frac{\dfrac{\bar X_1-\bar X_2}{\sigma\sqrt{(n_1+n_2)/(n_1 n_2)}}}{\sqrt{\dfrac{\chi^2(\nu^*)}{\nu^*}}}\,, \qquad (3.59)$$

attributing degrees of freedom

$$\nu^* = n_1 + n_2 - 2$$

to the sum of the two $\chi^2$-s,

$$\chi^2(\nu^*) = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2}\,.$$
After all, we consider the random variable

$$T(\nu^*) = \frac{\dfrac{\bar X_1-\bar X_2}{\sigma\sqrt{(n_1+n_2)/(n_1 n_2)}}}{\sqrt{\dfrac{(n_1-1)S_1^2+(n_2-1)S_2^2}{\sigma^2}}}\;\sqrt{n_1+n_2-2} \qquad (3.60)$$

to be $t$-distributed.

In contrast to this, from (3.55), which is based on $n_1 = n_2 = n$, we simply deduce

$$T(\nu) = \frac{\bar X_1-\bar X_2}{\sqrt{S_1^2 - 2S_{12} + S_2^2}}\,\sqrt n\,. \qquad (3.61)$$

Clearly, this result is also exact. To compare (3.60) with (3.61), we put $n_1 = n_2$, so that (3.60) changes into

$$T(\nu^*) = \frac{\bar X_1-\bar X_2}{\sqrt{S_1^2 + S_2^2}}\,\sqrt n\,. \qquad (3.62)$$
While this latter $T$ lacks the empirical covariance $S_{12}$ and refers to degrees of freedom $\nu^* = 2(n-1)$, the $T$ defined in (3.61) relates to degrees of freedom $\nu = n-1$, where, of course, we have to consider $t_P(2\nu) < t_P(\nu)$. Remarkably enough, both quantities possess the same statistical properties, namely

$$|\bar X_1-\bar X_2| \le t_P(2\nu)\,\sqrt{S_1^2+S_2^2}\,\big/\sqrt n$$

and

$$|\bar X_1-\bar X_2| \le t_P(\nu)\,\sqrt{S_1^2-2S_{12}+S_2^2}\,\big/\sqrt n$$
respectively.

(ii) To be more general, we now admit different expectations, $\mu_1 \neq \mu_2$, and different theoretical variances, $\sigma_1^2 \neq \sigma_2^2$, briefly touching the so-called Fisher–Behrens problem. A solution akin to that presented above does not seem to exist.

Traditionally, the Fisher–Behrens problem refers to $n_1 \neq n_2$. Its implications are extensively discussed in [39] and will therefore not be dealt with here.³
B.L. Welch [7] has handled the Fisher–Behrens problem by means of a random variable of the kind

$$\omega = \frac{(\bar X_1-\bar X_2) - (\mu_1-\mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}\,,\qquad n_1 \neq n_2\,.$$

This quantity is approximately $t$-distributed if related to a so-called effective number of degrees of freedom

$$\nu_{\mathrm{eff}} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2\!/(n_1+1) + \left(s_2^2/n_2\right)^2\!/(n_2+1)} - 2\,,$$

i.e. $\omega(\nu_{\mathrm{eff}})$ leads us back to Student's statistics. Let us recall that both Fisher and Welch considered $n_1 \neq n_2$ a reasonable premise.

³ We wish to mention that, from a metrological point of view, the mere notation $\mu_1 \neq \mu_2$ conceals the underlying physical situation. The two means $\bar x_1$ and $\bar x_2$ may have been determined by means of two different measuring devices, likewise aiming at one and the same physical object, or, alternatively, there might have been two different physical objects and one measuring device, or even two different devices.
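The effective number of degrees of freedom is easily evaluated; the helper below implements the formula exactly as quoted above (note the $n_i+1$ denominators and the final $-2$ of this variant), with made-up variances and sample sizes as illustrative input:

```python
def nu_eff(s1_sq, n1, s2_sq, n2):
    """Effective degrees of freedom for Welch's omega, as quoted in the text."""
    u1 = s1_sq / n1
    u2 = s2_sq / n2
    return (u1 + u2) ** 2 / (u1 ** 2 / (n1 + 1) + u2 ** 2 / (n2 + 1)) - 2.0

# illustrative (hypothetical) empirical variances and sample sizes
v = nu_eff(1.2, 8, 2.5, 15)
```

For equal variances and equal sample sizes $n$ the formula yields $2n$, slightly above the pooled value $\nu^* = 2(n-1)$ of case (i).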
Presupposing $n_1 = n_2 = n$, our approach doubtlessly yields an exact

$$T = \frac{(\bar X_1-\bar X_2) - (\mu_1-\mu_2)}{\sqrt{S_1^2 - 2S_{12} + S_2^2}}\,\sqrt n\,. \qquad (3.63)$$

After all, when sticking to well-defined measuring conditions, bringing the empirical covariance $S_{12}$ to bear, the Fisher–Behrens problem is reduced to an ill-defined issue – judged from the kind of statistical reasoning discussed here.
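Since $S_1^2 - 2S_{12} + S_2^2$ is precisely the empirical variance of the pairwise differences $X_1^{(l)} - X_2^{(l)}$, the statistic (3.63) amounts to the ordinary one-sample Student statistic applied to those differences. A brief numerical sketch with simulated (hypothetical) data confirms the identity:

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n = 12
a = rng.normal(5.0, 1.0, n)     # series 1
b = rng.normal(5.0, 1.2, n)     # series 2, simulated independently of series 1

s1_sq = float(a.var(ddof=1))
s2_sq = float(b.var(ddof=1))
s12 = float(np.cov(a, b, ddof=1)[0, 1])

var_combined = s1_sq - 2.0 * s12 + s2_sq   # denominator of (3.63), squared
var_diff = float((a - b).var(ddof=1))      # empirical variance of the differences

# (3.63) with mu_1 - mu_2 = 0 in this simulated setting
T = (a.mean() - b.mean()) / math.sqrt(var_combined) * math.sqrt(n)
```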
Hotelling [5] has developed a multidimensional generalization of the one-dimensional Student's density. Hotelling's density refers to, say, $m$ arithmetic means, each of which implies the same number $n$ of repeat measurements. Again, this latter condition leads us to consider the empirical covariances of the $m$ variables.

Traditionally, Hotelling's variable is designated by $T$ and its realizations, correspondingly, by $t$. Nevertheless, the respective relationship will disclose by itself whether Student's or Hotelling's variable is meant: the former possesses one argument, the latter two:

$$t(\nu)\ \ \text{Student's variable}\,,\qquad t(m,n)\ \ \text{Hotelling's variable}\,.$$
3.5 Fisher’s Density
We consider two measurands $x$ and $y$, assign normally distributed random variables $X$ and $Y$ to them and assume well-defined measuring conditions, i.e. equal numbers of repeat measurements

$$x_1, x_2, \dots, x_n\,;\qquad y_1, y_2, \dots, y_n\,.$$

Then, Fisher's density specifies the joint statistical behavior of the five random variables

$$\bar x\,,\ \bar y\,,\ s_x^2\,,\ s_{xy}\,,\ s_y^2\,,$$
the two arithmetic means, the two empirical variances and the empirical covariance. Fisher's density implies a metrologically important statement with respect to the empirical covariance of uncorrelated series of measurements. The latter is, as a rule, classified as unimportant if its expectation vanishes. As we shall see, the empirical covariance will nevertheless have a meaning which, ultimately, is rooted in the structure of Fisher's density.

Nonetheless, the present situation is this: given that several measurement results obtained by different laboratories are to be linked with one another, empirical covariances are simply disregarded. On the one hand, this seems to be a matter of habit; on the other hand, the original measurement data might not have been preserved, and the experimenters may have carried out different numbers of repeat measurements, which would a priori rule out the possibility of considering empirical covariances.
In the following we refer to the notation and definitions of Sect. 3.2. Fisher's density

$$p(\bar x, \bar y, s_x^2, s_{xy}, s_y^2) = p_1(\bar x, \bar y)\,p_2(s_x^2, s_{xy}, s_y^2) \qquad (3.64)$$

factorizes into the density $p_1$ of the two arithmetic means and the density $p_2$ of the empirical moments of second order. Remarkably enough, the latter,

$$p_2(s_x^2, s_{xy}, s_y^2) = \frac{(n-1)^{n-1}}{4\pi\,\Gamma(n-2)\,|\sigma|^{(n-1)/2}}\left[s_x^2 s_y^2 - s_{xy}^2\right]^{(n-4)/2}\exp\!\left(-\frac{n-1}{2|\sigma|}\,h(s_x^2, s_{xy}, s_y^2)\right);$$
$$h(s_x^2, s_{xy}, s_y^2) = \sigma_y^2 s_x^2 - 2\sigma_{xy}s_{xy} + \sigma_x^2 s_y^2\,, \qquad (3.65)$$
does not factorize even if $\sigma_{xy} = 0$. From this we deduce that the model of jointly normally distributed random variables considers the empirical covariance $s_{xy}$ a statistically indispensable quantity. Finally, when propagating random errors taking empirical covariances into account, we shall be placed in a position to define confidence intervals according to Student, irrespective of whether or not the series of measurements are dependent – the formalism (3.55) will always be correct.

In the same way, Hotelling's density, see Sect. 3.6, will provide experimentally meaningful confidence ellipsoids.
In (3.65), we may substitute the empirical correlation coefficient $r_{xy} = s_{xy}/(s_x s_y)$ for the empirical covariance $s_{xy}$, so that

$$p_2(s_x^2, r_{xy}, s_y^2) = \frac{(n-1)^{n-1}}{4\pi\,\Gamma(n-2)\,|\sigma|^{(n-1)/2}}\,s_x^{n-3} s_y^{n-3}\left[1-r_{xy}^2\right]^{(n-4)/2}\exp\!\left(-\frac{n-1}{2|\sigma|}\,h(s_x^2, r_{xy}, s_y^2)\right);$$
$$h(s_x^2, r_{xy}, s_y^2) = \sigma_y^2 s_x^2 - 2\sigma_{xy}r_{xy}s_x s_y + \sigma_x^2 s_y^2\,. \qquad (3.66)$$
If $\sigma_{xy}$ vanishes, the empirical correlation coefficient $r_{xy}$ and the two empirical variances $s_x^2, s_y^2$ obviously turn out to be statistically independent. This situation is, however, quite different from that in which we considered the empirical covariance and the empirical variances. Meanwhile, the error calculus relates to density (3.65) and not to density (3.66). We conclude:

Within the framework of normally distributed data, the density of the empirical moments of second order requires us to consider empirical covariances, irrespective of whether or not the data series to be concatenated are dependent.
Apparently, we are now confronted with another problem. Independent series of measurements allow the pairing of the measurement data to be altered. For instance, the sequences $(x_1, y_1); (x_2, y_2); \dots$ and $(x_1, y_2); (x_2, y_1); \dots$ would be equally valid. While such permutations in no way affect the empirical variances, they do, however, affect the empirical covariance (3.13). In general, each new pairing of data yields a numerically different empirical covariance, implying, of course, varying lengths of the pertaining confidence intervals to be deduced from (3.55). This might appear confusing; nevertheless, the distribution densities (3.51) and (3.65) willingly accept any such empirical covariance, be it the direct result of measurements or due to permutations in data pairing. In particular, each covariance is equally well covered by the probability statement resulting from (3.51).
On this note, we could even purposefully manipulate the pairing of data in order to arrive at a particular, notably advantageous empirical covariance which, e.g., minimizes the length of a confidence interval.

Should, however, a dependence between the series of measurements exist, the experimentally established pairing of data is to be considered statistically significant and no permutations are admissible, i.e. subsequent permutations abolishing the original pairing are to be considered strictly improper.
Wishart generalized Fisher's density to more than two random variables [38]. The considerations presented above apply analogously.
Let us return to the problem of two means, $\bar X_1$ and $\bar X_2$, see Sect. 3.4. Assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we considered the difference $Z = \bar X_1 - \bar X_2$. Obviously, the variable $Z$ relates to a two-dimensional probability density. Even if $S_1^2$ and $S_2^2$ refer to independent measurements, the model (3.65) formally asks us to complete the empirical variances through the empirical covariance $S_{12}$, as the empirical moments of second order appear to be statistically dependent. In doing so, statistics tells us that any empirical covariance producible out of the input data is likewise admitted.
3.6 Hotelling’s Density
Hotelling's density is the multidimensional analogue of Student's density and presupposes the existence of the density (3.65) of the empirical moments of second order. To represent Hotelling's density, we refer to (3.25). As has been stated, the exponent

$$\bar q = (\bar\zeta-\mu)^{\mathrm T}\bar\sigma^{-1}(\bar\zeta-\mu) = \frac{1}{|\bar\sigma|}\left[\sigma_{\bar y\bar y}(\bar x-\mu_x)^2 - 2\sigma_{\bar x\bar y}(\bar x-\mu_x)(\bar y-\mu_y) + \sigma_{\bar x\bar x}(\bar y-\mu_y)^2\right] \qquad (3.67)$$

is $\chi^2(2)$-distributed. Insertion of

$$\bar\sigma = \sigma/n\,,\qquad \bar\sigma^{-1} = n\sigma^{-1}\,,\qquad |\bar\sigma| = |\sigma|/n^2 \qquad (3.68)$$

has led to

$$\bar q = n(\bar\zeta-\mu)^{\mathrm T}\sigma^{-1}(\bar\zeta-\mu) = \frac{n}{|\sigma|}\left[\sigma_{yy}(\bar x-\mu_x)^2 - 2\sigma_{xy}(\bar x-\mu_x)(\bar y-\mu_y) + \sigma_{xx}(\bar y-\mu_y)^2\right]. \qquad (3.69)$$
Hotelling substituted the empirical variance–covariance matrix

$$s = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{yx} & s_{yy} \end{pmatrix} \qquad (3.70)$$

for the unknown, inaccessible theoretical variance–covariance matrix $\sigma$, so that $\bar q$ formally turned into

$$t^2(2,n) = n(\bar\zeta-\mu)^{\mathrm T}s^{-1}(\bar\zeta-\mu) = \frac{n}{|s|}\left[s_{yy}(\bar x-\mu_x)^2 - 2s_{xy}(\bar x-\mu_x)(\bar y-\mu_y) + s_{xx}(\bar y-\mu_y)^2\right]. \qquad (3.71)$$

Finally, Hotelling confined the $s_{xx}, s_{xy}, s_{yy}$ combinations to the inequality

$$-s_x s_y < s_{xy} < s_x s_y\,;\qquad \left(s_x^2 \equiv s_{xx}\,,\ s_y^2 \equiv s_{yy}\right),$$

thus rendering the empirical variance–covariance matrix $s$ positive definite. Likewise, $\chi^2(2) = \text{const.}$ and $t^2(2,n) = \text{const.}$ depict ellipses, but only the latter, Hotelling's ellipses, are free from unknown theoretical parameters, i.e. they do not rely on the theoretical variances and the theoretical covariance. This is a first remarkable property of Hotelling's density. Another property is that $t^2(2,n)$ equally applies to dependent and independent series of measurements. Consequently, the experimenter may refer to $t^2(2,n)$ without having to know or discover whether or not there are dependencies between the underlying series of measurements. With respect to $m$ measurands, we alter the nomenclature, substituting $x_1, x_2, \dots, x_m$ for $x, y$, so that a vector
$$(\bar x-\mu) = (\bar x_1-\mu_1\ \ \bar x_2-\mu_2\ \cdots\ \bar x_m-\mu_m)^{\mathrm T} \qquad (3.72)$$

is obtained in place of $\bar\zeta-\mu = (\bar x-\mu_x\ \ \bar y-\mu_y)^{\mathrm T}$. Thus

$$t^2(m,n) = n(\bar x-\mu)^{\mathrm T}s^{-1}(\bar x-\mu) \qquad (3.73)$$

is the multidimensional analogue of (3.71), where

$$s = \begin{pmatrix} s_{11} & s_{12} & \dots & s_{1m} \\ s_{21} & s_{22} & \dots & s_{2m} \\ \dots & \dots & \dots & \dots \\ s_{m1} & s_{m2} & \dots & s_{mm} \end{pmatrix};\qquad s_{ij} = \frac{1}{n-1}\sum_{l=1}^{n}(x_{il}-\bar x_i)(x_{jl}-\bar x_j)\,;\quad i,j = 1,\dots,m \qquad (3.74)$$

denotes the empirical variance–covariance matrix of the measured data. Again, we shall assume $s$ to be positive definite [38].
Formally, we again assign random variables $\bar X_1, \bar X_2, \dots, \bar X_m$ to the $m$ means $\bar x_1, \bar x_2, \dots, \bar x_m$. In like manner, we understand the empirical variance–covariance matrix $s$, established through the measurements, to be a particular realization of a random matrix $S$. Hotelling [5] has shown that, free from theoretical variances and covariances, a density of the form

$$p_T(t; m, n) = \frac{2\,\Gamma(n/2)}{(n-1)^{m/2}\,\Gamma[(n-m)/2]\,\Gamma(m/2)}\;\frac{t^{m-1}}{\left[1+t^2/(n-1)\right]^{n/2}}\,;\qquad t > 0\,,\ n > m \qquad (3.75)$$

specifies the probability of finding a realization of the random variable

$$T(m,n) = \sqrt{n(\bar X-\mu)^{\mathrm T}S^{-1}(\bar X-\mu)} \qquad (3.76)$$

in any interval $t,\dots,t+\mathrm{d}t$. Accordingly, the probability of the event $T \le t_P(m,n)$ is given by

$$P\{T \le t_P(m,n)\} = \int_0^{t_P(m,n)} p_T(t; m, n)\,\mathrm{d}t\,. \qquad (3.77)$$
Figure 3.3 illustrates $p_T(t; m, n)$ for $m = 2$ and $n = 10$. If $m = 1$, (3.75) changes into

$$p_T(t; 1, n) = \frac{2\,\Gamma(n/2)}{(n-1)^{1/2}\,\Gamma[(n-1)/2]\,\Gamma(1/2)}\left(1+\frac{t^2}{n-1}\right)^{-n/2},$$

which may be compared with (3.51). Obviously, for $m = 1$, Hotelling's density is symmetric about $t = 0$ and, apart from a factor of 2 in the numerator, identical with Student's density. The latter may be removed by extending the domain of $p_T(t; 1, n)$ from $0,\dots,+\infty$ to $-\infty,\dots,+\infty$, which is, however, admissible only if $m = 1$.
Fig. 3.3. Hotelling’s density; m = 2; n = 10
In order to practically utilize (3.76) and (3.77), we proceed just as we did when discussing Student's density and wished to localize the unknown parameter $\mu$. Following (3.76), we center Hotelling's ellipsoid $t^2(m,n) = \text{const.}$ in $\bar x$ and formally substitute an auxiliary vector $x$ for the unknown vector $\mu$. The so-defined confidence ellipsoid

$$t_P^2(m,n) = n(x-\bar x)^{\mathrm T}s^{-1}(x-\bar x) \qquad (3.78)$$

localizes the point defined by the unknown vector $\mu$ with probability $P\{T \le t_P(m,n)\}$ as given in (3.77).
This latter interpretation obviously reveals the benefit of our considerations. As is known, the conventional error calculus gets confidence ellipsoids from the exponent of the multidimensional normal probability density, and these, unfortunately, imply unknown theoretical variances and covariances. In contrast to this, Hotelling's ellipsoid only needs empirical variances and covariances. Let us again consider the ellipse defined in (3.71). To simplify the notation, we set

$$\alpha = s_{yy}\,,\qquad \beta = -s_{xy}\,,\qquad \gamma = s_{xx}\,,\qquad \delta = \frac{|s|\,t^2(2,n)}{n}\,,$$

so that

$$\alpha(\bar x-\mu_x)^2 + 2\beta(\bar x-\mu_x)(\bar y-\mu_y) + \gamma(\bar y-\mu_y)^2 = \delta\,.$$

From the derivative

$$y' = -\frac{\alpha(\bar x-\mu_x) + \beta(\bar y-\mu_y)}{\beta(\bar x-\mu_x) + \gamma(\bar y-\mu_y)}$$

we deduce

$$|\bar x-\mu_x|_{\max} = \frac{t_P(2,n)}{\sqrt n}\sqrt{s_{xx}} \quad\text{and}\quad |\bar y-\mu_y|_{\max} = \frac{t_P(2,n)}{\sqrt n}\sqrt{s_{yy}}\,.$$
Fig. 3.4. Hotelling’s ellipse
Figure 3.4 makes clear that Hotelling's ellipse is included in a rectangle whose side lengths are independent of the empirical covariance $s_{xy}$. In the case of independent series of measurements, this means: while any permutations within the pairings

$$x_1, y_1\,;\ x_2, y_2\,;\ \dots\,;\ x_n, y_n$$

do not affect the empirical variances, the empirical covariance may, however, assume new values, thus rotating Hotelling's ellipse within the aforesaid fixed rectangle. All positions of the ellipse are equally admissible. Up to now, degenerations had been formally excluded. However, admitting them for a moment, we are in a position to state:

The inequality

$$-s_x s_y \le s_{xy} \le s_x s_y$$

(which here includes degenerations) reveals that, given that the two series of measurements are independent, all $(\bar x-\mu_x)$, $(\bar y-\mu_y)$ pairs within the rectangle are tolerable.
This interpretation proves invertible, i.e. in terms of (3.78) we may just as well state: the point $(\mu_x, \mu_y)$ defining the unknown vector $\mu$ may lie anywhere within a rectangle, centered in $(\bar x, \bar y)$, and having the side lengths

$$2\,\frac{t_P(2,n)}{\sqrt n}\sqrt{s_{xx}} \quad\text{and}\quad 2\,\frac{t_P(2,n)}{\sqrt n}\sqrt{s_{yy}}$$

respectively. We stress that, given that the series of measurements are dependent, we are not allowed to perform any interchanges within the jointly measured data pairs.
Apparently, a table of quantiles $t_P(m,n)$ of order $P(t_P(m,n))$ according to (3.77) does not exist. Meanwhile, we can use the tabulated quantiles of the $F$ density, as there is a relationship between the latter and Hotelling's density.⁴

Fig. 3.5. Transformation of Hotelling's density
We ask: given a random variable with the $F$ density (3.47),

$$p_F(f; \nu_1, \nu_2)\,\mathrm{d}f = \left[\frac{\nu_1}{\nu_2}\right]^{\nu_1/2}\frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\;\frac{f^{\nu_1/2-1}}{\left[1+(\nu_1/\nu_2)f\right]^{(\nu_1+\nu_2)/2}}\,\mathrm{d}f\,,$$

what is the density $p_T(t)$ of the variable

$$t = \sqrt{\frac{\nu_1}{\nu_2}\,(n-1)\,f}\;? \qquad (3.79)$$

With a view to Fig. 3.5 we find [43]

$$p_T(t) = p_F\!\left[\frac{\nu_2}{\nu_1(n-1)}\,t^2\right]\left|\frac{\mathrm{d}f}{\mathrm{d}t}\right|. \qquad (3.80)$$
Setting

$$\nu_1 = m\,;\qquad \nu_2 = n-m\,;\qquad \nu_1+\nu_2 = n\,, \qquad (3.81)$$

we easily reconstruct (3.75). Then, from the quantiles of the $F$ density, we infer via

$$P = \int_0^{f} p_F(f'; \nu_1, \nu_2)\,\mathrm{d}f' = \int_0^{t} p_T(t'; m, n)\,\mathrm{d}t' \qquad (3.82)$$

the quantiles $t$ of Hotelling's density.
⁴ As an exception, here we deviate from the nomenclature agreed on, according to which $f$ denotes a systematic error.
Example
Calculate Hotelling's quantile of order $P=95\%$, given $m=1$ and $n=11$. As $\nu_1=1$ and $\nu_2=10$, we have
$$0.95=\int_0^{f_{0.95}} p_F(f;1,10)\,df\,.$$
From the table of the F density we infer the quantile $f_{0.95}=4.96$, so that from (3.79) we obtain Hotelling's quantile $t_{0.95}=2.2$. As expected, this value indeed agrees with Student's quantile, since $m=1$.
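The conversion (3.79) from an F quantile to Hotelling's quantile is easy to automate. The following Python sketch reproduces the example; the tabulated quantile $f_{0.95}(1,10)=4.96$ is taken from the text, while the helper name is ours.

```python
import math

def hotelling_quantile(f_quantile, m, n):
    """Convert an F quantile with nu1 = m, nu2 = n - m degrees of freedom
    into Hotelling's quantile via t = sqrt((nu1/nu2)(n-1) f), i.e. (3.79)."""
    nu1, nu2 = m, n - m
    return math.sqrt(nu1 / nu2 * (n - 1) * f_quantile)

# Example from the text: P = 95%, m = 1, n = 11, tabulated f_0.95 = 4.96
t_095 = hotelling_quantile(4.96, m=1, n=11)
print(round(t_095, 2))  # 2.23, i.e. t_0.95 = 2.2 after the book's rounding
```

For $m=1$ the transformation degenerates to $t=\sqrt f$, which is why the result coincides with Student's two-sided quantile $t_{0.95}(10)\approx 2.23$.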
4 Estimators and Their Expectations
4.1 Statistical Ensembles
Expectations, as theoretical concepts, emerge from integrals and distribution densities. Their experimental counterparts are so-called estimators. The arithmetic mean and the empirical variance
$$\bar x=\frac1n\sum_{l=1}^n x_l\,,\qquad s_x^2=\frac1{n-1}\sum_{l=1}^n (x_l-\bar x)^2\,, \qquad (4.1)$$
for example, are estimators of the expectations
$$\mu_x=E\{X\}=\int_{-\infty}^{\infty} x\,p_X(x)\,dx \qquad (4.2)$$
and
$$\sigma_x^2=E\left\{(X-\mu_x)^2\right\}=\int_{-\infty}^{\infty}(x-\mu_x)^2\,p_X(x)\,dx\,, \qquad (4.3)$$
where $p_X(x)$ denotes the distribution density of the random variable $X$. While (4.2) is quite obvious, statement (4.3) needs some explanation, as the integrand holds a non-linear function of $x$, namely $\phi(x)=(x-\mu_x)^2$, which is simply averaged by the density $p_X(x)$ of $X$.
Let $X$ be a random variable with realizations $x$, and $u=\phi(x)$ a given function. Then, as may be shown, see e.g. [43], the expectation of the random variable $U=\phi(X)$ is given by
$$E\{U\}=\int_{-\infty}^{\infty} u\,p_U(u)\,du=\int_{-\infty}^{\infty}\phi(x)\,p_X(x)\,dx\,, \qquad (4.4)$$
where $p_U(u)$ denotes the density of the variable $U$ and $p_X(x)$ that of the variable $X$. Analogously, considering functions of several variables, e.g. $V=\phi(X,Y)$, we may set
$$E\{V\}=\int_{-\infty}^{\infty} v\,p_V(v)\,dv=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\phi(x,y)\,p_{XY}(x,y)\,dx\,dy\,. \qquad (4.5)$$
Again, $p_V(v)$ constitutes the density of the random variable $V$ and $p_{XY}(x,y)$ the joint density of the random variables $X$ and $Y$. Thus, (4.4) and (4.5) save us the trouble of developing the densities $p_U(u)$ and $p_V(v)$ when calculating the respective expectations.
In the following, we shall address the expectations of some frequently used empirical estimators. For this, we purposefully rely on a (conceived, fictitious) statistical ensemble of infinitely many identical measuring devices which basically have similar statistical properties and, exactly, one and the same unknown systematic error, Fig. 4.1. From each device, we bring in the same number $n$ of repeat measurements. In a formal sense, the collectivity of the random variables $X^{(k)};\ k=1,2,3,\ldots$
$$X^{(k)}\ \Leftrightarrow\ x_1^{(k)},\,x_2^{(k)},\,\ldots,\,x_n^{(k)}\,;\qquad k=1,2,3,\ldots \qquad (4.6)$$
establishes a statistical ensemble enabling us to calculate expectations.
A measuring device which is used over a period of many years may be regarded as approximating an ensemble of the above kind – if only its inherent systematic errors remain constant in time. As has been pointed out, experimenters are pleased to manage with less, namely with systematic errors remaining constant during the time it takes to perform $n$ repeat measurements, so that the data remain free of drifts. Moreover, as variations of systematic errors cannot be excluded during prolonged periods of time, their changes should stay within the error bounds conclusively assigned to them from the outset.
Let us conceptualize these ideas. If, as presumed, all measuring devices have the same unknown systematic error $f=\text{const.}$, the data are gathered under conditions of reproducibility. On the other hand, each of the experimental set-ups might also be affected by a specific unknown systematic error. Then the data would be recorded under conditions of comparability. The distinction between reproducibility and comparability induces us to stress once more that reproducibility aims at quite a different thing than accuracy. We shall encounter conditions of comparability in Sect. 12.2 when treating so-called key comparisons.
Obviously, apart from (4.6), the ensemble also implies random variables of the kind $X_l;\ l=1,2,\ldots,n$
$$X_l\ \Leftrightarrow\ x_l^{(1)},\,x_l^{(2)},\,\ldots,\,x_l^{(k)},\,\ldots\,;\qquad l=1,2,\ldots,n\,. \qquad (4.7)$$
Formally, we now represent the arithmetic mean
$$\bar x=\frac1n\,(x_1+x_2+\cdots+x_n)$$
Fig. 4.1. Statistical ensemble of series of repeated measurements, each series comprising $n$ measurements and being affected by one and the same systematic error $f=\text{const.}$
by means of the $n$ realizations of the random variable $X^{(k)}$ as defined in (4.6). For any $k$ we have
$$\bar x^{(k)}=\frac1n\left(x_1^{(k)}+x_2^{(k)}+\cdots+x_n^{(k)}\right).$$
Certainly, with respect to (4.7), we may just as well introduce the sum variable
$$\bar X=\frac1n\,(X_1+X_2+\cdots+X_n)\,. \qquad (4.8)$$
As assumed, the variables $X_1,X_2,\ldots,X_n$ are independent. We are now in a position to calculate the expectation of the arithmetic mean. Let $p_{\bar X}(\bar x)$ designate the density of the random variable $\bar X$ so that
$$E\{\bar X\}=\int_{-\infty}^{\infty}\bar x\,p_{\bar X}(\bar x)\,d\bar x\,.$$
Referring to (4.5) and (4.8), we find
$$E\{\bar X\}=\frac1n\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}(x_1+x_2+\cdots+x_n)\,p_{X_1X_2\ldots X_n}(x_1,x_2,\ldots,x_n)\,dx_1\,dx_2\cdots dx_n\,.$$
In each case, integrating out the "undesired variables" yields 1,
$$E\{\bar X\}=\frac1n\sum_{l=1}^n\int_{-\infty}^{\infty} x_l\,p_{X_l}(x_l)\,dx_l=\mu_x\,. \qquad (4.9)$$
After all, the expectation of $\bar X$ presents itself as the mean of the expectations of the $n$ random variables $X_1,X_2,\ldots,X_n$, each with the same value $\mu_x$.
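The ensemble of Fig. 4.1 can be mimicked numerically. In the sketch below (all numbers are illustrative, not taken from the text), many fictitious devices share the true value $x_0$, the dispersion $\sigma$, and one and the same systematic error $f$; averaging the device means approximates $E\{\bar X\}=\mu_x=x_0+f$.

```python
import random

random.seed(1)
x0, sigma, f, n = 10.0, 0.5, 0.3, 20   # illustrative values, not from the text

def device_mean():
    """One fictitious device: n repeat measurements with normal random
    errors plus the common constant systematic error f."""
    return sum(x0 + random.gauss(0.0, sigma) + f for _ in range(n)) / n

means = [device_mean() for _ in range(20000)]
grand = sum(means) / len(means)
print(round(grand, 2))  # close to mu_x = x0 + f = 10.3
```

The random errors average out over the ensemble; the bias $f$ does not, which is exactly the point of (4.9) combined with the constant systematic error.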
4.2 Variances and Covariances
As we recall, the concatenation of the $x_l$ in $\bar x$ is linear by definition. The calculation of expectations proves to be more complex if the estimators are of a quadratic nature. Letting $\mu_x$ denote the expected value $E\{\bar X\}$ of the arithmetic mean $\bar X$, we consider the empirical variance as defined through
$$s_x^2=\frac1n\sum_{l=1}^n (x_l-\mu_x)^2\,. \qquad (4.10)$$
Replacing the quantities $s_x^2$ and $x_l$ with the random variables $S_x^2$ and $X_l$ respectively turns (4.10) into
$$S_x^2=\frac1n\sum_{l=1}^n (X_l-\mu_x)^2\,.$$
The expectation follows from
$$E\{S_x^2\}=\frac1n\,E\left\{\sum_{l=1}^n (X_l-\mu_x)^2\right\}=\frac1n\sum_{l=1}^n E\left\{(X_l-\mu_x)^2\right\}.$$
But as
$$E\left\{(X_l-\mu_x)^2\right\}=\int_{-\infty}^{\infty}(x_l-\mu_x)^2\,p_{X_l}(x_l)\,dx_l=\sigma_x^2\,,$$
we find
$$E\{S_x^2\}=\sigma_x^2\,. \qquad (4.11)$$
Next we consider the scattering of $\bar X$ with respect to $\mu_x$. Each of the devices sketched in Fig. 4.1 produces a specific mean $\bar x$. What can we say about the mean-square scattering of $\bar X$ relative to the parameter $\mu_x$? We have
$$E\left\{(\bar X-\mu_x)^2\right\}=E\left\{\left(\frac1n\sum_{l=1}^n X_l-\mu_x\right)^{\!2}\right\}=\frac1{n^2}\,E\left\{\left(\sum_{l=1}^n (X_l-\mu_x)\right)^{\!2}\right\} \qquad (4.12)$$
$$=\frac1{n^2}\Bigl[E\{(X_1-\mu_x)^2\}+2E\{(X_1-\mu_x)(X_2-\mu_x)\}+\cdots+2E\{(X_{n-1}-\mu_x)(X_n-\mu_x)\}+E\{(X_n-\mu_x)^2\}\Bigr].$$
As $X_1$ and $X_2$ are independent, the first covariance yields
$$E\{(X_1-\mu_x)(X_2-\mu_x)\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-\mu_x)(x_2-\mu_x)\,p_{X_1X_2}(x_1,x_2)\,dx_1\,dx_2$$
$$=\int_{-\infty}^{\infty}(x_1-\mu_x)\,p_{X_1}(x_1)\,dx_1\int_{-\infty}^{\infty}(x_2-\mu_x)\,p_{X_2}(x_2)\,dx_2=0\,,$$
and the same applies to the other covariances. As each of the quadratic terms yields $\sigma_x^2$, we arrive at
$$E\left\{(\bar X-\mu_x)^2\right\}=\frac{\sigma_x^2}{n}\equiv\sigma_{\bar x}^2\,.$$
The empirical variance
$$s_x^2=\frac1{n-1}\sum_{l=1}^n (x_l-\bar x)^2 \qquad (4.13)$$
differs from that defined in (4.10) insofar as reference is taken here to $\bar x$, and $1/(n-1)$ has been substituted for $1/n$. Rewriting (4.13) in random variables and formalizing the expectation yields
$$E\{S_x^2\}=\frac1{n-1}\,E\left\{\sum_{l=1}^n (X_l-\bar X)^2\right\}.$$
Due to
$$\sum_{l=1}^n (X_l-\bar X)^2=\sum_{l=1}^n\left[(X_l-\mu_x)-(\bar X-\mu_x)\right]^2$$
$$=\sum_{l=1}^n (X_l-\mu_x)^2-2(\bar X-\mu_x)\sum_{l=1}^n (X_l-\mu_x)+n(\bar X-\mu_x)^2$$
$$=\sum_{l=1}^n (X_l-\mu_x)^2-n(\bar X-\mu_x)^2$$
we get
$$E\{S_x^2\}=\frac1{n-1}\,E\left\{\sum_{l=1}^n (X_l-\mu_x)^2-n(\bar X-\mu_x)^2\right\}=\frac1{n-1}\left(n\sigma_x^2-n\,\frac{\sigma_x^2}{n}\right)=\sigma_x^2\,. \qquad (4.14)$$
Here again the independence of the measured values $x_l;\ l=1,2,\ldots,n$ proves to be essential. Obviously, estimators (4.10) and (4.13) feature the same expected value, but only (4.13) can be realized experimentally.
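That only the $1/(n-1)$ version (4.13) is unbiased once $\mu_x$ is replaced by $\bar x$ can be checked by simulation. The sketch below, with illustrative parameters of our own choosing, averages both estimators over many samples.

```python
import random

random.seed(2)
mu, sigma, n, trials = 0.0, 1.0, 5, 40000   # illustrative; sigma^2 = 1

def sample_vars():
    """Return the 1/n and 1/(n-1) variance estimators, both around xbar."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / n, ss / (n - 1)

biased, unbiased = 0.0, 0.0
for _ in range(trials):
    b, u = sample_vars()
    biased += b / trials
    unbiased += u / trials

print(round(biased, 2), round(unbiased, 2))  # about (n-1)/n = 0.8 and 1.0
```

The $1/n$ form systematically underestimates $\sigma^2$ by the factor $(n-1)/n$, which is precisely what the identity preceding (4.14) quantifies.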
When calculating $E\{(\bar X-\mu_x)^2\}$, we have met the theoretical covariance of the random variables $X_1$ and $X_2$. Evidently, this definition applies to any two random variables $X$ and $Y$ with expectations $\mu_x$ and $\mu_y$. In general, we have
$$E\{(X-\mu_x)(Y-\mu_y)\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)\,p_{XY}(x,y)\,dx\,dy=\sigma_{xy}\,. \qquad (4.15)$$
Given $X$ and $Y$ are independent, $p_{XY}(x,y)=p_X(x)\,p_Y(y)$ and $\sigma_{xy}=0$.
Before calculating other expectations, we extend the conceived statistical ensemble by a second random variable $Y$, i.e. with each random variable $X_l$ we associate a further random variable $Y_l$. The consecutive pairs $(X_l,Y_l);\ l=1,\ldots,n$ are assumed to be independent; however, there may be a correlation between $X_l$ and $Y_l$. In particular, each $X_l$ is independent of any other $Y_{l'};\ l\ne l'$.
We now consider the expected value of the empirical covariance
$$s_{xy}=\frac1n\sum_{l=1}^n (x_l-\mu_x)(y_l-\mu_y)\,. \qquad (4.16)$$
Letting $S_{xy}$ denote the associated random variable, we look after the expectation
$$E\{S_{xy}\}=\frac1n\,E\left\{\sum_{l=1}^n (X_l-\mu_x)(Y_l-\mu_y)\right\}.$$
Obviously,
$$E\{S_{xy}\}=\frac1n\sum_{l=1}^n E\{(X_l-\mu_x)(Y_l-\mu_y)\}=\frac1n\,n\,\sigma_{xy}=\sigma_{xy}\,. \qquad (4.17)$$
More effort is needed to cover
$$E\{(\bar X-\mu_x)(\bar Y-\mu_y)\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\bar x-\mu_x)(\bar y-\mu_y)\,p_{\bar X\bar Y}(\bar x,\bar y)\,d\bar x\,d\bar y=\sigma_{\bar x\bar y}\,. \qquad (4.18)$$
In contrast to (4.15), the expectation (4.18) relates to the less fluctuating quantities $\bar X$ and $\bar Y$. As
$$\bar x-\mu_x=\frac1n\sum_{l=1}^n x_l-\mu_x=\frac1n\sum_{l=1}^n (x_l-\mu_x)\,,$$
$$\bar y-\mu_y=\frac1n\sum_{l=1}^n y_l-\mu_y=\frac1n\sum_{l=1}^n (y_l-\mu_y)\,,$$
we find
$$E\{(\bar X-\mu_x)(\bar Y-\mu_y)\}=\frac1{n^2}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\sum_{l=1}^n (x_l-\mu_x)\sum_{l'=1}^n (y_{l'}-\mu_y)\;p_{X_1\ldots Y_n}(x_1,\ldots,y_n)\,dx_1\ldots dy_n\,,$$
i.e.
$$E\{(\bar X-\mu_x)(\bar Y-\mu_y)\}=\frac{\sigma_{xy}}{n}\equiv\sigma_{\bar x\bar y}\,. \qquad (4.19)$$
Occasionally, apart from covariances, so-called correlation coefficients are of interest. The latter are defined through
$$\varrho_{xy}=\frac{\sigma_{xy}}{\sigma_x\sigma_y}\qquad\text{and}\qquad \varrho_{\bar x\bar y}=\frac{\sigma_{\bar x\bar y}}{\sigma_{\bar x}\sigma_{\bar y}}\,,\qquad\text{where}\quad \varrho_{xy}\equiv\varrho_{\bar x\bar y}\,. \qquad (4.20)$$
The covariances defined in (4.15) and (4.18) rely on experimentally inaccessible parameters. Of practical importance is
$$s_{xy}=\frac1{n-1}\sum_{l=1}^n (x_l-\bar x)(y_l-\bar y)\,. \qquad (4.21)$$
As the term within the curly brackets on the right-hand side of
$$(n-1)\,E\{S_{xy}\}=E\left\{\sum_{l=1}^n (X_l-\bar X)(Y_l-\bar Y)\right\}$$
passes into
$$\sum_{l=1}^n (X_l-\bar X)(Y_l-\bar Y)=\sum_{l=1}^n\left[(X_l-\mu_x)-(\bar X-\mu_x)\right]\left[(Y_l-\mu_y)-(\bar Y-\mu_y)\right]$$
$$=\sum_{l=1}^n (X_l-\mu_x)(Y_l-\mu_y)+n(\bar X-\mu_x)(\bar Y-\mu_y)-(\bar X-\mu_x)\sum_{l=1}^n (Y_l-\mu_y)-(\bar Y-\mu_y)\sum_{l=1}^n (X_l-\mu_x)$$
$$=\sum_{l=1}^n (X_l-\mu_x)(Y_l-\mu_y)-n(\bar X-\mu_x)(\bar Y-\mu_y)\,,$$
we find
$$(n-1)\,E\{S_{xy}\}=n\sigma_{xy}-n\,\frac{\sigma_{xy}}{n}=(n-1)\,\sigma_{xy}\,,$$
so that
$$E\{S_{xy}\}=\sigma_{xy}\,. \qquad (4.22)$$
The empirical covariance (4.21) is unbiased with respect to the theoretical covariance $\sigma_{xy}$.
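A quick numerical check of (4.22): draw correlated pairs with a prescribed theoretical covariance $\sigma_{xy}$ and average the empirical covariance (4.21) over many samples. The construction of the correlated pairs below is a standard device of our own choosing, not taken from the text.

```python
import math
import random

random.seed(3)
rho, n, trials = 0.6, 10, 20000
# For standard normal X and Y built as below, sigma_xy = rho = 0.6.

acc = 0.0
for _ in range(trials):
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)  # (4.21)
    acc += sxy / trials

print(round(acc, 2))  # close to sigma_xy = 0.6
```

The $1/(n-1)$ weighting matters here just as it does for the variance: with $1/n$ the average would settle near $(n-1)\sigma_{xy}/n$ instead.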
4.3 Elementary Model of Analysis of Variance
The concept of a conceived statistical ensemble helps us to explore the impact of unknown systematic errors on the analysis of variance. As we recall, the latter shall reveal whether or not series of measurements relating to the same physical quantity, but carried out at different laboratories, are charged by specific, i.e. different, unknown systematic errors. But this, quite obviously, conflicts with the approach pursued here. Given the measured data are biased, the tool analysis of variance no longer appears to be meaningful:
The analysis of variance breaks down, given the empirical data to be investigated are charged or biased by unknown systematic errors.
Let us assume a group of, say, $m$ laboratories has measured one and the same physical quantity, established by one and the same physical object, and that a decision is pending on whether or not the stated results may be considered compatible.
All laboratories have performed the same number $n$ of repeat measurements, so there are means and variances of the form
$$\bar x_i=\frac1n\sum_{l=1}^n x_{il}\,;\qquad s_i^2=\frac1{n-1}\sum_{l=1}^n (x_{il}-\bar x_i)^2\,;\qquad i=1,\ldots,m\,. \qquad (4.23)$$
We assume the $x_{il}$ to be independent and normally distributed. With regard to the influence of unknown systematic errors, we admit that the expectations $\mu_i$ of the arithmetic means $\bar X_i$ differ. The expected values of the empirical variances shall, however, be equal,
$$E\{\bar X_i\}=\mu_i\,,\qquad E\{S_i^2\}=\sigma^2\,;\qquad i=1,\ldots,m\,. \qquad (4.24)$$
The two error equations
$$x_{il}=x_0+(x_{il}-\mu_i)+f_i\,,\qquad \bar x_i=x_0+(\bar x_i-\mu_i)+f_i$$
$$i=1,\ldots,m\,;\qquad l=1,\ldots,n \qquad (4.25)$$
clearly conflict with the basic idea of analysis of variance. On the other hand, we want to show explicitly why, and how, this classical tool of data analysis breaks down due to the empirical character of the data fed into it.
The analysis of variance defines a grand mean, summing over $N=m\times n$ measured data $x_{il}$,
$$\bar{\bar x}=\frac1N\sum_{i=1}^m\sum_{l=1}^n x_{il} \qquad (4.26)$$
and an associated grand empirical variance
$$s^2=\frac1{N-1}\sum_{i=1}^m\sum_{l=1}^n (x_{il}-\bar{\bar x})^2\,. \qquad (4.27)$$
The latter is decomposed by means of the identity
$$x_{il}=\bar{\bar x}+(\bar x_i-\bar{\bar x})+(x_{il}-\bar x_i) \qquad (4.28)$$
and the error equations (4.25). Thus, (4.27) turns into
$$(N-1)\,s^2=n\sum_{i=1}^m (\bar x_i-\bar{\bar x})^2+\sum_{i=1}^m\sum_{l=1}^n (x_{il}-\bar x_i)^2 \qquad (4.29)$$
as the mixed term vanishes,
$$2\sum_{i=1}^m (\bar x_i-\bar{\bar x})\sum_{l=1}^n (x_{il}-\bar x_i)=0\,.$$
If the systematic errors were zero, the right-hand side of (4.29) would offer the possibility of defining empirical variances $s_1^2$ and $s_2^2$, namely
$$(m-1)\,s_1^2=n\sum_{i=1}^m (\bar x_i-\bar{\bar x})^2\qquad\text{and}\qquad (N-m)\,s_2^2=\sum_{i=1}^m\sum_{l=1}^n (x_{il}-\bar x_i)^2\,, \qquad (4.30)$$
so that
$$(N-1)\,s^2=(m-1)\,s_1^2+(N-m)\,s_2^2\,. \qquad (4.31)$$
The expectations of both $S_1^2$ and $S_2^2$ should be equal to the theoretical variance $\sigma^2$ defined in (4.24), i.e.
$$E\{S_1^2\}=\sigma^2\qquad\text{and}\qquad E\{S_2^2\}=\sigma^2\,.$$
Let us first consider the left expectation. As
$$\bar x_i=\frac1n\sum_{l=1}^n x_{il}=x_0+(\bar x_i-\mu_i)+f_i$$
and
$$\bar{\bar x}=\frac1N\sum_{i=1}^m\sum_{l=1}^n x_{il}=\frac nN\sum_{i=1}^m\bigl[x_0+(\bar x_i-\mu_i)+f_i\bigr]=x_0+\frac nN\sum_{i=1}^m (\bar x_i-\mu_i)+\frac nN\sum_{i=1}^m f_i\,,$$
we have
$$\bar x_i-\bar{\bar x}=\left[(\bar x_i-\mu_i)-\frac nN\sum_{j=1}^m (\bar x_j-\mu_j)\right]+\left[f_i-\frac nN\sum_{j=1}^m f_j\right],$$
so that
$$E\{S_1^2\}=\sigma^2+\frac{n}{m-1}\sum_{i=1}^m\left[f_i-\frac nN\sum_{j=1}^m f_j\right]^2. \qquad (4.32)$$
The second expectation follows from
$$\sum_{i=1}^m\sum_{l=1}^n (x_{il}-\bar x_i)^2=(n-1)\sum_{i=1}^m s_i^2\,,$$
so that
$$E\{S_2^2\}=\sigma^2\,. \qquad (4.33)$$
The test ratio of the analysis of variance is the quotient $S_1^2/S_2^2$, which should be F distributed. On account of (4.32), however, any further consideration appears to be dispensable.
5 Combination of Measurement Errors
5.1 Expectation of the Arithmetic Mean
For convenience's sake, we cast the basic error equation (2.3)
$$x_l=x_0+\varepsilon_l+f_x\,;\qquad l=1,2,\ldots,n$$
into the form of an identity
$$x_l=x_0+(x_l-\mu_x)+f_x\,;\qquad l=1,2,\ldots,n\,. \qquad (5.1)$$
Let us remind ourselves that the quantity $x_l$ is a physically measured value, while the decomposition (5.1) reflects the error model under discussion. The same applies to the arithmetic mean
$$\bar x=x_0+(\bar x-\mu_x)+f_x \qquad (5.2)$$
which, in fact, is a formal representation of
$$\bar x=\frac1n\sum_{l=1}^n x_l\,.$$
Let us assign a random variable $\bar X$ to $\bar x$. Then, obviously, the expectation
$$\mu_x=E\{\bar X\}=x_0+f_x \qquad (5.3)$$
puts straight:
The unknown systematic error $f_x$ biases the expectation $\mu_x$ of the arithmetic mean $\bar X$ with respect to the true value $x_0$.
In the following we shall design a confidence interval localizing the expectation $\mu_x$ of the arithmetic mean. To this end, we refer to the empirical variance
$$s_x^2=\frac1{n-1}\sum_{l=1}^n (x_l-\bar x)^2 \qquad (5.4)$$
and to Student's variable
$$T=\frac{\bar X-\mu_x}{S_x/\sqrt n}\,. \qquad (5.5)$$
Fig. 5.1. Confidence interval for the expectation $\mu_x$ of the arithmetic mean $\bar X$
As has been said in (3.54),
$$P\{-t_P\le T\le t_P\}=\int_{-t_P}^{t_P} p_T(t;\nu)\,dt\,;\qquad \nu=n-1 \qquad (5.6)$$
specifies the probability of finding any realization $t$ of $T$ in an interval $-t_P,\ldots,t_P$. Often, $P=0.95$ is chosen. The probability "masses" lying to the left of $-t_P$ and to the right of $t_P$ are traditionally denoted by $\alpha/2$. In terms of this partitioning we have $P=1-\alpha$.
From (5.5) and (5.6), we infer a statement about the probable location of the random variable $\bar X$,
$$P\left\{\mu_x-\frac{t_P(n-1)}{\sqrt n}\,S_x\le \bar X\le \mu_x+\frac{t_P(n-1)}{\sqrt n}\,S_x\right\}=1-\alpha\,.$$
From the point of view of the experimenter it will, however, be more interesting to learn about the probable location of the parameter $\mu_x$. Indeed, inverting the above statement yields
$$P\left\{\bar X-\frac{t_P(n-1)}{\sqrt n}\,S_x\le \mu_x\le \bar X+\frac{t_P(n-1)}{\sqrt n}\,S_x\right\}=1-\alpha\,. \qquad (5.7)$$
An equivalent statement, which reflects the experimental situation more closely, is obtained as follows: We "draw a sample" $x_1,x_2,\ldots,x_n$ of size $n$ and determine the estimators $\bar x$ and $s_x^2$. Then, as illustrated in Fig. 5.1, we may rewrite (5.7) in the form
$$\bar x-\frac{t_P}{\sqrt n}\,s_x\le \mu_x\le \bar x+\frac{t_P}{\sqrt n}\,s_x\qquad\text{with probability } P\,. \qquad (5.8)$$
This latter statement reveals the usefulness of Student's density: it allows us to localize the unknown parameter $\mu_x$ by means of the two estimators $\bar x$ and $s_x^2$. The normal density itself would not admit of such a statement, as the parameter $\sigma_x$ is inaccessible.
The intervals (5.7) and (5.8) localize the parameter $\mu_x$ with probability $P$ as defined in (5.6). The intervals themselves are called confidence intervals and $P$ is called the confidence level.
As has been pointed out, the difference $x_l-\bar x$ cancels the systematic error $f_x$, so that the random variable $S_x^2$ remains unbiased, $E\{S_x^2\}=\sigma_x^2$.
Let us keep in mind that the parameter $\mu_x$ may differ from the true value $x_0$ by up to $\pm f_{s,x}$. In order to assess the uncertainty of the mean $\bar x$ with respect to the true value $x_0$, we therefore need to do better.
5.2 Uncertainty of the Arithmetic Mean
The expectation
$$\mu_x=E\{\bar X\}=x_0+f_x\,,\qquad -f_{s,x}\le f_x\le f_{s,x} \qquad (5.9)$$
may deviate from $x_0$ by up to $\pm f_{s,x}$, Fig. 5.2. Hence, as sketched in Fig. 5.3, we define the overall uncertainty $u_{\bar x}$ of the estimator $\bar x$ as
$$u_{\bar x}=\frac{t_P}{\sqrt n}\,s_x+f_{s,x}\,. \qquad (5.10)$$
The final measurement result reads
$$\bar x\pm u_{\bar x}\,. \qquad (5.11)$$
While (5.8) is a probability statement, (5.11) is not. Instead, we simply argue that this statement localizes the true value $x_0$ with, say, "factual" or "physical" certainty. Though random errors emerging from real experiments may, as a rule, be considered approximately normally distributed, in general the extreme tails of the theoretical distribution model $N(\mu_x,\sigma_x)$ seem to be lacking. Relating $t_P$ to, say, $P=95\%$, this in turn means that we may assume the interval (5.8) to localize $\mu_x$ quasi-safely.
The fact that the systematic error $f_x$ enters into (5.11) in the form of a worst-case estimate might reflect an overestimation. On the other hand, the experimenter does not have any means at his disposal to shape the uncertainty (5.10) more favorably, i.e. more narrowly. If he nevertheless compressed it, he would run the risk of forgoing the localization of the true value $x_0$, so that, as a consequence, all his efforts and metrological work would have been futile and, in particular, he might miss metrology's strict demand for traceability.
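In code, the recipe (5.4), (5.8) and (5.10) takes only a few lines. The data, the quantile $t_{0.95}(9)\approx 2.26$ for $n=10$ (from a standard t table), and the systematic error bound $f_{s,x}$ below are illustrative assumptions of ours.

```python
import math

xs = [10.02, 9.98, 10.05, 9.99, 10.01, 10.03, 9.97, 10.00, 10.04, 9.96]  # illustrative
n = len(xs)
t_P = 2.26   # t_0.95(n - 1) for n = 10, from a standard t table
f_s = 0.02   # assumed worst-case bound on the unknown systematic error

xbar = sum(xs) / n
s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # (5.4)
half_width = t_P * s_x / math.sqrt(n)                        # random part, (5.8)
u_x = half_width + f_s                                       # overall uncertainty, (5.10)
print(f"{xbar:.3f} +/- {u_x:.3f}")
```

Note that the two contributions are added arithmetically, not in quadrature: this is exactly the worst-case treatment of the systematic error discussed above.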
Fig. 5.2. Difference between the expected value µx and the true value x0
Fig. 5.3. Uncertainty $u_{\bar x}$ of the arithmetic mean $\bar x$ with respect to the true value $x_0$
Example
Legal SI-based units are defined verbally – in this respect, they are faultless. However, their realizations are blurred, as they suffer from experimental imperfections. We consider the units volt (V), ohm (Ω) and ampere (A). As Fig. 5.4 illustrates, the (present) realization of the SI unit ampere relies on two terms,
$$1\,\mathrm A = 1\,\mathrm A_{\mathrm L} + \bar c\,.$$
Here $1\,\mathrm A_{\mathrm L}$ designates the laboratory ampere. This should coincide with the theoretically defined SI ampere $1\,\mathrm A$. However, as the former deviates from
Fig. 5.4. Realization of the SI unit ampere. The quantities $1\,\mathrm A$ and $1\,\mathrm A_{\mathrm L}$ symbolize the SI ampere and the laboratory ampere respectively
the latter, a correction $\bar c$ needs to be applied. The correction is obtained either from other experiments or from a least squares fit to the fundamental constants of physics. Measured in laboratory units $\mathrm A_{\mathrm L}$, the correction is given in terms of an average $\bar c$ and an uncertainty $u_{\bar c}$, so that
$$1\,\mathrm A = 1\,\mathrm A_{\mathrm L} + (\bar c \pm u_{\bar c}) = (1\,\mathrm A_{\mathrm L} + \bar c) \pm u_{\bar c}\,.$$
On the other hand, the distinction between theoretically defined units and their experimentally realized counterparts may also be expressed in the form of quotients. To this end, so-called conversion factors have been introduced. The conversion factor for the ampere is defined by
$$K_{\mathrm A}=\frac{1\,\mathrm A_{\mathrm L}}{1\,\mathrm A}\,.$$
Let $c$ denote the numerical value¹ of $\bar c$, so that $\bar c=c\,\mathrm A_{\mathrm L}$. Then
$$K_{\mathrm A}=\frac{1}{1+c}\qquad\text{or, equivalently,}\qquad c=\frac{1-K_{\mathrm A}}{K_{\mathrm A}}\,.$$
The uncertainty $u_{K_{\mathrm A}}$, given $u_{\bar c}$, will be derived in Sect. 5.3. Analogous relations refer to the SI units volt and ohm:
$$1\,\mathrm V = 1\,\mathrm V_{\mathrm L} + (\bar a \pm u_{\bar a}) = (1\,\mathrm V_{\mathrm L} + \bar a) \pm u_{\bar a}\,,$$
$$K_{\mathrm V}=\frac{1\,\mathrm V_{\mathrm L}}{1\,\mathrm V}=\frac{1}{1+a}\,,\qquad a=\frac{1-K_{\mathrm V}}{K_{\mathrm V}}\,,$$
$$1\,\Omega = 1\,\Omega_{\mathrm L} + (\bar b \pm u_{\bar b}) = (1\,\Omega_{\mathrm L} + \bar b) \pm u_{\bar b}\,,$$
$$K_{\Omega}=\frac{1\,\Omega_{\mathrm L}}{1\,\Omega}=\frac{1}{1+b}\,,\qquad b=\frac{1-K_{\Omega}}{K_{\Omega}}\,.$$
¹ e.g. the unit is suppressed.
For the realization of the units volt and ohm, Josephson's and von Klitzing's quantum effects
$$U_{\mathrm J}=\frac{h}{2e}\,\nu\qquad\text{and}\qquad R_{\mathrm H}=\frac{h}{i e^2}\,;\qquad i=1,2,3,\ldots\,,$$
respectively, are of tremendous importance; $h$ denotes Planck's constant, $e$ the elementary charge and $\nu$ a high-frequency radiation ($\approx 70\,\mathrm{GHz}$). Unfortunately, the uncertainties of the leading factors $h/(2e)$ and $R_{\mathrm H}=h/e^2$ are still unsatisfactory. Nevertheless, sophisticated predefinitions
$$2e/h = 483\,597.9\ \mathrm{GHz/V}\ \text{(exact)}\,;\qquad h/e^2 = 25.812\,807\ \mathrm{k\Omega}\ \text{(exact)}$$
enable experimenters to exploit the remarkable reproducibilities of quantum effects, i.e. to adjust the sums $1\,\mathrm V_{\mathrm L}+\bar a$ and $1\,\Omega_{\mathrm L}+\bar b$; the uncertainties $u_{\bar a}$
and $u_{\bar b}$ remain, however, unaffected.
We finally add the relationships between the constants $a$, $b$, $c$ and the conversion factors $K_{\mathrm V}$, $K_{\Omega}$, $K_{\mathrm A}$. From
$$1\,\mathrm V = 1\,\Omega\;1\,\mathrm A\qquad\text{and}\qquad 1\,\mathrm V_{\mathrm L} = 1\,\Omega_{\mathrm L}\;1\,\mathrm A_{\mathrm L}$$
we find
$$[1+a]\,\mathrm V_{\mathrm L}=[1+b]\,\Omega_{\mathrm L}\,[1+c]\,\mathrm A_{\mathrm L}$$
and, assuming $a,b,c\lesssim 10^{-6}$, by approximation
$$a=b+c\,.$$
Similarly, from
$$\frac{\mathrm V_{\mathrm L}}{K_{\mathrm V}}=\frac{\Omega_{\mathrm L}}{K_{\Omega}}\,\frac{\mathrm A_{\mathrm L}}{K_{\mathrm A}}$$
we obtain
$$K_{\mathrm V}=K_{\Omega}K_{\mathrm A}\,.$$
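The approximation $a=b+c$ and the exact product $K_{\mathrm V}=K_{\Omega}K_{\mathrm A}$ can be checked with numbers of the stated magnitude ($\lesssim 10^{-6}$); the values below are our own, chosen only for illustration.

```python
b, c = 4.0e-7, 6.03e-6            # illustrative numerical values of order 1e-6
a_exact = (1 + b) * (1 + c) - 1   # from [1+a] = [1+b][1+c]
print(abs(a_exact - (b + c)))     # of order b*c ~ 1e-12: a = b + c to that level

K_A = 1 / (1 + c)
K_Omega = 1 / (1 + b)
K_V = 1 / (1 + a_exact)
print(abs(K_V - K_Omega * K_A))   # zero up to floating point rounding
```

The neglected cross term $b\,c$ is some six orders of magnitude below $a$ itself, which justifies the linearization.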
5.3 Uncertainty of a Function of One Variable
The simplest case of error propagation is the function of one variable, say $\phi(x)$, e.g. $\phi(x)=\ln x$, $\phi(x)=1/x$, etc. In most cases, error propagation within functional relationships takes place on the basis of linearized Taylor series expansions. Whether or not this is admissible depends on the magnitude of the measurement uncertainties of the input data, in this case on the uncertainty $u_{\bar x}$ of the quantity $\bar x$, and on the analytical behavior of the considered function in the neighborhood of $\bar x\pm u_{\bar x}$. Assuming $\phi(x)$ to behave linearly enough, the error equations
$$x_l=x_0+(x_l-\mu_x)+f_x\,,\qquad \bar x=x_0+(\bar x-\mu_x)+f_x$$
lead us to
$$\phi(x_l)=\phi(x_0)+\frac{d\phi}{dx}\Big|_{x=x_0}(x_l-\mu_x)+\frac{d\phi}{dx}\Big|_{x=x_0}f_x+\cdots$$
$$\phi(\bar x)=\phi(x_0)+\frac{d\phi}{dx}\Big|_{x=x_0}(\bar x-\mu_x)+\frac{d\phi}{dx}\Big|_{x=x_0}f_x+\cdots \qquad (5.12)$$
so that²
$$\phi(x_l)-\phi(\bar x)=\frac{d\phi}{dx}(x_l-\bar x)\,;\qquad l=1,2,\ldots,n\,. \qquad (5.13)$$
Here it is assumed that
– the expansions may be truncated after their linear terms and that
– the derivative at $x_0$ may be approximated by the derivative at $\bar x$.
Furthermore, for the sake of lucidity, an equals sign has been introduced which, actually, is of course incorrect. Finally, we shall assume that
– by approximation, the derivatives may be treated as constants.
Within the framework of these approximations, the expected value of the random variable $\phi(\bar X)$,
$$\mu_\phi=E\{\phi(\bar X)\}=\phi(x_0)+\frac{d\phi}{dx}\,f_x\,, \qquad (5.14)$$
presents itself as the sum of the true value $\phi(x_0)$ and a bias expressing the propagated systematic error
$$f_\phi=\frac{d\phi}{dx}\,f_x\,;\qquad -f_{s,x}\le f_x\le f_{s,x}\,. \qquad (5.15)$$
Obviously, the linearization (5.13) implies
$$\bar\phi=\phi(\bar x)=\frac1n\sum_{l=1}^n\phi(x_l)\,. \qquad (5.16)$$
Using (5.13), the empirical variance of the $\phi(x_l)$ with respect to their mean $\phi(\bar x)$ is given by
$$s_\phi^2=\frac1{n-1}\sum_{l=1}^n\left[\phi(x_l)-\phi(\bar x)\right]^2=\left(\frac{d\phi}{dx}\right)^{\!2} s_x^2\,. \qquad (5.17)$$
If the $x_l$ are normally distributed, this also holds for the $\phi(x_l)$, $l=1,\ldots,n$. Hence, referring to (5.12), (5.14) and (5.17), we are in a position to define a Student's
² abbreviating the operation of differentiation.
$$T(n-1)=\frac{\phi(\bar X)-\mu_\phi}{S_\phi/\sqrt n}$$
through
$$T(n-1)=\frac{\dfrac{d\phi}{dx}(\bar X-\mu_x)}{\left|\dfrac{d\phi}{dx}\right|\dfrac{S_x}{\sqrt n}}=\operatorname{sign}\!\left(\frac{d\phi}{dx}\right)\frac{\bar X-\mu_x}{S_x/\sqrt n}\,, \qquad (5.18)$$
where the sign function is meaningless on account of the symmetry of Student's density. After all, (5.18) provides a confidence interval
$$\phi(\bar X)\pm\frac{t_P(n-1)}{\sqrt n}\,S_\phi \qquad (5.19)$$
for the parameter $\mu_\phi$ with reference to a confidence level $P$. Combining (5.19) with the worst-case estimate
$$f_{s,\phi}=\left|\frac{d\phi}{dx}\right| f_{s,x}\,;\qquad -f_{s,\phi}\le f_\phi\le f_{s,\phi} \qquad (5.20)$$
of the propagated systematic error $f_\phi$, the overall uncertainty turns out to be
$$u_\phi=\left|\frac{d\phi}{dx}\right|\left[\frac{t_P(n-1)}{\sqrt n}\,s_x+f_{s,x}\right]=\left|\frac{d\phi}{dx}\right| u_{\bar x}\,, \qquad (5.21)$$
where $u_{\bar x}$ is given in (5.10). Thus, the final measurement result is
$$\phi(\bar x)\pm u_\phi\,. \qquad (5.22)$$
Examples
(i) Occasionally, the fine structure constant³
$$\alpha=7.297\,353\,08(33)\times 10^{-3}$$
is inversely written
$$\alpha^{-1}=137.035\,989\,49\cdots\,.$$
We shall determine the uncertainty $u_{\alpha^{-1}}$, given $u_{\bar\alpha}=t_P\,s_\alpha/\sqrt n+f_{s,\alpha}$. Linearizing $\phi(\alpha)=1/\alpha$,
$$\phi(\alpha_l)-\phi(\bar\alpha)=-\frac1{\bar\alpha^2}\,(\alpha_l-\bar\alpha)\,;\qquad l=1,\ldots,n\,,$$
³ The data are taken from [17]. Here, the rounding procedures proposed in Sect. 1.2 do not apply. Instead, uncertainties are generally provided with two digits.
we find
$$s_{\alpha^{-1}}^2=\frac1{\bar\alpha^4}\,s_\alpha^2\,.$$
As
$$f_{\alpha^{-1}}=-\frac1{\bar\alpha^2}\,f_\alpha\,,$$
the uncertainty in question is
$$u_{\alpha^{-1}}=\frac1{\bar\alpha^2}\left[\frac{t_P}{\sqrt n}\,s_\alpha+f_{s,\alpha}\right]=\frac{u_{\bar\alpha}}{\bar\alpha^2}\,,$$
so that
$$\alpha^{-1}=137.035\,989\,5(62)\,.$$
We notice that the relative uncertainties are equal,
$$\frac{u_{\alpha^{-1}}}{\alpha^{-1}}=\frac{u_{\bar\alpha}}{\bar\alpha}\,.$$
The same applies when ppm are introduced. As
$$\frac{u_{\mathrm{ppm},\,\alpha}}{10^6}=\frac{u_{\bar\alpha}}{\bar\alpha}\,,$$
we have
$$u_{\mathrm{ppm},\,\alpha}=\frac{u_{\bar\alpha}}{\bar\alpha}\,10^6=\frac{u_{\alpha^{-1}}}{\alpha^{-1}}\,10^6=u_{\mathrm{ppm},\,\alpha^{-1}}=0.045\,.$$
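With the numbers quoted from [17], the propagation $u_{\alpha^{-1}}=u_{\bar\alpha}/\bar\alpha^2$ and the ppm statement can be verified directly:

```python
alpha = 7.29735308e-3
u_alpha = 0.00000033e-3      # the "(33)" in the last two quoted digits

u_alpha_inv = u_alpha / alpha**2   # propagated uncertainty of 1/alpha
ppm = u_alpha / alpha * 1e6        # relative uncertainty in ppm

print(f"{1/alpha:.7f}")      # 137.0359895
print(f"{u_alpha_inv:.1e}")  # 6.2e-06, i.e. alpha^-1 = 137.0359895(62)
print(f"{ppm:.3f}")          # 0.045
```

Because $\phi(\alpha)=1/\alpha$ has $|\phi'/\phi|=1/\alpha$, relative uncertainties are preserved under inversion, which is what the identical ppm values express.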
(ii) Given the uncertainty $u_{\bar c}$ of the correction $\bar c$ to the base unit ampere, $1\,\mathrm A=(1\,\mathrm A_{\mathrm L}+\bar c)\pm u_{\bar c}$, we shall find the uncertainty of the conversion factor $K_{\mathrm A}$.
Setting $K_{\mathrm A}=y$ and $c=x$ so that $y=1/(1+x)$, we obtain
$$y-y_0=\frac{dy}{dx_0}(x-x_0)\,;\qquad x-x_0=(x-\mu_x)+f_x\,,$$
i.e.
$$y-y_0=\frac{dy}{dx_0}\left[(x-\mu_x)+f_x\right].$$
But then
$$u_y=\left|\frac{dy}{dx}\right|\left[\frac{t_P}{\sqrt n}\,s_x+f_{s,x}\right]\qquad\text{or}\qquad u_{K_{\mathrm A}}=\frac{u_{\bar c}}{(1+c)^2}\,.$$
Just to illustrate the situation, let us refer to the 1986 adjustment of the fundamental physical constants [17]. We have
$$K_{\mathrm A}=0.999\,993\,97\pm 0.3\times 10^{-6}\,.$$
This means
$$1\,\mathrm A_{\mathrm L}=K_{\mathrm A}\,1\,\mathrm A=0.999\,993\,97\,\mathrm A\,,\qquad c=6.03\times 10^{-6}\,,\qquad u_{\bar c}=0.30\times 10^{-6}\,\mathrm A\,,\qquad \bar c=6.03\times 10^{-6}\,\mathrm A_{\mathrm L}\,,$$
so that
$$1\,\mathrm A_{\mathrm L}+\bar c=0.999\,993\,97+0.000\,006\,03=1\,\mathrm A\,.$$
In Sect. 6.4 we shall assess the uncertainty $u_{\bar c}$, given $u_{\bar a}$ and $u_{\bar b}$. Furthermore, given $u_{K_{\mathrm V}}$ and $u_{K_\Omega}$, we shall quote the uncertainty $u_{K_{\mathrm A}}$.
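The numbers of the 1986 adjustment give, via $u_{K_{\mathrm A}}=u_{\bar c}/(1+c)^2$:

```python
c = 6.03e-6      # numerical value of the correction, from the text
u_c = 0.30e-6    # its uncertainty
K_A = 1 / (1 + c)
u_KA = u_c / (1 + c) ** 2

print(f"{K_A:.8f}")   # 0.99999397
print(f"{u_KA:.2e}")  # 3.00e-07, i.e. +/- 0.3e-6 as quoted
```

Since $c$ is tiny, the factor $(1+c)^{-2}$ changes $u_{\bar c}$ only in the sixth significant digit; the quoted uncertainty carries over essentially unchanged.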
5.4 Systematic Errors of the Measurands
Should the measurand itself, for any reason, be blurred, a true value $x_0$ would not exist. Even in this case it would, however, be assumed that the measurand remains constant during the time it takes to perform $n$ repeat measurements. The blurred structure of the measurand can be taken into account by an additional systematic error
$$-f_{s,x_0}\le f_{x_0}\le f_{s,x_0}\,. \qquad (5.23)$$
The linewidths on semiconductor templates, for example, may locally vary more or less severely. Subject to the resolution of the measuring process, such variations will be integrated out or disclosed. If integrated out, an additional systematic error should account for the inconstancies of the linewidths.
The extra systematic error $f_{x_0}$ of the measurand may be merged with the systematic error of the measuring process
$$-f_{s,x}\le f_x\le f_{s,x}$$
into an overall systematic error
$$-(f_{s,x}+f_{s,x_0})\,,\ \ldots,\ (f_{s,x}+f_{s,x_0})\,. \qquad (5.24)$$
It is essential that, during the time it takes to register the repeat measurements, $f_x=\text{const.}$ and $f_{x_0}=\text{const.}$
The formalization of series of measurements which are subject to time-varying systematic errors, giving rise to so-called time series, would be outside the scope of this presentation. A basic metrological example is given in [12].
6 Propagation of Measurement Errors
6.1 Taylor Series Expansion
We consider a functional relationship $\phi(x,y)$ and given results of measurement
$$\bar x\pm u_{\bar x}\qquad\text{and}\qquad \bar y\pm u_{\bar y} \qquad (6.1)$$
localizing the true values $x_0$ and $y_0$ respectively. The latter define a quantity $\phi(x_0,y_0)$ which we regard as the true value of the physical quantity in question. The result to be found,
$$\phi(\bar x,\bar y)\pm u_\phi\,, \qquad (6.2)$$
should localize this quantity $\phi(x_0,y_0)$. In general, it is sufficient to base the assessment of the uncertainty $u_\phi$ on the linear part of the Taylor expansion. Within the scope of this approximation, and due to the error model relied on, the error propagation resolves into two different, independent branches covering the propagation of random and of systematic errors.
Expanding $\phi(x_l,y_l);\ l=1,2,\ldots,n$ throughout a neighborhood of the point $x_0,y_0$,
$$\phi(x_l,y_l)=\phi(x_0,y_0)+\frac{\partial\phi}{\partial x}\bigg|_{\substack{x=x_0\\ y=y_0}}(x_l-x_0)+\frac{\partial\phi}{\partial y}\bigg|_{\substack{x=x_0\\ y=y_0}}(y_l-y_0)+\cdots$$
and referring to the error equations
$$x_l=x_0+(x_l-\mu_x)+f_x\,,\qquad y_l=y_0+(y_l-\mu_y)+f_y \qquad (6.3)$$
yields¹
$$\phi(x_l,y_l)=\phi(x_0,y_0)+\left[\frac{\partial\phi}{\partial x}(x_l-\mu_x)+\frac{\partial\phi}{\partial y}(y_l-\mu_y)\right]+\left[\frac{\partial\phi}{\partial x}f_x+\frac{\partial\phi}{\partial y}f_y\right]. \qquad (6.4)$$
¹ abbreviating the operation of differentiation.
For the sake of lucidity, the truncated series expansion has been expressed in terms of an equality which, of course, is incorrect. By approximation, we shall treat the differential quotients as constants – this will keep the formalism controllable. As we assume equal numbers of repeat measurements, we may sum over $l$ and divide by $n$. Subtracting
$$\phi(\bar x,\bar y)=\phi(x_0,y_0)+\left[\frac{\partial\phi}{\partial x}(\bar x-\mu_x)+\frac{\partial\phi}{\partial y}(\bar y-\mu_y)\right]+\left[\frac{\partial\phi}{\partial x}f_x+\frac{\partial\phi}{\partial y}f_y\right] \qquad (6.5)$$
from (6.4), we find
$$\phi(x_l,y_l)-\phi(\bar x,\bar y)=\frac{\partial\phi}{\partial x}(x_l-\bar x)+\frac{\partial\phi}{\partial y}(y_l-\bar y)\,. \qquad (6.6)$$
The linearization implies
$$\bar\phi=\frac1n\sum_{l=1}^n\phi(x_l,y_l)=\phi(\bar x,\bar y)\,. \qquad (6.7)$$
From the expectation
$$\mu_\phi=E\{\phi(\bar X,\bar Y)\}=\phi(x_0,y_0)+\left[\frac{\partial\phi}{\partial x}f_x+\frac{\partial\phi}{\partial y}f_y\right] \qquad (6.8)$$
we seize:
The unknown systematic errors $f_x$ and $f_y$ cause the expectation $\mu_\phi$ of the arithmetic mean $\phi(\bar X,\bar Y)$ to be biased with respect to the true value $\phi(x_0,y_0)$. The bias is the propagated systematic error
$$f_\phi=\frac{\partial\phi}{\partial x}f_x+\frac{\partial\phi}{\partial y}f_y\,;\qquad -f_{s,x}\le f_x\le f_{s,x}\,,\quad -f_{s,y}\le f_y\le f_{s,y}\,. \qquad (6.9)$$
The uncertainty $u_\phi$ sought for will be assessed by means of (6.6) and (6.9). Let us summarize. On account of the error model, the linear part of the Taylor expansion (6.5) splits up into two terms. The part due to random errors embodies the unknown expectations $\mu_x$, $\mu_y$. As a result, no direct assessment is possible. Meanwhile, the difference (6.6) will help us to overcome the difficulty. In contrast to this, the second term of (6.5), which expresses the influence of systematic errors, turns out to be directly assessable.
Finally, for formal reasons, we define the vectors
$$b=\begin{pmatrix}\dfrac{\partial\phi}{\partial x}\\[2mm]\dfrac{\partial\phi}{\partial y}\end{pmatrix},\qquad \bar\zeta-\mu=\begin{pmatrix}\bar x-\mu_x\\ \bar y-\mu_y\end{pmatrix} \qquad (6.10)$$
so that (6.5) changes into
$$\phi(\bar x,\bar y)=\mu_\phi+b^{\mathrm T}(\bar\zeta-\mu)\,. \qquad (6.11)$$
To set up the error propagation for more than two variables, we generalize the nomenclature. We consider a functional relationship $\phi(x_1,x_2,\ldots,x_m)$ and arithmetic means
$$\bar x_i=\frac1n\sum_{l=1}^n x_{il}\,;\qquad i=1,2,\ldots,m \qquad (6.12)$$
to be based on an identical number $n$ of repeat measurements.
As usual, we designate the true values of the measurands $x_i;\ i=1,2,\ldots,m$ by
$$x_{0,1},\,x_{0,2},\,\ldots,\,x_{0,m}\,. \qquad (6.13)$$
Then, the final result
$$\phi(\bar x_1,\bar x_2,\ldots,\bar x_m)\pm u_\phi \qquad (6.14)$$
should localize the true value $\phi(x_{0,1},x_{0,2},\ldots,x_{0,m})$. In analogy to (6.5), we have
$$\phi(\bar x_1,\bar x_2,\ldots,\bar x_m)=\mu_\phi+\sum_{i=1}^m\frac{\partial\phi}{\partial x_i}(\bar x_i-\mu_i)\,;\qquad \mu_i=E\{\bar X_i\}\,, \qquad (6.15)$$
where
$$\mu_\phi=E\{\phi(\bar X_1,\bar X_2,\ldots,\bar X_m)\}=\phi(x_{0,1},x_{0,2},\ldots,x_{0,m})+\sum_{i=1}^m\frac{\partial\phi}{\partial x_i}f_i\,. \qquad (6.16)$$
The contribution
$$f_\phi=\sum_{i=1}^m\frac{\partial\phi}{\partial x_i}f_i\,;\qquad -f_{s,i}\le f_i\le f_{s,i} \qquad (6.17)$$
denotes the propagated systematic error biasing the true value $\phi(x_{0,1},x_{0,2},\ldots,x_{0,m})$. Instead of (6.6) we here have, letting $l=1,2,\ldots,n$,
$$\phi(x_{1l},x_{2l},\ldots,x_{ml})-\phi(\bar x_1,\bar x_2,\ldots,\bar x_m)=\sum_{i=1}^m\frac{\partial\phi}{\partial x_i}(x_{il}-\bar x_i)\,. \qquad (6.18)$$
Finally, we convert (6.15) into vector form,
$$\phi(\bar x_1,\bar x_2,\ldots,\bar x_m)=\mu_\phi+b^{\mathrm T}(\bar x-\mu)\,, \qquad (6.19)$$
where
$$b=\left[\frac{\partial\phi}{\partial x_1}\;\cdots\;\frac{\partial\phi}{\partial x_m}\right]^{\mathrm T}\qquad\text{and}\qquad \bar x-\mu=(\bar x_1-\mu_1\;\cdots\;\bar x_m-\mu_m)^{\mathrm T}.$$
We are ultimately prepared to reformulate the formalism for error propagation.
6.2 Expectation of the Arithmetic Mean
We might wish to assess the statistical fluctuations of the random errors from (6.4) and (6.8),
$$\phi(x_l,y_l)=\mu_\phi+\left[\frac{\partial\phi}{\partial x}(x_l-\mu_x)+\frac{\partial\phi}{\partial y}(y_l-\mu_y)\right];\qquad l=1,2,\ldots,n\,, \qquad (6.20)$$
by means of the expectation
$$\sigma_\phi^2=E\left\{(\phi(X,Y)-\mu_\phi)^2\right\}=\left(\frac{\partial\phi}{\partial x}\right)^{\!2}\sigma_x^2+2\left(\frac{\partial\phi}{\partial x}\right)\!\left(\frac{\partial\phi}{\partial y}\right)\sigma_{xy}+\left(\frac{\partial\phi}{\partial y}\right)^{\!2}\sigma_y^2\,. \qquad (6.21)$$
However, neither the theoretical variances nor the theoretical covariance are available – the latter may turn out to be positive or negative. Resorting to (6.6), we can overcome the problem that we do not know the theoretical moments of second order, as
$$\phi(x_l,y_l)-\phi(\bar x,\bar y)=\frac{\partial\phi}{\partial x}(x_l-\bar x)+\frac{\partial\phi}{\partial y}(y_l-\bar y)\,;\qquad l=1,2,\ldots,n$$
puts us in a position to express the scattering of the random errors via the empirical moments $s_x^2$, $s_{xy}$ and $s_y^2$, so that
$$s_\phi^2=\frac1{n-1}\sum_{l=1}^n\left[\phi(x_l,y_l)-\phi(\bar x,\bar y)\right]^2$$
yields
yields
$$s_\phi^2=\frac1{n-1}\sum_{l=1}^n\left[\left(\frac{\partial\phi}{\partial x}\right)^{\!2}(x_l-\bar x)^2+2\left(\frac{\partial\phi}{\partial x}\right)\!\left(\frac{\partial\phi}{\partial y}\right)(x_l-\bar x)(y_l-\bar y)+\left(\frac{\partial\phi}{\partial y}\right)^{\!2}(y_l-\bar y)^2\right]$$
$$=\left(\frac{\partial\phi}{\partial x}\right)^{\!2}s_x^2+2\left(\frac{\partial\phi}{\partial x}\right)\!\left(\frac{\partial\phi}{\partial y}\right)s_{xy}+\left(\frac{\partial\phi}{\partial y}\right)^{\!2}s_y^2\,. \qquad (6.22)$$
As the empirical variances $s_x^2$, $s_y^2$ and the empirical covariance $s_{xy}$ are unbiased, the random variable $S_\phi^2$ is unbiased as well, $E\{S_\phi^2\}=\sigma_\phi^2$.
Succeeding pairs $x_1,y_1;\ x_2,y_2;\ \ldots;\ x_n,y_n$ are independent. On the other hand, with respect to the same $l$, $x_l$ and $y_l$ may well be dependent. When the pairs $x_l,y_l$ are assigned to the random variables $X,Y$ of a two-dimensional normal probability density, the quantities $\phi(x_l,y_l);\ l=1,2,\ldots,n$ defined in (6.20) are normally distributed, be $X$ and $Y$ dependent or not. However, succeeding pairs $\phi(x_l,y_l);\ \phi(x_{l+1},y_{l+1})$ are independent, like pairs $x_l,y_l;\ x_{l+1},y_{l+1}$. As $s_\phi^2$ estimates $\sigma_\phi^2$, we are in a position to define a
$$\chi^2(n-1)=\frac{(n-1)\,S_\phi^2}{\sigma_\phi^2} \qquad (6.23)$$
and a Student's
$$T(n-1)=\frac{\phi(\bar X,\bar Y)-\mu_\phi}{S_\phi/\sqrt n}\,. \qquad (6.24)$$
They are defined because we have agreed on statistically well-defined mea-suring conditions. Thus,
φ(X, Y ) − tP (n − 1)√n
Sφ ≤ µφ ≤ φ(X, Y ) +tP (n − 1)√
nSφ (6.25)
is a confidence interval for the expectation µφ of the arithmetic mean φ(X, Y ).The subscript P in tP specifies the associated level of confidence. Seen ex-perimentally we state:
The confidence interval
φ(x, y) ± tP (n − 1)√n
sφ (6.26)
localizes the expected value µφ = Eφ(X, Y ) of the random variable φ(X, Y )with probability PtP (n − 1).
Referring to the empirical variance–covariance matrix

s = ( s_xx  s_xy
      s_yx  s_yy ) ;   s_xx ≡ s²_x , s_xy = s_yx , s_yy ≡ s²_y

and the vector b defined in (6.10), we may compress (6.22) into

s²_φ = b^T s b .   (6.27)

The formalism does not require us to differentiate between dependent and independent random variables. In the case of a dependence, the experimental setup determines which realizations of X and Y belong together. In the case of independence, the experimenter may choose any ordering.² The numerical value of the empirical covariance s_xy depends on the kind of pairing chosen. While all the values are equally admissible, each of them is confined to the interval

−s_x s_y ≤ s_xy ≤ s_x s_y .

Even purposeful modifications of the data ordering would not invalidate (6.26).
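As a numerical cross-check of the compressed form (6.27), the following sketch – assuming NumPy is available; the function φ(x, y) = x·y² and all numbers are hypothetical, not taken from the text – verifies that the quadratic form b^T s b reproduces the empirical variance of the linearized values exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# n = 10 simulated measurement pairs (x_l, y_l); phi(x, y) = x * y**2
# is a hypothetical example function.
n = 10
x = 1.0 + 0.01 * rng.standard_normal(n)
y = 2.0 + 0.01 * rng.standard_normal(n)

# Empirical variance-covariance matrix s of the pairs, cf. (6.30) for m = 2.
s = np.cov(np.vstack([x, y]))            # divisor n - 1 by default

# Sensitivity vector b, partial derivatives taken at the means.
xbar, ybar = x.mean(), y.mean()
b = np.array([ybar**2, 2 * xbar * ybar])  # [d(phi)/dx, d(phi)/dy]

# (6.27): s_phi^2 = b^T s b ...
s2_phi = b @ s @ b

# ... equals the empirical variance of the linearized increments
# phi_l - phi(xbar, ybar) ≈ b^T (z_l - zbar), cf. (6.22).
incr = b[0] * (x - xbar) + b[1] * (y - ybar)
s2_direct = incr @ incr / (n - 1)

print(abs(s2_phi - s2_direct) < 1e-12)
```

The two numbers agree to rounding error because (6.22) and (6.27) are algebraically identical; any discrepancy with the variance of the exact φ values measures the quality of the linearization.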
The conventional error calculus, ignoring empirical covariances, restricts itself to empirical variances and, dividing them by their respective numbers of repeat measurements, defines a quantity

s*_φ = √[ (∂φ/∂x)² s²_x/n_x + (∂φ/∂y)² s²_y/n_y ]

ad hoc. With reference to B.L. Welch's effective degrees of freedom ν_eff, a confidence interval approximating Student's interval (6.26) can still be given [7],

φ(x̄, ȳ) − t_P(ν_eff) s*_φ ≤ µ_φ ≤ φ(x̄, ȳ) + t_P(ν_eff) s*_φ .
For more than two variables, little is to be added. From (6.18) we derive the empirical variance

s²_φ = Σ_{i,j}^{m} (∂φ/∂x_i)(∂φ/∂x_j) s_ij = b^T s b ,   (6.28)

in which the vector b and the matrix s are given by

b = ( ∂φ/∂x_1  ∂φ/∂x_2  · · ·  ∂φ/∂x_m )^T   (6.29)

and

s = ( s_11 s_12 . . . s_1m
      s_21 s_22 . . . s_2m
      . . . . . . . . . . .
      s_m1 s_m2 . . . s_mm ) ,

s_ij = (1/(n − 1)) Σ_{l=1}^{n} (x_il − x̄_i)(x_jl − x̄_j) ;   i, j = 1, . . . , m ,   (6.30)

respectively. If need be, we shall denote the diagonal elements of s as s_ii ≡ s²_i. Furthermore, for reasons of symmetry, we have s_ij = s_ji. Finally, Student's confidence interval for the parameter µ_φ which has been defined in (6.16) turns out to be

φ̄(X_1, X_2, . . . , X_m) ± ( t_P(n − 1)/√n ) S_φ .   (6.31)

To assess the influence of systematic errors, we shall have to refer to (6.9) and (6.17).

² For example, the two series (x_1, y_1), (x_2, y_2), . . . and (x_1, y_2), (x_2, y_1), . . . would be equally admissible.
6.3 Uncertainty of the Arithmetic Mean
According to (6.8), the expectation µ_φ differs from the true value φ(x_0, y_0) by the propagated systematic error

f_φ = (∂φ/∂x) f_x + (∂φ/∂y) f_y ;   −f_s,x ≤ f_x ≤ f_s,x ;   −f_s,y ≤ f_y ≤ f_s,y ,   (6.32)

where neither the signs nor the magnitudes of f_x and f_y are known. As we wish to safely localize the true value φ(x_0, y_0), we opt for a worst case estimation, setting

f_s,φ = |∂φ/∂x| f_s,x + |∂φ/∂y| f_s,y   (6.33)

and quote the result of measurement as

φ(x̄, ȳ) ± u_φ

u_φ = ( t_P(n − 1)/√n ) √[ (∂φ/∂x)² s²_x + 2 (∂φ/∂x)(∂φ/∂y) s_xy + (∂φ/∂y)² s²_y ]

      + |∂φ/∂x| f_s,x + |∂φ/∂y| f_s,y .   (6.34)
We do not assign a probability to this statement. Considering m variables, we have

φ(x̄_1, x̄_2, . . . , x̄_m) ± u_φ

u_φ = ( t_P(n − 1)/√n ) √[ Σ_{i,j}^{m} (∂φ/∂x_i)(∂φ/∂x_j) s_ij ] + Σ_{i=1}^{m} |∂φ/∂x_i| f_s,i ,   (6.35)

expecting the so-defined interval to localize the true value φ(x_{0,1}, x_{0,2}, . . . , x_{0,m}).

The question as to whether worst case estimations of propagated systematic errors might provoke unnecessarily large uncertainties, or whether the relevant ISO Guide's quadratic estimations might, in the end, lead to unreliably small uncertainties, is still controversial. Here, worst case estimations and a linear combination of random and systematic errors are in any case preferred, in contrast to the Guide which, postulating theoretical variances for systematic errors, adds uncertainty components quadratically. The computer simulations in the next section will serve to scrutinize the differences.
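The two combination rules contrasted here can be put side by side in a short sketch, assuming NumPy; the function names and all numbers are illustrative only, not taken from the text.

```python
import math
import numpy as np

def u_worst_case(b, s, fs, n, t_p):
    """(6.35)-style uncertainty: Student factor on the random part,
    linear (worst case) sum over the systematic error limits fs."""
    b, s, fs = np.asarray(b, float), np.asarray(s, float), np.asarray(fs, float)
    return t_p / math.sqrt(n) * math.sqrt(b @ s @ b) + np.abs(b) @ fs

def u_quadratic(b, s_diag, fs, n, k_p):
    """ISO-style quadratic combination, treating each limit fs as a
    rectangular density with variance fs**2 / 3."""
    b, s_diag, fs = np.asarray(b, float), np.asarray(s_diag, float), np.asarray(fs, float)
    return k_p * math.sqrt(np.sum(b**2 * (s_diag / n + fs**2 / 3)))

# Illustrative numbers: two variables, uncorrelated, s_1^2 = 4, s_2^2 = 9.
b = [1.0, 1.0]
s = [[4.0, 0.0], [0.0, 9.0]]
fs = [1.0, 2.0]
u_lin = u_worst_case(b, s, fs, n=4, t_p=2.0)       # sqrt(13) + 3
u_quad = u_quadratic(b, [4.0, 9.0], fs, n=4, k_p=2.0)
print(u_lin > u_quad)
```

For these numbers the linear combination yields the larger, i.e. safer, interval, which is the behavior the simulations below examine systematically.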
There are useful supplements to (6.34) and (6.35). Remarkably enough, as −s_x s_y ≤ s_xy ≤ s_x s_y, the term due to random errors can be bounded as follows:

(∂φ/∂x)² s²_x + 2 (∂φ/∂x)(∂φ/∂y) s_xy + (∂φ/∂y)² s²_y

   ≤ (∂φ/∂x)² s²_x + 2 |∂φ/∂x||∂φ/∂y| s_x s_y + (∂φ/∂y)² s²_y

   = ( |∂φ/∂x| s_x + |∂φ/∂y| s_y )² ,   (6.36)
so that (6.34) simplifies to

u_φ = ( t_P(n − 1)/√n ) √[ (∂φ/∂x)² s²_x + 2 (∂φ/∂x)(∂φ/∂y) s_xy + (∂φ/∂y)² s²_y ] + |∂φ/∂x| f_s,x + |∂φ/∂y| f_s,y

    ≤ |∂φ/∂x| [ ( t_P(n − 1)/√n ) s_x + f_s,x ] + |∂φ/∂y| [ ( t_P(n − 1)/√n ) s_y + f_s,y ]

    = |∂φ/∂x| u_x + |∂φ/∂y| u_y ,

i.e.

u_φ ≤ |∂φ/∂x| u_x + |∂φ/∂y| u_y .   (6.37)

Obviously, it is of no significance that this expression is based upon a singular empirical variance–covariance matrix s.

A corresponding simplification is available for (6.35): considering two variables at a time, the respective empirical covariance may be treated just as in the case of two variables. This leads to
u_φ ≤ Σ_{i=1}^{m} |∂φ/∂x_i| u_{x_i} .   (6.38)

In a sense, this approach conceals the empirical covariances between the variables, saving us the necessity to calculate them afterwards from raw data, if need be. Evidently, just this latter property renders (6.38) particularly attractive. With this, we finish the development of the new formalism of error propagation. Its basic principles rely on

– stationarily operated, non-drifting measuring devices,
– decoupled random and systematic errors,
– linearized Taylor series expansion,
– well-defined measuring conditions,
– the propagation of random errors via Student's confidence intervals and of systematic errors by means of worst case estimations, and
– the linear combination of the uncertainty components due to random and systematic errors.
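The bound underlying (6.37) and (6.38) can be probed numerically; a sketch under the assumption that NumPy is available, with randomly generated data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Check the random-error bound behind (6.37)/(6.38): the
# covariance-aware quadratic form b^T s b can never exceed the
# version built from |b_i| s_i alone, since -s_i s_j <= s_ij <= s_i s_j.
for _ in range(100):
    data = rng.standard_normal((3, 10))       # m = 3 variables, n = 10
    s = np.cov(data)                          # empirical matrix, cf. (6.30)
    b = rng.standard_normal(3)                # arbitrary sensitivities
    lhs = b @ s @ b                           # b^T s b, cf. (6.28)
    rhs = (np.abs(b) @ np.sqrt(np.diag(s)))**2
    assert lhs <= rhs + 1e-12
print("bound holds")
```

The inequality is a direct consequence of the Cauchy–Schwarz bound on empirical covariances, so the assertion succeeds for any data set.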
Fig. 6.1. Flow of random and systematic errors
Figure 6.1 presents a descriptive overview of the origin, the classification and the flow of errors entering measurement uncertainties.

The errors of physical constants have to be treated according to the particular role they play in the measuring process under consideration. If constants, akin to measurands, appear as arguments of functional relationships, their uncertainty components should be broken up into components according to whether they are of random or systematic origin. If, on the other hand, they are physically active parts of the measuring procedure, applying for example to material measures, their uncertainties will exclusively be effective as systematic errors. An example will be given in Sect. 12.5.

In general, error propagation covers several stages. As will be shown in Sect. 6.5, the error model under discussion steadily channels the flow of random and systematic errors.
6.4 Uncertainties of Elementary Functions
We shall illustrate the new formalism by means of some examples.
Sums and Differences
Given two measurands x and y, we combine the results

x̄ ± ( t_P(n − 1)/√n ) s_x   and   ȳ ± ( t_P(n − 1)/√n ) s_y   (6.39)
to form functional relationships
φ(x, y) = x ± y . (6.40)
With respect to the sum, φ(x, y) = x + y, the error equations yield

φ(x_l, y_l) = x_0 + y_0 + (x_l − µ_x) + (y_l − µ_y) + f_x + f_y   (6.41)

and

φ(x̄, ȳ) = x_0 + y_0 + (x̄ − µ_x) + (ȳ − µ_y) + f_x + f_y   (6.42)

respectively. Then, subtracting (6.42) from (6.41), we obtain

φ(x_l, y_l) − φ(x̄, ȳ) = (x_l − x̄) + (y_l − ȳ) .   (6.43)

The expected value of (6.41),

µ_φ = E{φ(X, Y)} = x_0 + y_0 + f_x + f_y ,   (6.44)

reveals a propagated systematic error

f_φ = f_x + f_y ;   −f_s,x ≤ f_x ≤ f_s,x ;   −f_s,y ≤ f_y ≤ f_s,y .   (6.45)
Furthermore, from (6.43) we obtain an estimator

s²_φ = (1/(n − 1)) Σ_{l=1}^{n} [ φ(x_l, y_l) − φ(x̄, ȳ) ]² = s²_x + 2 s_xy + s²_y   (6.46)

for the theoretical variance σ²_φ = E{(φ(X, Y) − µ_φ)²}, so that the final result is

φ(x̄, ȳ) ± u_φ

u_φ = ( t_P(n − 1)/√n ) √( s²_x + 2 s_xy + s²_y ) + (f_s,x + f_s,y) .   (6.47)
Now, in terms of a test of hypothesis, we ask for the compatibility of two arithmetic means, x̄ and ȳ, aiming at one and the same physical quantity, i.e. we are looking for a maximum tolerable deviation |x̄ − ȳ|. Setting

φ(x_l, y_l) − φ(x̄, ȳ) = (x_l − x̄) − (y_l − ȳ)   (6.48)

we find

s²_φ = s²_x − 2 s_xy + s²_y   (6.49)

and, as

f_φ = f_x − f_y ;   −f_s,x ≤ f_x ≤ f_s,x ;   −f_s,y ≤ f_y ≤ f_s,y ,   (6.50)

these two means are compatible if

|x̄ − ȳ| ≤ ( t_P(n − 1)/√n ) √( s²_x − 2 s_xy + s²_y ) + (f_s,x + f_s,y) .   (6.51)

Condoning a singular empirical variance–covariance matrix, a coarser estimation would be

|x̄ − ȳ| ≤ u_x + u_y .   (6.52)

Within the scope of so-called round robins, Sect. 12.1, one and the same measuring object is circulated amongst various laboratories. Each of the participants measures the same physical quantity. Assuming this quantity remains constant during the whole period of circulation, any two uncertainties taken at a time should mutually overlap.
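The coarse compatibility criterion (6.52) is easily stated in code; the helper name and the round-robin numbers below are hypothetical.

```python
def means_compatible(xbar, ybar, ux, uy):
    """Coarse compatibility test (6.52) for two arithmetic means
    aiming at the same physical quantity: their deviation must not
    exceed the sum of the individual overall uncertainties."""
    return abs(xbar - ybar) <= ux + uy

# Hypothetical round-robin values:
print(means_compatible(10.02, 9.97, 0.03, 0.03))   # overlap: compatible
print(means_compatible(10.02, 9.90, 0.03, 0.03))   # no overlap: discrepant
```

The sharper test (6.51) would additionally require the raw data, since it involves the empirical covariance s_xy.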
Example
We estimate the uncertainty u_c of the correction c to the ampere, given the uncertainties u_a and u_b of the corrections a and b to the volt and the ohm. In Sect. 5.2 we had 1 V = (1 + a) V_L; 1 Ω = (1 + b) Ω_L; 1 A = (1 + c) A_L. From

c = a − b

we deduce

u_c = ( t_P(n − 1)/√n ) √( s²_a − 2 s_ab + s²_b ) + (f_s,a + f_s,b) .
Products and Quotients
Linearizing the product

φ(x, y) = xy   (6.53)

yields

φ(x_l, y_l) − φ(x̄, ȳ) = ȳ (x_l − x̄) + x̄ (y_l − ȳ) ;   l = 1, 2, . . . , n

so that

s²_φ = ȳ² s²_x + 2 x̄ ȳ s_xy + x̄² s²_y .   (6.54)

As

f_φ = ȳ f_x + x̄ f_y ,   (6.55)

we have

f_s,φ = |ȳ| f_s,x + |x̄| f_s,y ,   (6.56)

so that the true value φ(x_0, y_0) = x_0 y_0 should lie within

φ(x̄, ȳ) ± u_φ

u_φ = ( t_P(n − 1)/√n ) √( ȳ² s²_x + 2 x̄ ȳ s_xy + x̄² s²_y ) + ( |ȳ| f_s,x + |x̄| f_s,y ) .   (6.57)
Linearizing the quotient

φ(x, y) = x/y   (6.58)

leads to

φ(x_l, y_l) − φ(x̄, ȳ) = (1/ȳ)(x_l − x̄) − (x̄/ȳ²)(y_l − ȳ) ;   l = 1, 2, . . . , n .

Thus, the empirical variance follows from

s²_φ = (1/ȳ²) s²_x − 2 (x̄/ȳ³) s_xy + (x̄²/ȳ⁴) s²_y .   (6.59)
Considering the propagated systematic error

f_φ = (1/ȳ) f_x − (x̄/ȳ²) f_y ,   (6.60)

we have

φ(x̄, ȳ) ± u_φ   (6.61)

u_φ = ( t_P(n − 1)/√n ) √[ (1/ȳ²) s²_x − 2 (x̄/ȳ³) s_xy + (x̄²/ȳ⁴) s²_y ] + ( (1/|ȳ|) f_s,x + (|x̄|/ȳ²) f_s,y ) .

Given that X and Y are taken from standardized normal distributions, the random variable U = X/Y follows a Cauchy density. Then, as is known, E{U} and E{U²} would not exist. However, we confine ourselves to a non-pathological behavior of the quotient X/Y, so that the φ(x_l, y_l); l = 1, . . . , n as defined by the truncated Taylor series may be considered to be (approximately) normally distributed.
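A sketch of the quotient case, following the linearized formulas above; the function name and the numbers are illustrative only.

```python
import math

def u_quotient(xbar, ybar, s2x, s2y, sxy, fsx, fsy, n, t_p):
    """Uncertainty of phi = x/y, cf. (6.61): linearized random part
    including the empirical covariance, worst-case systematic part."""
    var = s2x / ybar**2 - 2 * xbar / ybar**3 * sxy + xbar**2 / ybar**4 * s2y
    random_part = t_p / math.sqrt(n) * math.sqrt(var)
    systematic_part = fsx / abs(ybar) + abs(xbar) / ybar**2 * fsy
    return random_part + systematic_part

# Illustrative numbers:
u = u_quotient(xbar=1.0, ybar=2.0, s2x=4e-4, s2y=4e-4,
               sxy=0.0, fsx=1e-3, fsy=1e-3, n=4, t_p=2.0)
print(u)
```

Note that a positive covariance s_xy reduces the random part of a quotient, in contrast to the product case (6.57).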
Examples
We estimate the uncertainty of the conversion factor K_A, given the conversion factors K_V ± u_{K_V} and K_Ω ± u_{K_Ω}. From

K_A = K_V / K_Ω

we obtain

u_{K_A} = ( t_P(n − 1)/√n ) √[ (1/K_Ω²) s²_{K_V} − 2 (K_V/K_Ω³) s_{K_V K_Ω} + (K_V²/K_Ω⁴) s²_{K_Ω} ]

          + [ (1/K_Ω) f_{s,K_V} + (K_V/K_Ω²) f_{s,K_Ω} ] .
We consider the Michelson–Morley experiment and estimate the uncertainty u_V of the displacement

V = 2L (u/c)²

that the interference fringes should exhibit after a 90° rotation of the interferometer. The quantity u denotes the Earth's velocity relative to the postulated ether, c the velocity of light and L the distance between either of the two fully silvered mirrors and the central half-silvered mirror. As u/c ≈ 10⁻⁴, we observe V ≈ 2 × 10⁻⁸ L. Setting e.g. L = 2 × 10⁷ λ, the displacement should be V ≈ 0.4 λ. Quite contrary to today's concepts, we naively consider c a measurand. Thus
u_V = ( t_P(n − 1)/√n )

      × √[ (V/L)² s²_L + (2V/u)² s²_u + (2V/c)² s²_c + 4 (V²/(Lu)) s_Lu − 8 (V²/(uc)) s_uc − 4 (V²/(Lc)) s_Lc ]

      + [ (V/L) f_s,L + (2V/u) f_s,u + (2V/c) f_s,c ] .
Though Michelson and Morley relied on a different approach to estimate the uncertainty of the displacement V, their result will endure.³ This might willingly be taken as an opportunity to play down the importance of measurement uncertainties in general. Nevertheless, we are sure that scientific discoveries rely on vast fields of a priori knowledge, i.e. sound theoretical frameworks and decades of metrological experience, as well as the understanding that any scientific concept has to be subjected to an all-decisive comparison between theory and experiment, so that, in the last analysis, it is the measurement uncertainty which is responsible for the result.

Comparisons will be particularly critical if experiments operate close to the limits of verifiability and if their outcomes have far-reaching physical consequences – which applies, for example, to the still pending question of the correct neutrino model.
Simulation
As the true values of the measurands are unknown, there is no way to test a formalism estimating measurement uncertainties which relies on real, physically obtained data. Nevertheless, we are in a position to design numerical tests based on simulated data, since under conditions of simulation the true values are necessarily part of the model, and thus known a priori. We subject both formalisms, the one proposed here and that of the ISO Guide, to simulative tests. We “measure” the area of a rectangle whose side lengths are chosen to be x_0 = 1 cm and y_0 = 2 cm, Fig. 6.2. In each case, we consider n = 10 repeat measurements for x and y respectively. Let the intervals limiting the systematic errors be
−10 µm ≤ fx ≤ 10 µm , −10 µm ≤ fy ≤ 10 µm .
³ The interpretation of the observation is as follows. If the arm of the interferometer lying parallel to the direction of motion of the Earth is shortened by a factor of [1 − (u/c)²], the displacement V vanishes numerically.
Fig. 6.2. Simulation of the measurement of the area of a rectangle
We adjust the variances σ²_x, σ²_y of the simulated, normally distributed random errors to be equal to the postulated ISO variances of the systematic errors f_x, f_y,

σ²_x = f²_s,x / 3   and   σ²_y = f²_s,y / 3 ,

referring to rectangular densities over the two intervals given above.

According to the error model under discussion, the uncertainty u_xy of the area xy is given by
u_xy = ( t_P(n − 1)/√n ) √( ȳ² s²_x + 2 x̄ ȳ s_xy + x̄² s²_y ) + |ȳ| f_s,x + |x̄| f_s,y ,   (6.62)

while the ISO Guide [36] would yield

u_xy = k_P √{ ȳ² [ s²_x/n_x + f²_s,x/3 ] + x̄² [ s²_y/n_y + f²_s,y/3 ] } .   (6.63)

The error limits f_s,x = 10 µm and f_s,y = 10 µm enter (6.62) and (6.63). But, to simulate the “measuring data”, we choose any values f_x, f_y from the pertaining intervals ±f_s,x, ±f_s,y and superpose them on the simulated data. Let us consider some special cases. If f_x and f_y

– have different signs, we shall speak of error compensation,
– have the same sign, of error intensification,
– are permuted, of a test for symmetry.
As to ISO uncertainties, k_P = 1 is recommended for purposes of high-level metrology and k_P = 2 for standard cases. Within the alternative formalism we set t_68%(9) = 1.04 and t_95%(9) = 2.26, thus allowing meaningful comparisons.

Every “measurement” yields a result x̄ȳ ± u_xy. However, to arrive at statistically representative statements, we carry out 1000 complete cycles of measurements at a time, i.e. for any given choice of f_x = const. and f_y = const., we simulate 1000 new sets of random errors.

If k_P = 1, in the case of error intensification, the ISO model turns out to be unfit. Remarkably enough, the test for symmetry delivers only 59 positive results or successes, given that f_x = 10 µm and f_y = 0 µm. However, if f_x = 0 µm and f_y = 10 µm, we observe 814 successes. This observation is due to the disparity of the partial derivatives ȳ ≈ 2 cm and x̄ ≈ 1 cm which, in a sense, “weight” the errors.

The situation improves on setting k_P = 2. This does not, however, apply to the most unfavorable case f_x = f_y = 10 µm.

Clearly, as Tables 6.1 and 6.2 illustrate, occasionally used terms like “optimistically” or “pessimistically” estimated measurement uncertainties do not have any specific meaning. We neither give information away when quoting safe estimates nor regain information otherwise lost by keeping the measurement uncertainties tight. Measurement uncertainties should rather be simply objective.
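The simulation just described can be sketched as follows, assuming NumPy; the seed and the intensification case f_x = f_y = f_s are chosen for illustration, so the counts will only approximately match the tables.

```python
import numpy as np

rng = np.random.default_rng(0)

x0, y0 = 1.0, 2.0               # true side lengths in cm
fs = 10e-4                      # error limit, 10 µm in cm
sigma = fs / np.sqrt(3)         # random-error sigma, as set in the text
n, cycles = 10, 1000
fx, fy = fs, fs                 # error intensification, worst case

def one_cycle():
    x = x0 + fx + sigma * rng.standard_normal(n)
    y = y0 + fy + sigma * rng.standard_normal(n)
    xb, yb = x.mean(), y.mean()
    sx2, sy2 = x.var(ddof=1), y.var(ddof=1)
    sxy = np.cov(x, y)[0, 1]
    # alternative model (6.62) with t_95%(9) = 2.26
    u_alt = (2.26 / np.sqrt(n)
             * np.sqrt(yb**2 * sx2 + 2 * xb * yb * sxy + xb**2 * sy2)
             + abs(yb) * fs + abs(xb) * fs)
    # ISO model (6.63) with k_P = 2, n_x = n_y = n
    u_iso = 2.0 * np.sqrt(yb**2 * (sx2 / n + fs**2 / 3)
                          + xb**2 * (sy2 / n + fs**2 / 3))
    dev = abs(xb * yb - x0 * y0)
    return dev <= u_alt, dev <= u_iso

hits_alt = hits_iso = 0
for _ in range(cycles):
    ok_alt, ok_iso = one_cycle()
    hits_alt += ok_alt
    hits_iso += ok_iso
print(hits_alt, hits_iso)
```

In this intensification case the worst-case model localizes the true area far more often than the quadratic ISO combination, in line with the counts reported in the tables.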
Table 6.1. Localization of true values: ISO Guide versus alternative approach

1000 simulations, k_P = 1, t_68%(9) = 1.04

                  f_x (µm)  f_y (µm)   ISO: successes  failures   alternative: successes  failures
compensation        10       −10            815          185              1000               0
                     5        −5            986           14              1000               0
intensification     10        10              0         1000               842             158
                     5         5            363          637              1000               0
symmetry            10         0             59          941              1000               0
                     0        10            814          186              1000               0
                     5         0            796          204              1000               0
                     0         5            978           22              1000               0
6.5 Error Propagation over Several Stages
Even in experimentally simple situations, quantities to be linked up may present themselves as results of previous measurements and other functional relationships; thus φ(x, y) = x/y could be a first and Γ[y, φ(x, y)] = y exp[φ(x, y)] a second combination. We consider the uncertainty u_Γ, referring again to truncated series expansions and the notation Γ = Γ[y, φ(x, y)].

Let us initially linearize φ(x, y),

φ(x_l, y_l) = φ(x_0, y_0) + [ (∂φ/∂x)(x_l − µ_x) + (∂φ/∂y)(y_l − µ_y) ] + [ (∂φ/∂x) f_x + (∂φ/∂y) f_y ] ,

φ(x̄, ȳ) = φ(x_0, y_0) + [ (∂φ/∂x)(x̄ − µ_x) + (∂φ/∂y)(ȳ − µ_y) ] + [ (∂φ/∂x) f_x + (∂φ/∂y) f_y ] .

The expectation

µ_φ = φ(x_0, y_0) + (∂φ/∂x) f_x + (∂φ/∂y) f_y   (6.64)

reveals a propagated systematic error of the kind

f_φ = (∂φ/∂x) f_x + (∂φ/∂y) f_y .   (6.65)
We might wish to estimate fφ as
Table 6.2. Localization of true values: ISO Guide versus alternative approach

1000 simulations, k_P = 2, t_95%(9) = 2.26

                  f_x (µm)  f_y (µm)   ISO: successes  failures   alternative: successes  failures
compensation        10       −10           1000            0              1000               0
                     5        −5           1000            0              1000               0
intensification     10        10            232          768               974              26
                     5         5            999            1              1000               0
symmetry            10         0            996           34              1000               0
                     0        10           1000            0              1000               0
                     5         0           1000            0              1000               0
                     0         5           1000            0              1000               0
88 6 Propagation of Measurement Errors
−f_s,φ ≤ f_φ ≤ f_s,φ ,   f_s,φ = |∂φ/∂x| f_s,x + |∂φ/∂y| f_s,y ,   (6.66)

which, however, should be postponed. Let us first refer to

φ(x_l, y_l) − φ(x̄, ȳ) = [ (∂φ/∂x)(x_l − x̄) + (∂φ/∂y)(y_l − ȳ) ] .   (6.67)
Now, we expand Γ[y, φ(x, y)] twofold, abbreviating φ_l = φ(x_l, y_l) and φ_0 = φ(x_0, y_0),

Γ(y_l, φ_l) = Γ(y_0, φ_0) + [ (∂Γ/∂y)(y_l − µ_y) + (∂Γ/∂φ)(φ_l − µ_φ) ] + [ (∂Γ/∂y) f_y + (∂Γ/∂φ) f_φ ] ,

Γ(ȳ, φ̄) = Γ(y_0, φ_0) + [ (∂Γ/∂y)(ȳ − µ_y) + (∂Γ/∂φ)(φ̄ − µ_φ) ] + [ (∂Γ/∂y) f_y + (∂Γ/∂φ) f_φ ] .

This leads us to

Γ(y_l, φ_l) − Γ(ȳ, φ̄) = [ (∂Γ/∂y)(y_l − ȳ) + (∂Γ/∂φ)(φ_l − φ̄) ]   (6.68)

and

f_Γ = [ (∂Γ/∂y) f_y + (∂Γ/∂φ) f_φ ]   (6.69)
respectively. Finally, we insert (6.67) into (6.68),

Γ(y_l, φ_l) − Γ(ȳ, φ̄) = [ (∂Γ/∂φ)(∂φ/∂x)(x_l − x̄) + ( ∂Γ/∂y + (∂Γ/∂φ)(∂φ/∂y) )(y_l − ȳ) ] ,   (6.70)

and (6.65) into (6.69),

f_Γ = (∂Γ/∂φ)(∂φ/∂x) f_x + [ ∂Γ/∂y + (∂Γ/∂φ)(∂φ/∂y) ] f_y .   (6.71)
It is only now that we should estimate the propagated systematic error,
−f_s,Γ ≤ f_Γ ≤ f_s,Γ ,   f_s,Γ = |(∂Γ/∂φ)(∂φ/∂x)| f_s,x + | ∂Γ/∂y + (∂Γ/∂φ)(∂φ/∂y) | f_s,y .   (6.72)

If, instead, we had subjected (6.69) to a worst case estimation

f*_s,Γ = |∂Γ/∂y| f_s,y + |∂Γ/∂φ| f_s,φ
and had done the same with (6.65), we would have got
f*_s,Γ = |(∂Γ/∂φ)(∂φ/∂x)| f_s,x + [ |∂Γ/∂y| + |(∂Γ/∂φ)(∂φ/∂y)| ] f_s,y .
Obviously, this would be an inconsistent worst case estimation, conflicting with the concepts of error propagation provided by the alternative error model. In other words, we would have overlooked a dependency and, as a consequence, estimated too large a systematic error.
Referring to the abbreviations

b_x = (∂Γ/∂φ)(∂φ/∂x)   and   b_y = ∂Γ/∂y + (∂Γ/∂φ)(∂φ/∂y) ,

(6.70) and (6.71) yield

s²_Γ = b²_x s²_x + 2 b_x b_y s_xy + b²_y s²_y   (6.73)

and

f_Γ = b_x f_x + b_y f_y   (6.74)

respectively, so that

Γ(ȳ, φ̄) ± u_Γ   (6.75)

u_Γ = ( t_P(n − 1)/√n ) √( b²_x s²_x + 2 b_x b_y s_xy + b²_y s²_y ) + [ |b_x| f_s,x + |b_y| f_s,y ] .
Just to prove the efficiency of this approach, we consider another relationship, namely

Γ[ φ(x, y, . . .), ψ(p, q, . . .), . . . ] .

Confining ourselves to two measurands, from

Γ(φ_l, ψ_l) = Γ(φ_0, ψ_0) + (∂Γ/∂φ)(φ_l − µ_φ) + (∂Γ/∂ψ)(ψ_l − µ_ψ) + (∂Γ/∂φ) f_φ + (∂Γ/∂ψ) f_ψ ,

Γ(φ̄, ψ̄) = Γ(φ_0, ψ_0) + (∂Γ/∂φ)(φ̄ − µ_φ) + (∂Γ/∂ψ)(ψ̄ − µ_ψ) + (∂Γ/∂φ) f_φ + (∂Γ/∂ψ) f_ψ ,

we deduce

Γ(φ_l, ψ_l) − Γ(φ̄, ψ̄) = (∂Γ/∂φ)(φ_l − φ̄) + (∂Γ/∂ψ)(ψ_l − ψ̄) ,

and, furthermore,

f_Γ = (∂Γ/∂φ) f_φ + (∂Γ/∂ψ) f_ψ .
With

φ_l − φ̄ = [ (∂φ/∂x)(x_l − x̄) + (∂φ/∂y)(y_l − ȳ) ] ;   f_φ = (∂φ/∂x) f_x + (∂φ/∂y) f_y ,

ψ_l − ψ̄ = [ (∂ψ/∂p)(p_l − p̄) + (∂ψ/∂q)(q_l − q̄) ] ;   f_ψ = (∂ψ/∂p) f_p + (∂ψ/∂q) f_q

and

b_x = (∂Γ/∂φ)(∂φ/∂x) ,   b_y = (∂Γ/∂φ)(∂φ/∂y) ,   b_p = (∂Γ/∂ψ)(∂ψ/∂p) ,   b_q = (∂Γ/∂ψ)(∂ψ/∂q) ,

we find

Γ(φ_l, ψ_l) − Γ(φ̄, ψ̄) = b_x (x_l − x̄) + b_y (y_l − ȳ) + b_p (p_l − p̄) + b_q (q_l − q̄)

and

f_Γ = b_x f_x + b_y f_y + b_p f_p + b_q f_q
respectively. Obviously, that part of the overall uncertainty u_Γ which is due to random errors includes the empirical covariances s_xy, s_xp, s_xq, s_yp, s_yq, s_pq, and this in turn means that the evaluation procedures rely on the original series of measurements, which, hopefully, will have been saved by the laboratories.
Part II
Least Squares Adjustment
7 Least Squares Formalism
7.1 Geometry of Adjustment
In a sense, the least squares adjustment of an inconsistent, overdetermined linear system relies on the idea of orthogonal projection. To illustrate this, let us consider a source emitting parallel light falling perpendicularly onto a screen. A rod held obliquely over the screen then casts a shadow, Fig. 7.1. This scenario may be considered an equivalent of the orthogonal projection of the method of least squares.

Let us develop this idea to solve the following problem. Given n repeat measurements

x_1, x_2, . . . , x_n ,

how can we find an estimator β̄ for the unknown true value x_0? Referring n times to the error equations (2.3), we are faced with

x_1 = x_0 + ε_1 + f
x_2 = x_0 + ε_2 + f
· · · · · · · · · · · ·
x_n = x_0 + ε_n + f .   (7.1)
By means of an auxiliary vector
Fig. 7.1. The method of least squares, presented as an orthogonal projection of a rod held obliquely over a screen
Fig. 7.2. Vector x of input data, vector r of residuals and true solution vector x0
a = (1 1 · · · 1)T ,
we may formally define a true solution vector x0 = ax0. Putting
x = (x1 x2 · · · xn)T and ε = (ε1 ε2 · · · εn)T
(7.1) takes the form
x = x0 + ε + f .
As β_0 is unknown, we introduce a variable β instead:

β ≈ x_1 ,   β ≈ x_2 , . . . ,   β ≈ x_n ,

or, in vector form,

(1 1 · · · 1)^T β ≈ (x_1 x_2 · · · x_n)^T ,   i.e.   a β ≈ x .   (7.2)

If x had no errors, we would have a β_0 = x_0, i.e. β_0 = x_0. In a sense, we may understand the vector a to span a one-dimensional space holding the true solution vector x_0.

As there are measurement errors, we purposefully quantify the discrepancy between x and x_0 by the vector

r = x − x_0 ,

the components of which are called residuals, Fig. 7.2. As the vector space a holds the true solution vector x_0, we assume that just this space will hold the approximation to (7.2) we are looking for. Indeed, the method of least squares “solves” the inconsistent system (7.2) by orthogonally projecting x onto a ∥ x_0. With P denoting the appropriate projection operator, we have

a β̄ = P x .   (7.3)
Fig. 7.3. Orthogonal projection P x of x onto a; r̄ vector of residuals of minimal length; β̄ stretch factor
The stretch factor β̄ by which a has to be multiplied so that (7.3) is fulfilled is considered to be the least squares estimator of the inconsistent system (7.2). Figure 7.3 illustrates the proposal. We prove:

The orthogonal projection of x onto a minimizes the Euclidean norm of r.

To this end, an arbitrary vector of residuals r = x − a β is compared with the vector of residuals due to the orthogonal projection,

r̄ = x − a β̄ .

The identity

r = x − a β = (x − a β̄) + a (β̄ − β)

yields

r^T r = r̄^T r̄ + a^T a (β̄ − β)² ,

i.e.

r^T r ≥ r̄^T r̄ .

Usually, this result is established by differentiating an expression of the form

Q = (x_1 − β)² + (x_2 − β)² + · · · + (x_n − β)²

with respect to β. This sum, called the sum of squared residuals, is obviously obtained by squaring and summing up the components of the vector r = x − a β. However, here we prefer to trace the method of least squares back to the idea of orthogonal projections. From

a^T r̄ = a^T (x − a β̄) = 0 ,   i.e.   a^T a β̄ = a^T x ,   (7.4)
we infer

β̄ = (a^T a)⁻¹ a^T x .   (7.5)

Hence, in this introductory example, the required least squares estimator is nothing but the arithmetic mean

β̄ = (1/n) Σ_{l=1}^{n} x_l

of the n observations x_l; l = 1, . . . , n. Let us summarize: a and x are vectors of an n-dimensional space. The orthogonal projection of x onto the one-dimensional space spanned by a, presented in (7.3), has produced the stretch factor or estimator β̄ = x̄. Thus, the orthogonal projection has “solved” the inconsistent system (7.2) in terms of least squares.

To find the projection operator P, we multiply a^T a β̄ = a^T x on the left by a (a^T a)⁻¹,

a β̄ = a (a^T a)⁻¹ a^T x ,

so that

P = a (a^T a)⁻¹ a^T .   (7.6)

In this particular case, all elements of P exhibit the same value 1/n. Multiplying (7.3) on the left by a^T yields

a^T a β̄ = a^T P x = a^T x ,   i.e.   β̄ = (a^T a)⁻¹ a^T x .

The method of least squares cannot do wonders: all it can do is to transfer the approximation

a β ≈ x ,

say by means of a gimmick, into a formally self-consistent statement, namely

a β̄ = P x .

The input data need not fulfil any presumptions at all. The orthogonal projection “adjusts” any kind of contradiction in like manner, so that fatal errors of reasoning and the finest, metrologically unavoidable errors of measurement are treated side by side on the same level.

The vector of residuals, r̄ = x − a β̄, is in no way related to physical errors. As r̄ is due to an orthogonal projection, its components are mathematically defined auxiliary quantities. However, the components of the vector r = x − x_0 are physically meaningful error sums.
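The introductory example is small enough to verify directly; a sketch assuming NumPy, with illustrative data.

```python
import numpy as np

# For a = (1 ... 1)^T, the estimator (7.5) is the arithmetic mean,
# every element of the projection operator (7.6) equals 1/n, and the
# residual vector is orthogonal to a, cf. (7.4).
x = np.array([1.9, 2.1, 2.0, 2.2, 1.8])
n = len(x)
a = np.ones(n)

beta_bar = (a @ x) / (a @ a)          # (a^T a)^{-1} a^T x
P = np.outer(a, a) / (a @ a)          # a (a^T a)^{-1} a^T
r_bar = x - a * beta_bar

print(abs(beta_bar - x.mean()) < 1e-12)   # estimator = mean
print(np.allclose(P, 1.0 / n))            # all elements of P are 1/n
print(abs(a @ r_bar) < 1e-12)             # residuals orthogonal to a
```
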
7.2 Unconstrained Adjustment
Let us assume that the measuring problem has led us to a set of linear relationships

a_11 β_1 + a_12 β_2 + · · · + a_1r β_r = x_1
a_21 β_1 + a_22 β_2 + · · · + a_2r β_r = x_2
· · · · · · · · · · · · · · · · · · · · ·
a_m1 β_1 + a_m2 β_2 + · · · + a_mr β_r = x_m .   (7.7)

To cast this in matrix notation, we introduce an (m × r) matrix A of coefficients a_ik, an (m × 1) column vector x assembling the right-hand sides x_i, and, finally, an (r × 1) column vector β of unknowns β_k. From

A = ( a_11 a_12 . . . a_1r
      a_21 a_22 . . . a_2r
      . . . . . . . . . . .
      a_m1 a_m2 . . . a_mr ) ,   x = ( x_1 x_2 . . . x_m )^T ,   β = ( β_1 β_2 . . . β_r )^T

we find

A β = x .
This system is solvable if and only if the vector x may be expressed as a linear combination of the column vectors of A,

a_k = ( a_1k a_2k . . . a_mk )^T ;   k = 1, . . . , r ,

so that

β_1 a_1 + β_2 a_2 + · · · + β_r a_r = x .

If this applies, x lies inside the column space of A; if not, the linear system is inconsistent or algebraically unsolvable.

Any experimental implementation of the relationships (7.7), being conceptually exact, entails measurement errors, so that

A β ≈ x ,   (7.8)

which means that x is outside the column space of A. To quantify the discrepancy, we again refer to a vector r of residuals,
Fig. 7.4. The vector r̄ of residuals is perpendicular to the r-dimensional subspace spanned by the column vectors a_k ; k = 1, . . . , r
r = x − Aβ .
As has been discussed, we require the solution vector β̄ to minimize the sum of squared residuals.

In the following, we assume

m > r   and   rank(A) = r .

The vectors a_k; k = 1, . . . , r define an r-dimensional subspace of an m-dimensional space. Of course, x is an element of the latter but, on account of measurement errors, not an element of the former. By means of a projection operator P, the method of least squares throws x orthogonally onto the subspace spanned by the r column vectors a_k and substitutes the projection P x for the erroneous vector x so that, as shown in Fig. 7.4,

A β̄ = P x .   (7.9)

Obviously, the vector P x so defined is a linear combination of the column vectors a_k. Hence, (7.9) is self-consistent and solvable. Its solution vector β̄ solves (7.8) in terms of least squares. The components β̄_k; k = 1, . . . , r of β̄ may be seen to stretch the column vectors a_k of A, rendering (7.9) possible.

Due to the orthogonality of the projection of x, the vector of residuals

r̄ = x − A β̄   (7.10)

is perpendicular to each of the column vectors of A,

A^T r̄ = 0 .
Inserting (7.10), we obtain

A^T A β̄ = A^T x ,

so that

β̄ = B^T x ,   (7.11)

where, to abbreviate,

B = A (A^T A)⁻¹   (7.12)

has been set. The definition of projection operators is independent of the method of least squares, see Appendix D. For now we are content with a heuristic derivation of P. Multiplying A^T A β̄ = A^T x on the left by B yields A β̄ = A (A^T A)⁻¹ A^T x, i.e.

P = A (A^T A)⁻¹ A^T .   (7.13)

By means of the identity

r = x − A β = (x − A β̄) + A (β̄ − β)

we again prove that the Euclidean norm

r^T r = r̄^T r̄ + (β̄ − β)^T A^T A (β̄ − β)

of any vector of residuals r = x − A β can never be smaller than the Euclidean norm of the vector of residuals r̄ resulting from the orthogonal projection, i.e.

r^T r ≥ r̄^T r̄ .

The product A^T A is invertible if the column vectors of the design matrix A are independent, i.e. if A has rank r, see Appendix A.

Uncertainty assignments ask for well-defined reference values. Consequently, for any empirical input vector x obtained metrologically, we have to introduce a true vector x_0, the components x_{0,i} of which are the true values of the measurands x_i,

x_0 = ( x_{0,1} x_{0,2} · · · x_{0,m} )^T .

If x_0 were known, r equations would be sufficient to determine the r unknown components β_{0,k} of the true solution vector β_0. But assuming the linear system to be both physically and mathematically well defined, we may nevertheless deduce β_0 referring to all m > r equations,

β_0 = (A^T A)⁻¹ A^T x_0 .

As A is rectangular, an inverse is not defined. However, we may take (A^T A)⁻¹ A^T as a pseudo-inverse.
7.3 Constrained Adjustment
We demand the least squares solution vector β̄ of the overdetermined, inconsistent linear system

A β ≈ x   (7.14)

not only to reconcile the linear system with the measurement errors of the input data, but also to consistently comply with a given set of linear constraints of the form

H β = y ,   (7.15)

where H is a (q × r) matrix of rank q. The least squares vector β̄ is required to satisfy H β̄ = y. The problem-solving procedure depends on the rank of the system matrix A. Initially, we assume

rank(A) = r .

Certainly, the projection operator P = A (A^T A)⁻¹ A^T used until now would produce a vector β̄ that is incompatible with the constraints. Consequently, we have to modify the quiddity of the orthogonal projection. Indeed, as will be shown in Appendix E, a projection operator of the form

P = A (A^T A)⁻¹ A^T − C (C^T C)⁻¹ C^T

meets the purpose. The least squares solution vector is

β̄ = (A^T A)⁻¹ A^T x − (A^T A)⁻¹ H^T [ H (A^T A)⁻¹ H^T ]⁻¹ [ H (A^T A)⁻¹ A^T x − y ] ,   (7.16)

exactly fulfilling H β̄ = y. Now consider

rank(A) = r′ < r .
We add q = r − r′ constraints

H β = y ,   (7.17)

where rank(H) = q, and combine the two systems (7.14) and (7.17). Though the product A^T A is not invertible, the sum of products A^T A + H^T H is. The least squares solution vector turns out to be

β̄ = (A^T A + H^T H)⁻¹ (A^T x + H^T y) ,   (7.18)

satisfying H β̄ = y.
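Formula (7.16) can likewise be exercised on a toy system; a sketch assuming NumPy, with an illustrative constraint.

```python
import numpy as np

# Constrained least squares, cf. (7.16); all numbers are illustrative.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])
x = np.array([2.1, 2.9, 3.0, 6.1])
H = np.array([[1.0, 1.0, 1.0]])       # one constraint: sum of the betas
y = np.array([6.0])

G = np.linalg.inv(A.T @ A)
beta_u = G @ A.T @ x                  # unconstrained solution
M = np.linalg.inv(H @ G @ H.T)
beta = beta_u - G @ H.T @ M @ (H @ beta_u - y)   # (7.16)

print(np.allclose(H @ beta, y))       # constraint fulfilled exactly
# Optimality: the gradient A^T (A beta - x) lies in the row space of H,
# which is the Lagrange condition for a constrained minimum.
g = A.T @ (A @ beta - x)
lam = np.linalg.solve(H @ H.T, H @ g)
print(np.allclose(H.T @ lam, g))
```
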
8 Consequences of Systematic Errors
8.1 Structure of the Solution Vector
In the following we shall confine ourselves to unconstrained adjustments, i.e. to linear systems of the kind (7.8),

A β ≈ x .   (8.1)

With respect to the assessment of uncertainties, constrained adjustments would not introduce new aspects. As a matter of principle, any vector x of input data could be subjected to a least squares adjustment, given that the matrix A has full rank and there is a true vector x_0 underlying the erroneous input data, i.e. if alongside (8.1) there is a consistent system of the form

A β_0 = x_0 .

While the geometry of the least squares adjustment distinguishes the errors of the input data neither with respect to their causes nor with respect to their properties, as it is but an orthogonal projection, the situation becomes different when we intend to estimate measurement uncertainties.
The column vector
x = (x1 x2 · · · xm)T
contains m erroneous input data
xi = x0,i + (xi − µi) + fi ; i = 1, . . . , m . (8.2)
The x_{0,i} denote the true values of the observations x_i, the differences (x_i − µ_i) their random errors ε_i and, finally, the f_i systematic errors which we consider to be constant in time. Further, we assume the ε_i to be independent and normally distributed. At present we shall abstain from repeat measurements.
The expected values of the random variables Xi; i = 1, . . . , m are given by
µi = E Xi = x0,i + fi ; i = 1, . . . , m . (8.3)
Converting (8.2) into vector form yields
Fig. 8.1. The least squares solution vector β is given by the sum of the true solution vector β0 and the vectors BT(x − µ) and BTf being due to random and systematic errors respectively
x = (x1 x2 · · · xm)T = (x0,1 x0,2 · · · x0,m)T + (x1 − µ1 x2 − µ2 · · · xm − µm)T + (f1 f2 · · · fm)T
or
x = x0 + (x − µ) + f . (8.4)
We now decompose the least squares solution vector
β = (ATA)−1ATx = BTx ; B = A(ATA)−1 (8.5)
into
β = BT [x0 + (x − µ) + f ] = β0 + BT (x − µ) + BTf . (8.6)
Obviously,
β0 = (ATA)−1ATx0 = BTx0 (8.7)
designates the true solution vector. As (8.6) and Fig. 8.1 underscore, the least squares estimator β is given by the sum of the true solution vector β0 and two further terms BT(x − µ) and BTf expressing the propagation of random and systematic errors respectively.

Within the framework of the concept of a statistical ensemble of measuring devices biased by one and the same systematic error, the estimator β scatters with respect to the expectation

µβ = E β = β0 + BTf (8.8)

and not with respect to the true solution vector β0. On this note, we propose:

Given the input data are charged by unknown systematic errors, the least squares solution vector β is biased with respect to the true solution vector β0. Just this bias rules out most of the classical least squares tools.
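The decomposition (8.6) is easy to make tangible in a few lines of NumPy. The design matrix and the systematic errors below are invented; to isolate the bias term, the random errors are set to zero:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
beta0 = np.array([1., 2.])
f = np.array([0.10, -0.05, 0.02])   # unknown systematic errors, constant in time

B = A @ np.linalg.inv(A.T @ A)      # B = A(ATA)^-1, eq. (8.5)

x = A @ beta0 + f                   # input data with random errors suppressed
beta_hat = B.T @ x                  # least squares estimator

bias = B.T @ f                      # the term shifting the estimator away from beta0
```

The estimator sits exactly at β0 + BTf, i.e. at the expectation µβ of (8.8), and not at β0.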
8.2 Minimized Sum of Squared Residuals
Systematic errors provide the minimized sum of squared residuals with new properties.

For simplicity, we assume the random errors εi = xi − µi to relate to the same parent distribution. We then have

σ2 = E(Xi − µi)2 ; i = 1, . . . , m . (8.9)

If the input data were free from systematic errors, the unweighted minimized sum of squared residuals

Qunweighted = rTr = (x − Aβ)T(x − Aβ) (8.10)

would supply an estimator for (8.9), namely

s2 = Qunweighted/(m − r) ≈ σ2 . (8.11)
Considering (8.4), (7.10) and (7.13),
r = x − Aβ = x − Px
  = x0 + (x − µ) + f − P[x0 + (x − µ) + f]
  = (x − µ) − P(x − µ) + (f − Pf) , (8.12)

and temporarily ignoring the last term (f − Pf) on the right-hand side, we find

Qunweighted = (x − µ)T(I − P)(x − µ) .

Here I denotes an (m × m) identity matrix. Dividing by σ2 produces the weighted sum of squared residuals

Qweighted = [(x − µ)T/σ] (I − P) [(x − µ)/σ] . (8.13)
We shall show that Qweighted is χ2-distributed with m − r degrees of freedom. The matrices I, P and I − P are projection matrices, see Appendix D. As rank(I − P) = m − r, the difference matrix I − P has m − r non-zero eigenvalues, all having the same value 1. The remaining r eigenvalues are zero.
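These properties of the projection matrices are quickly verified numerically; a small NumPy check with an invented design matrix:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.],
              [2., 1.]])   # m = 4 observations, r = 2 parameters
m, r = A.shape

P = A @ np.linalg.inv(A.T @ A) @ A.T
I = np.eye(m)

# I - P is idempotent, hence its eigenvalues can only be 0 or 1
assert np.allclose((I - P) @ (I - P), I - P)

eigvals = np.sort(np.linalg.eigvalsh(I - P))   # m - r ones, r zeros
```

Since the eigenvalues are zeros and ones, trace(I − P) counts them and equals m − r, the number of degrees of freedom.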
Let T denote an orthogonal (m × m) matrix and I an (m × m) identity matrix so that TT T = T TT = I, TT = T−1. Following the spectral theorem of algebra [52], we put the orthogonal eigenvectors of I − P into the columns of T so that
TT(I − P)T = diag(1, . . . , 1, 0, . . . , 0) , (8.14)

the diagonal holding m − r ones followed by r zeros.
Then, introducing an auxiliary vector
y = (y1 y2 . . . ym)T
and putting

(x − µ)/σ = Ty ,
(8.13) changes into
Qweighted = (Ty)T(I − P)Ty = Σ_{i=1}^{m−r} yi2 . (8.15)
The elements of the vector
y = TT(x − µ)/σ
are normally distributed. Moreover, on account of
E y = 0 and E yyT = TT T = I ,

they are standardized and independent. Thus, according to (8.15), Qweighted is χ2-distributed with m − r degrees of freedom, i.e.

E Qweighted = m − r or E Qunweighted = σ2(m − r) , (8.16)
see Sect. 3.3. Unfortunately, we are not allowed to ignore the term (f − Pf) in (8.12). As a consequence we find

E Qunweighted = σ2(m − r) + (fTf − fTPf) (8.17)

and this, obviously, means that as long as the second term on the right-hand side does not vanish, the experimenter is not in a position to derive an estimator for the variance of the input data from the minimized sum of squared residuals. In other words, provided that measurement uncertainties are considered indispensable, in general there is no chance of carrying out least squares adjustments based on single observations. In Sect. 10.1, however, we shall meet an exception.
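The extra term in (8.17) can be checked numerically: if the random errors are switched off entirely, the minimized sum of squared residuals consists of the term fTf − fTPf alone. A NumPy sketch with invented data:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
beta0 = np.array([1., 2.])
f = np.array([0.10, -0.05, 0.02])          # systematic errors

x = A @ beta0 + f                          # random errors suppressed
P = A @ np.linalg.inv(A.T @ A) @ A.T       # orthogonal projection

beta_hat = np.linalg.solve(A.T @ A, A.T @ x)
r_vec = x - A @ beta_hat                   # residuals
Q_unweighted = r_vec @ r_vec

extra_term = f @ f - f @ P @ f             # the non-vanishing term in (8.17)
```

As long as f is not an element of the column space of A, this term is strictly positive and contaminates any variance estimate drawn from the residuals.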
Meanwhile, instead of using single observations x1, x2, . . . , xm, the experimenter may refer to arithmetic means

xi = (1/n) Σ_{l=1}^{n} xil ; i = 1, . . . , m , (8.18)

the empirical variances of which,

si2 = [1/(n − 1)] Σ_{l=1}^{n} (xil − xi)2 ; i = 1, . . . , m , (8.19)

are known from the outset. Hence, we shall treat inconsistent systems of the kind
Aβ ≈ x , (8.20)
in which the components of the input vector x are arithmetic means
xi = x0,i + (xi − µi) + fi; i = 1, . . . , m . (8.21)
In particular, we shall assume each mean to cover the same number n of repeat measurements. Just this idea will enable us to assign a complete empirical variance–covariance matrix to the solution vector

β = (ATA)−1ATx = BTx ; B = A(ATA)−1 . (8.22)
8.3 Gauss–Markoff Theorem
As is well known, the concept of so-called “optimal estimators” frequently referred to in measurement technique is due to the Gauss–Markoff theorem.1 Just for completeness let us present the theorem, temporarily ignoring the existence of systematic errors so that the least squares estimators are unbiased. The Gauss–Markoff theorem derives the uncertainties of the components of the least squares solution vector

β = (β1 β2 · · · βr)T

from the diagonal elements of the associated theoretical variance–covariance matrix

1R.L. Plackett (Biometrika, (1949), 149–157) has pointed out that the theorem is due to Gauss (Theoria combinationis observationum erroribus minimis obnoxiae, 1821/23). Presumably, the notation Gauss–Markoff theorem intends to honor Markoff’s later contributions to the theory of least squares.
E(β − µβ)(β − µβ)T = (σβkβk′) ; k, k′ = 1, . . . , r . (8.23)

If need be, we shall set σβkβk ≡ σ2βk. The uncertainties uβk of the components βk; k = 1, . . . , r would be

uβk = √σβkβk ≡ σβk ; k = 1, . . . , r .

The Gauss–Markoff theorem states:

Among all unbiased estimators being linear in the input data, the least squares estimator minimizes the diagonal elements of the theoretical variance–covariance matrix of the solution vector.
In view of the underlying error model, however, least squares estimators are biased. In this respect, the fate of the theorem should be sealed. On the other hand, as the ISO Guide steadfastly continues to take the validity of the theorem for granted, we shall carry out computer simulations in order to settle the issue. As we shall see, the results not only differ in the sizes of the measurement uncertainties – an observation apparently regarded as disputable on the part of experimenters – but also in another property which, in fact, manoeuvres the Guide into a difficult position, see Sects. 8.4 and 9.3.

Ignoring systematic errors and assuming observations of equal accuracy, we have

E(x − x0)(x − x0)T = (σ2/n) I , (8.24)
with I denoting an (m × m) identity matrix. According to (8.8) we have µβ = β0. But then (8.23) yields

E(β − β0)(β − β0)T = (σ2/n)(ATA)−1 . (8.25)
Instead of B = A(ATA)−1 let us impute the existence of an (r × m) matrix V defining another solution vector
γ = V x (8.26)
of Aβ ≈ x, also being linear in the observations. Of course, we ask for
γ0 = V x0 = V Aβ0 = β0 ; i.e. V A = I . (8.27)
As
E(γ − γ0)(γ − γ0)T = (σ2/n) V VT ,
we have to compare the diagonal elements of the variance–covariance matrices (ATA)−1 and V VT. The identity

V VT = (ATA)−1 + (V − BT)(V − BT)T (8.28)

reveals the diagonal elements of V VT to be non-negative; in particular they turn out to be minimal in case V = BT, i.e. if V is due to least squares.

However, we are not allowed to ignore the third term BTf in (8.6). Consequently, the estimator β is no longer unbiased and just this fact violates the Gauss–Markoff theorem. With respect to the error model under discussion we state:

Empirical data, being biased by unknown systematic errors, cause the Gauss–Markoff theorem to break down.
Nevertheless, let us continue to consider the classical form of an adjustment for a weighted linear system
GAβ ≈ Gx . (8.29)
According to the Gauss–Markoff theorem the optimal matrix of weights would be

G = [E(x − x0)(x − x0)T]−1/2 (8.30)

leading to minimal diagonal elements of the theoretical variance–covariance matrix

E(β − β0)(β − β0)T = [(GA)T(GA)]−1 (8.31)

of the solution vector

β = BTGx , B = (GA)[(GA)T(GA)]−1 (8.32)
of the weighted system (8.29). However, as Sect. 9.3 will disclose, these considerations turn out to be inoperative so that we shall have to proceed differently.

Incidentally, for the purposes of the experimenter, the Gauss–Markoff theorem does not seem to be exhaustive insofar as it presumes an invariable design matrix. On the other hand, it might well happen that one and the same measuring problem allows for quite different design matrices leading to varying measurement uncertainties. Unfortunately, the Gauss–Markoff theorem will in no way assist the experimenter in finding a design matrix generating absolutely minimal diagonal elements [20].
8.4 Choice of Weights
In general, the input data of the linear system
Aβ ≈ x (8.33)
will not be of the same accuracy. To boost the influence of more accurate data and reduce the influence of less accurate data, so-called weights should be introduced. Weighting the linear system means multiplying its m rows

αi1β1 + αi2β2 + · · · + αirβr ≈ xi ; i = 1, . . . , m (8.34)

by some factors gi ≠ 0.2 We purposefully assemble the latter within a (non-singular) diagonal weight matrix

G = diag(g1, g2, . . . , gm) . (8.35)
We emphasize that weights, in principle, do not shift the components of the true solution vector β0. Clearly,

GAβ0 = Gx0 (8.36)

re-establishes

β0 = [(GA)T(GA)]−1(GA)TGx0 (8.37)

regardless of the weights the experimenter has chosen. Mathematically, this property is self-evident. Nevertheless, it will turn out to be of crucial importance when measurement uncertainties are to be optimized, i.e. minimized.

Weighting procedures, however, shift the least squares estimators of the inconsistent linear system. Then, finding “points of attack”, weights have two types of consequences: they
– shift the components of the vector β and
– either expand or compress the respective uncertainties.

Assuming the unknown systematic errors to abrogate the Gauss–Markoff theorem, there no longer is a rule of how to select the weight factors in order to minimize measurement uncertainties. Nevertheless, as will be demonstrated, the alternative error model is in a position to state:
The intervals
βk ± uβk ; k = 1, . . . , r (8.38)

localize the true values β0,k of the components βk of the solution vector β for any choice of the weights.

In a sense, this statement restores the memory of the method of least squares, see also Sect. 9.3. According to this result, we are, in principle, free to choose the weights gi. In the standard case we would derive the weights from the uncertainties of the input data xi through
2Even linear combinations of rows are admissible.
gi = 1/uxi ; uxi = [tP(n − 1)/√n] si + fs,i ; i = 1, . . . , m . (8.39)

Beyond that, we may cyclically repeat the numerical calculations, varying the weights by trial and error, and observe the estimators’ uncertainties. Hence, we can see whether re-evaluation has produced larger or smaller uncertainties.

If the individual weights were multiplied by one and the same arbitrary factor, nothing would change. For instance, the gi might be replaced with

√wi = gi / √(Σ_{i=1}^{m} gi2) . (8.40)

Clearly, the so-defined weights are dimensionless and their squares add up to one. But it remains immaterial to which set of weights we refer.
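The invariance under a common rescaling of the weights is quickly confirmed in NumPy; the system and the input uncertainties below are invented for illustration:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
x = np.array([1.02, 1.98, 3.05])          # input data
u_x = np.array([0.05, 0.02, 0.08])        # overall uncertainties of the input data

g = 1.0 / u_x                             # weights, eq. (8.39)
w_sqrt = g / np.sqrt(np.sum(g**2))        # normalized weights, eq. (8.40)
assert np.isclose(np.sum(w_sqrt**2), 1.0) # their squares add up to one

def weighted_ls(gvec):
    """Least squares solution of the weighted system G A beta ~ G x."""
    GA = np.diag(gvec) @ A
    return np.linalg.solve(GA.T @ GA, GA.T @ (gvec * x))

beta_g = weighted_ls(g)
beta_scaled = weighted_ls(10.0 * g)       # common factor: nothing changes
beta_w = weighted_ls(w_sqrt)
```

All three calls return one and the same estimator; only the ratios of the weights matter.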
8.5 Consistency of the Input Data
Conventional least squares provides a test that shall reveal whether or not the input data are consistent,3 i.e. whether they might be blurred by undetected systematic errors.4 In the presence of unknown systematic errors, any such test will, of course, be superfluous. Nevertheless, we shall derive it, though, basically, it simply is a repetition of the considerations of Sect. 8.2. Let the input data be arithmetic means
x = (x1 x2 · · · xm)T . (8.41)
Their underlying theoretical variances
σi2 = E(Xi − µi)2 ; i = 1, . . . , m (8.42)

may be different. Conventional least squares obtains the weight matrix G from the Gauss–Markoff theorem, i.e. from the theoretical variance–covariance matrix of the input data,

E(x − µ)(x − µ)T = (1/n)(σij) ; i, j = 1, . . . , m ; σii ≡ σi2 , (8.43)
3From our point of view, consistency means that the relationship between the true values of the input data and the true solution vector is physically unique.
4Undetected simply means that the experimenter was not aware that an unknown systematic error he should have taken into account has been operative.
setting
G = [E(x − µ)(x − µ)T]−1/2 . (8.44)
Multiplying the inconsistent linear system
Aβ ≈ x (8.45)
on the left by G yields
GAβ ≈ Gx . (8.46)
The projection operator and the vector of the residuals are given by
P = (GA)[(GA)T(GA)]−1(GA)T (8.47)
and
r = Gx − PGx (8.48)
respectively. Inserting
x = x0 + (x − µ) + f (8.49)
we find
r = G (x − µ) − PG (x − µ) + (Gf − P Gf ) , (8.50)
since5
PGx0 = Gx0 . (8.51)
Disregarding systematic errors in (8.50), i.e. the third term on the right, the minimized sum of squared residuals is given by
Qweighted = rTr = [G (x − µ)]T (I − P ) [G (x − µ)] , (8.52)
where I designates an (m × m) identity matrix. Let T denote an orthogonal (m × m) matrix and y an (m × 1) auxiliary vector. Putting
G (x − µ) = T y; y = T TG (x − µ) (8.53)
and referring to (8.14) we find
Qweighted = (Ty)T(I − P)Ty = Σ_{i=1}^{m−r} yi2 . (8.54)

5Following (8.36), the vector Gx0 is an element of the column space of the matrix GA. According to (8.47), the operator P projects onto this space.
Again, the components of the auxiliary vector y are normally distributed (even if there are dependencies between the means xi, see Appendix C, [40]). As

E y = 0 and E yyT = TT T = I ,

Qweighted is χ2-distributed with m − r degrees of freedom, i.e.

E Qweighted = m − r .

After all, given that the input data are free from systematic errors we should observe

Qweighted ≈ m − r . (8.55)

Meanwhile, accounting for the term (Gf − PGf) disregarded in (8.50), and for the observation that, due to systematic errors, the weight matrix G as proposed in (8.35) should be used, we have instead
E Qweighted = Σ_{i=1}^{m} gi2 σi2/n − Σ_{i=1}^{m} gi2 (σi2/n) pii + [(Gf)T(Gf) − (Gf)TP(Gf)] . (8.56)
Here, the pii designate the diagonal elements of the projection matrix P introduced in (8.47). We note that (8.56) presupposes independent xi, as otherwise the expected value E Qweighted would turn out to be more complex. We observe:

Unknown systematic errors suspend the property of the weighted minimized sum of squared residuals to estimate the number of the degrees of freedom of the linear system.

For decades, empirical observations, in particular those relating to high-level metrology, have revealed over and over again

Qweighted ≫ m − r , (8.57)

[16, 17]. Procedures relying on the ISO Guide [36] then speak of inconsistent input data, attributing the inconsistencies to undetected unknown systematic errors. However, within the alternative error model, (8.57) does not necessarily express inconsistencies among the input data, as we think (8.56) rather than (8.55) applies. Whether or not measuring data are inconsistent should be scrutinized by other means.
9 Uncertainties of Least Squares Estimators
9.1 Empirical Variance–Covariance Matrix
To determine the uncertainties of the least squares estimators, we resort to the decomposition (8.6),
β = BTx = BT [x0 + (x − µ) + f ] . (9.1)
Let us firstly estimate the term BT(x − µ). In contrast to the conventional error calculus, we shall rely on empirical quantities, i.e. we shall establish the empirical variance–covariance matrix directly from the input data. This is possible, given each of the xi; i = 1, . . . , m relates to the same number of repeat measurements, as was presumed. Then, the said matrix allows us to express those parts of the overall uncertainties which are due to random errors through confidence intervals according to Student. In principle, we shall proceed as in Sect. 6.3, starting from the components βk; k = 1, . . . , r of the estimator (9.1). Denoting the elements of the matrix B by bik,
B = A(ATA)−1 = (bik) , i = 1, . . . , m ; k = 1, . . . , r ,
we find from
βk = Σ_{i=1}^{m} bik xi ; k = 1, . . . , r , (9.2)

inserting the means

xi = (1/n) Σ_{l=1}^{n} xil ,
the representations
βk = Σ_{i=1}^{m} bik [(1/n) Σ_{l=1}^{n} xil] = (1/n) Σ_{l=1}^{n} [Σ_{i=1}^{m} bik xil]
   = (1/n) Σ_{l=1}^{n} βkl ; k = 1, . . . , r . (9.3)
For convenience, the sums
βkl = Σ_{i=1}^{m} bik xil ; k = 1, . . . , r (9.4)

may be understood as components of a least squares estimator, the input data of which are the m individual measurements x1l, x2l, . . . , xml: each of the m means xi contributes just the l-th measured datum. The following illustration depicts the situation:

x11 x12 . . . x1l . . . x1n ⇒ x1
x21 x22 . . . x2l . . . x2n ⇒ x2
. . . . . . . . . . . . . . . . . .
xm1 xm2 . . . xml . . . xmn ⇒ xm
 ⇓   ⇓        ⇓        ⇓
βk1 βk2 . . . βkl . . . βkn ⇒ βk
According to (9.3), the βk are the arithmetic means of the n means βkl. The differences

βkl − βk = Σ_{i=1}^{m} bik (xil − xi) ; l = 1, . . . , n ; k = 1, . . . , r (9.5)

allow us to establish the empirical variances and covariances

sβkβk′ = [1/(n − 1)] Σ_{l=1}^{n} (βkl − βk)(βk′l − βk′) ; k, k′ = 1, . . . , r
of the components of the estimator β and thus its empirical variance–covariance matrix

sβ = (sβkβk′) ; k, k′ = 1, . . . , r ; sβkβk ≡ s2βk . (9.6)
Inserting the differences (9.5) into sβkβk′ , we obtain
sβkβk′ = [1/(n − 1)] Σ_{l=1}^{n} [Σ_{i=1}^{m} bik (xil − xi)] [Σ_{j=1}^{m} bjk′ (xjl − xj)]
       = Σ_{i,j=1}^{m} bik bjk′ sij . (9.7)
Here the quantities
sij = [1/(n − 1)] Σ_{l=1}^{n} (xil − xi)(xjl − xj) ; i, j = 1, . . . , m (9.8)

designate the elements of the empirical variance–covariance matrix of the input data

s = (sij) ; i, j = 1, . . . , m . (9.9)
By means of the column vectors bk; k = 1, . . . , r of the matrix B,
B = (b1 b2 · · · br) , (9.10)
we can compress the variances and covariances as defined in (9.7) into
sβkβk′ = bkT s bk′ ; k, k′ = 1, . . . , r . (9.11)
Hence, (9.6) turns into
sβ = (bkT s bk′) = BTs B . (9.12)

As each of the βk; k = 1, 2, . . . , r relies on the same set of input data, they necessarily are dependent. The same applies to the βkl. Moreover, for any fixed i, the sequence of the xil is indeterminate, i.e. we may interchange the xil, i fixed. With these permutations, the βk do not alter; the βkl, however, will alter. Consequently, the elements of the matrices (9.9) and (9.12) depend on the sequences of the xil; i fixed. This, nevertheless, complies perfectly with the properties of the underlying statistical distributions and, in particular, with our aim of introducing Student’s confidence intervals.
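The chain (9.3)–(9.12) translates directly into NumPy: the empirical variance–covariance matrix of the auxiliary estimators βkl equals BTsB. Data and design matrix below are invented (fixed random seed):

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
m, r = A.shape
n = 8                                     # repeat measurements per mean

x0 = A @ np.array([1., 2.])
X = x0[:, None] + 0.1 * rng.standard_normal((m, n))   # individual data xil

B = A @ np.linalg.inv(A.T @ A)

beta_l = B.T @ X                          # columns: estimators beta_kl, eq. (9.4)
s = np.cov(X, ddof=1)                     # empirical matrix s of the input, eq. (9.9)
s_beta = B.T @ s @ B                      # eq. (9.12)
```

The identity holds exactly, since the sample covariance is bilinear in the data; the means of the βkl reproduce the βk of (9.3).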
In order to find the empirical variance–covariance matrix of the weighted adjustment, we resort to (8.29)
GAβ ≈ Gx = G [x0 + (x − µ) + f ]
and (8.32)1
1Though the weighted solution vector β will be different from the unweighted one, for simplicity, we use the same symbol.
β = BTGx ; B = (GA)[(GA)T(GA)]−1 .

Denoting the elements of the (m × r) matrix B by bik, we write the r components βk of the vector β in the form

βk = Σ_{i=1}^{m} bik gi xi ; k = 1, . . . , r . (9.13)
Insertion of
xi = (1/n) Σ_{l=1}^{n} xil

leads to

βk = Σ_{i=1}^{m} bik gi [(1/n) Σ_{l=1}^{n} xil] = (1/n) Σ_{l=1}^{n} [Σ_{i=1}^{m} bik gi xil] = (1/n) Σ_{l=1}^{n} βkl . (9.14)
Again, we may consider the quantities
βkl = Σ_{i=1}^{m} bik gi xil (9.15)

as the components of a least squares estimator, the input data of which are the m weighted individual measurements x1l, x2l, . . . , xml. Referring to the differences

βkl − βk = Σ_{i=1}^{m} bik gi (xil − xi) ; k = 1, . . . , r ,
we form the elements
sβkβk′ = [1/(n − 1)] Σ_{l=1}^{n} [Σ_{i=1}^{m} bik gi (xil − xi)] [Σ_{j=1}^{m} bjk′ gj (xjl − xj)]
       = Σ_{i,j=1}^{m} bik bjk′ gi gj sij (9.16)
of the empirical variance–covariance matrix sβ . The matrix itself is given by
sβ = (bkT (GTs G) bk′) = BT(GTs G)B . (9.17)
9.2 Propagation of Systematic Errors 117
Here, we have made use of
B = (b1 b2 · · · br) = (bik) . (9.18)

The elements sij and the matrix s have been defined in (9.8) and (9.9) respectively.
Each of the quantities βkl, as defined in (9.4) and (9.15), implies just one set of input data x1l, x2l, . . . , xml. Assuming the successive sets l = 1, 2, . . . , n to be independent and assigning an m-dimensional normal density to each of them, we are in a position to introduce Student’s confidence intervals. For the non-weighted adjustment we have

βk ± [tP(n − 1)/√n] √(Σ_{i,j=1}^{m} bik bjk sij) ; k = 1, . . . , r (9.19)

while the weighted adjustment yields

βk ± [tP(n − 1)/√n] √(Σ_{i,j=1}^{m} bik bjk gi gj sij) ; k = 1, . . . , r . (9.20)
The intervals (9.19) and (9.20) localize the components of the vectors
µβ = β0 + BTf (9.21)
and
µβ = β0 + BTGf (9.22)
respectively, with probability P as given by tP (n − 1).
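A sketch of the confidence intervals (9.19) in NumPy; the data are invented, and the Student factor tP(n − 1) ≈ 2.262 for P = 95%, n = 10 is inserted as a fixed number rather than computed:

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
m, r = A.shape
n = 10
t_P = 2.262                               # Student's t_P(n-1) for P = 95%, n = 10

x0 = A @ np.array([1., 2.])
X = x0[:, None] + 0.05 * rng.standard_normal((m, n))

B = A @ np.linalg.inv(A.T @ A)
beta_hat = B.T @ X.mean(axis=1)
s = np.cov(X, ddof=1)

# Half-widths of the intervals (9.19): t_P/sqrt(n) * sqrt(b_k^T s b_k)
half = t_P / np.sqrt(n) * np.sqrt(np.diag(B.T @ s @ B))
intervals = np.stack([beta_hat - half, beta_hat + half], axis=1)
```

Each row of intervals holds the lower and upper confidence bound for one component βk.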
9.2 Propagation of Systematic Errors
From (9.21) and (9.22) we read the systematic errors of the estimators of the non-weighted and the weighted adjustment respectively,

fβ = BTf ; fβ = BTGf , (9.23)

or, written in components,

fβk = Σ_{i=1}^{m} bik fi ; fβk = Σ_{i=1}^{m} bik gi fi ; k = 1, . . . , r . (9.24)
Their worst-case estimations are
fs,βk = Σ_{i=1}^{m} |bik| fs,i (9.25)

and

fs,βk = Σ_{i=1}^{m} |bik| gi fs,i . (9.26)

Should fi = f = const., i = 1, . . . , m apply, the estimations turn out to be more favorable,

fs,βk = fs |Σ_{i=1}^{m} bik| ; fs,βk = fs |Σ_{i=1}^{m} bik gi| . (9.27)
According to (9.25) and (9.26), the new formalism should respond to a rising number m of input data with an inappropriate growth in the propagated systematic errors, thus blotting out the increase in input information. This would, of course, scarcely make sense. Fortunately, as the examples in Sect. 9.3 will underscore, this indeed does not apply. For now, let us make this observation a bit plausible. Suppressing the index k and confining ourselves to m = 2 so that

fβ = b1f1 + b2f2 ,

we find from

fβ = b1x1 (f1/x1) + b2x2 (f2/x2) ,

dividing by β = b1x1 + b2x2,

fβ/β = p1 (f1/x1) + p2 (f2/x2) ,

where

p1 = b1x1/β ; p2 = b2x2/β ; p1 + p2 = 1 .

As long as p1 and p2 act as true weight factors, being both positive, we have

fs,β/|β| = p1 fs,1/|x1| + p2 fs,2/|x2| .
In general, however, the situation might not be that simple.
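The worst-case sums (9.25) and (9.27) in code form (NumPy, invented numbers); the equal-error estimate never exceeds the general bound:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
B = A @ np.linalg.inv(A.T @ A)

f_s = np.array([0.03, 0.02, 0.05])        # error bounds, |f_i| <= f_s,i

# General worst case, eq. (9.25): f_s,beta_k = sum_i |b_ik| f_s,i
f_s_beta = np.abs(B).T @ f_s

# Equal systematic errors f_i = f = const with common bound f_s = 0.04: eq. (9.27)
f_s_const = 0.04 * np.abs(B.sum(axis=0))
```

By the triangle inequality, |Σ bik| fs ≤ Σ |bik| fs, so the constant-error case is indeed the more favorable one.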
9.3 Overall Uncertainties of the Estimators
To find the overall uncertainties uβk; k = 1, . . . , r of the least squares adjustment, we refer to the decomposition
β = BTG [x0 + (x − µ) + f ]
= β0 + BTG (x − µ) + BTGf . (9.28)
As usual, we linearly combine the estimations (9.19), (9.25) and (9.20), (9.26) respectively, to obtain the sought uncertainty intervals

βk − uβk ≤ β0,k ≤ βk + uβk ; k = 1, . . . , r . (9.29)

While the non-weighted adjustment yields

uβk = [tP(n − 1)/√n] √(Σ_{i,j=1}^{m} bik bjk sij) + Σ_{i=1}^{m} |bik| fs,i , (9.30)
the weighted adjustment issues
uβk = [tP(n − 1)/√n] √(Σ_{i,j=1}^{m} bik bjk gi gj sij) + Σ_{i=1}^{m} |bik| gi fs,i . (9.31)
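Combining the two parts, the non-weighted overall uncertainty (9.30) might be sketched as follows (NumPy; the data, the error bounds and the fixed Student factor are invented assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
m, r = A.shape
n = 10
t_P = 2.262                               # t_P(n-1) for P = 95%, n = 10

beta0 = np.array([1., 2.])
f = np.array([0.02, -0.01, 0.015])        # actual systematic errors
f_s = np.array([0.03, 0.02, 0.02])        # their bounds, |f_i| <= f_s,i

X = (A @ beta0 + f)[:, None] + 0.01 * rng.standard_normal((m, n))

B = A @ np.linalg.inv(A.T @ A)
beta_hat = B.T @ X.mean(axis=1)
s = np.cov(X, ddof=1)

# Eq. (9.30): Student part plus worst-case systematic part
u_beta = t_P / np.sqrt(n) * np.sqrt(np.diag(B.T @ s @ B)) + np.abs(B).T @ f_s
```

In this simulated run the intervals beta_hat ± u_beta localize the true values beta0, as the error model asserts.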
In either case, we expect the intervals (9.29) to localize the true values β0,k of the estimators βk; k = 1, . . . , r. As Fig. 9.1 depicts, the least squares adjustment depends on
– the input data,
– the choice of the design matrix A, and
– the structure of the weight matrix G.
Let us recall (see Sect. 8.4) that weightings involve two things: they numerically shift the estimators and they alter the pertaining uncertainties. Nonetheless, as the uncertainty intervals steadfastly localize the estimators’ true values independent of which weights have been chosen, we may attribute something like a memory to the approach. Also, we should not cling so much to the numerical values of the estimators but rather consider intervals as quoted in (9.29). As Fig. 9.2 illustrates, we may assume a set of uncertainty intervals stemming from different experiments and aiming at one and the same physical quantity. Then, each of the intervals should localize the common true value β0,k. In other words, the intervals should mutually overlap.
The property (9.29) proves to be of vital importance and differs significantly from the implications of the ISO Guide. Though the latter evens out inconsistencies among the input data, the experimenter must, unfortunately, be aware of the fact that the uncertainties at hand might not embrace the true values he is looking for.

Fig. 9.1. The least squares adjustment depends on the input data, the design matrix A and the weight matrix G. The latter may be chosen by trial and error to minimize the uncertainties of the estimators

Fig. 9.2. Overlaps of uncertainty regions of measuring results coming from different paths and aiming at one and the same physical quantity β0,k
Nonetheless, for completeness, we add the Guide’s formalism: the linear system and the least squares estimator of the non-weighted adjustment are given by

Aβ ≈ x , β = BTx , B = A(ATA)−1 (9.32)

where

β0 = E β = BTx0
is tacitly assumed. But then, the theoretical variance–covariance matrix of the estimator β would be

E(β − β0)(β − β0)T = BT E(x − x0)(x − x0)T B . (9.33)

The square roots of its diagonal elements are seen to be the uncertainties of the non-weighted adjustment. Similarly, for the weighted adjustment we have

GAβ ≈ Gx , β = BTGx , B = (GA)[(GA)T(GA)]−1 (9.34)

where again

β0 = E β = BTGx0
is taken for granted. Referring to a weight matrix of the form
G = [E(x − x0)(x − x0)T]−1/2 , (9.35)

the theoretical variance–covariance matrix of the weighted ISO adjustment is given by

E(β − β0)(β − β0)T = BTG E(x − x0)(x − x0)T GB = BTB = [(GA)T(GA)]−1 . (9.36)
Again, the square roots of the diagonal elements of this matrix act as the uncertainties of the weighted adjustment.

Let us recall that, on account of the randomization postulate, there is no distinction between the expectation E β of the estimator β and its true value β0. According to (9.21) and (9.22), however, these quantities differ by the biases

BTf and BTGf

respectively. After all, the experimenter has no means to exclude the possibility that the weights have shifted his estimators away from their true values and, paradoxically, shrunk their uncertainties.
Examples
(i) Non-weighted and weighted adjustment: A computer simulation shall help us to compare the error model of the ISO Guide with that of the alternative approach. To this end we consider the linear system
10 β1 + β2 + 9 β3 ≈ x1
− β1 + 2 β3 ≈ x2
β1 + 2 β2 + 3 β3 ≈ x3
−3 β1 − β2 + 2 β3 ≈ x4
β1 + 2 β2 + β3 ≈ x5
and let
β0,1 = 1 , β0,2 = 2 , β0,3 = 3
be the true values of the parameters so that the true values of the input data turn out to be

x0,1 = 39 , x0,2 = 5 , x0,3 = 14 , x0,4 = 1 , x0,5 = 8 .

We simulate normally distributed input data xil; i = 1, . . . , 5; l = 1, . . . , 10. It seems recommendable to process random and systematic errors of the same order of magnitude. That is why we set, for any fixed i, the theoretical variance of the random errors equal to the postulated theoretical variance of the pertaining fi,

σ2xi = (1/3) f2s,i ; i = 1, . . . , m .
We assume the boundaries of the systematic errors and their actual values to be
fs,1 = 0.39 × 10−2 , fs,2 = 0.35 × 10−4 , fs,3 = 0.98 × 10−5
fs,4 = 0.35 × 10−4 , fs,5 = 0.8 × 10−3
and
f1 = 0.351 × 10−2 , f2 = 0.315 × 10−4 , f3 = −0.882 × 10−5
f4 = −0.315× 10−4 , f5 = −0.72 × 10−3
respectively, i.e. we multiplied the boundaries fs,i; i = 1, . . . , 5 by a factor of 0.9 and purposefully chose the signs of the actual systematic errors in such a way that the Guide’s 2σ uncertainties of the weighted adjustment fail to localize the true values β0,k of the estimators βk.

We refer to a weight matrix G of the form (8.35), the entries being the reciprocals of the uncertainties of the input data corresponding to the ISO Guide and to the alternative approach respectively, i.e.

gi = 1/uxi , i = 1, . . . , 5

with

uxi = √(s2xi/n + (1/3) f2s,i)   (ISO Guide)   and

uxi = [tP(n − 1)/√n] sxi + fs,i   (alternative model).
The alternative error model allows us to manipulate the weights of the adjustment. To exemplify this possibility we choose
g1 = 1/ux1 , g2 = 4/ux2 , g3 = 10/ux3 , g4 = 4/ux4 , g5 = 1/ux5 .

For the ISO Guide we quote so-called 2σ uncertainties. The alternative error model relates the random parts of the overall uncertainties to a confidence level of P = 95%.

Tables 9.1–9.3 display the results. In general, the 2σ uncertainties of the Guide may be considered acceptable. There are, however, singular cases in which the Guide fails to localize the true values of the estimators. We purposefully exemplified this situation, thus underscoring that the Guide’s 2σ uncertainties can be too short.

In contrast to this, the alternative approach presents us with robust uncertainties safely localizing the true values of the estimators. Even if we manipulate the weights, the localization property of the alternative error model still persists.
(ii) Weighted mean of means: Given m arithmetic means and their respective uncertainties,

xi = (1/n) Σ_{l=1}^{n} xil ; uxi = [tP(n − 1)/√n] si + fs,i ; i = 1, . . . , m . (9.37)
We shall find the weighted grand mean β and its uncertainty uβ.

To average a pool of given arithmetic means ranks among the most frequently used standard procedures of metrology. Clearly, this is done in order to extract as much information as possible out of the data. The proceeding is taken to be permissible, trouble-free and self-evident. But consider the following proposal:

Averaging a set of means makes sense only if the underlying true values x0,i; i = 1, . . . , m are identical.

As has been discussed in Sect. 7.1, to average means to perform a least squares adjustment. Here we have, instead of (7.2),
Table 9.1. Least squares estimators and their uncertainties based on the ISO Guide and the alternative error model, respectively. In the first case so-called 2σ uncertainties have been quoted, in the latter case the random parts of the overall uncertainties refer to a confidence level of P = 95%. The * denotes manipulated weights

Model β1 ± uβ1 β2 ± uβ2 β3 ± uβ3
ISO non-weighted 1.0003 ± 0.0004 1.9995 ± 0.0005 3.00016 ± 0.00021
ISO weighted 1.00013 ± 0.00011 1.99980 ± 0.00016 3.00009 ± 0.00007
altern. non-weighted 1.0003 ± 0.0005 1.9995 ± 0.0008 3.00017 ± 0.00029
altern. weighted 1.00014 ± 0.00020 1.99978 ± 0.00030 3.00009 ± 0.00013
altern. weighted* 1.00012 ± 0.00017 1.99981 ± 0.00025 3.00008 ± 0.00011
Table 9.2. Data taken from Table 9.1: localization in least squares adjustment based on the ISO Guide. The weighted adjustment fails to localize the true values
ISO Guide
non-weighted weighted
0.9999 < 1 < 1.0007 1.00002 > 1 < 1.00024
1.9990 < 2 = 2.0000 1.99964 < 2 > 1.99996
2.99995 < 3 < 3.00037 3.00002 > 3 < 3.00016
Table 9.3. Data taken from Table 9.1: localization in least squares adjustment based on the alternative error model. The * denotes manipulated weights
                       alternative model
non-weighted              weighted                   weighted*
0.9998  < 1 < 1.0008      0.99994 < 1 < 1.00034     0.99995 < 1 < 1.00029
1.9987  < 2 < 2.0003      1.99948 < 2 < 2.00008     1.99956 < 2 < 2.00006
2.99988 < 3 < 3.00046     2.99996 < 3 < 3.00022     2.99997 < 3 < 3.00019
$$\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \bar\beta \approx \begin{pmatrix} \bar x_1 \\ \bar x_2 \\ \vdots \\ \bar x_m \end{pmatrix}, \qquad \text{i.e.} \quad \boldsymbol a\, \bar\beta \approx \bar{\boldsymbol x}\,. \tag{9.38}$$

Obviously, the underlying consistent system reads

$$\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \beta_0 = \begin{pmatrix} x_{0,1} \\ x_{0,2} \\ \vdots \\ x_{0,m} \end{pmatrix}. \tag{9.39}$$
From this we take that the true value $\beta_0$ of the grand mean $\bar\beta$ cannot comply with different true values of the input data. Consequently we have to require identical true values $x_{0,i} = x_0$; $i = 1, \dots, m$.
Nevertheless, we might wish to average "ad hoc", i.e. without resorting to the method of least squares. Let us average two masses $m_1 = 1/4\,\mathrm{kg}$ and $m_2 = 3/4\,\mathrm{kg}$, each being accurate to about $\pm 1\,\mathrm{mg}$. The mean is $\bar m = 1/2\,\mathrm{kg}$. However, the associated uncertainty exceeds the uncertainties of the input data by a factor of 250 000. The uncertainty has been blown up since the true values of the masses differ. We are sure uncertainties should be due to measurement errors and not to deviant true values. This drastically exaggerated example elucidates that the averaging of means implies far-reaching consequences if done inappropriately.
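The blow-up factor of this example can be checked numerically. The following sketch (a hedged illustration, not the author's formalism; variable names are made up) compares the half-spread of the two means, which stems from their deviant true values, with the 1 mg input uncertainty:

```python
# Two masses whose TRUE values differ are averaged "ad hoc"; the scatter
# between them -- not the 1 mg measurement uncertainty -- dominates the result.
m1, m2 = 0.25, 0.75              # kg; the true values differ
u_input = 1e-6                   # kg, i.e. +/- 1 mg per mass
mean = (m1 + m2) / 2             # 0.5 kg
half_spread = abs(m2 - m1) / 2   # 0.25 kg, scatter due to deviant true values
blow_up = half_spread / u_input  # factor by which the spread exceeds u_input
print(mean, blow_up)
```

With these numbers the half-spread amounts to 0.25 kg = 250 000 mg, i.e. the factor of 250 000 quoted in the text.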
To weight the inconsistent linear system
$$\boldsymbol a\, \bar\beta \approx \bar{\boldsymbol x}$$
9.3 Overall Uncertainties of the Estimators 125
we set
$$G = \operatorname{diag}\left( g_1, g_2, \dots, g_m \right); \qquad g_i = 1/u_{\bar x_i}$$
so that
$$G \boldsymbol a\, \bar\beta \approx G \bar{\boldsymbol x}\,. \tag{9.40}$$
The least squares estimator is
$$\bar\beta = \sum_{i=1}^{m} w_i \bar x_i\,; \qquad w_i = \frac{g_i^2}{\displaystyle\sum_{i=1}^{m} g_i^2}\,. \tag{9.41}$$
Let us replace the means $\bar x_1, \bar x_2, \dots, \bar x_m$ with the respective l-th individual measurements $x_{1l}, x_{2l}, \dots, x_{ml}$,
$$\begin{array}{cccccccc}
x_{11} & x_{12} & \dots & x_{1l} & \dots & x_{1n} & \Rightarrow & \bar x_1 \\
x_{21} & x_{22} & \dots & x_{2l} & \dots & x_{2n} & \Rightarrow & \bar x_2 \\
\dots & \dots & \dots & \dots & \dots & \dots & & \dots \\
x_{m1} & x_{m2} & \dots & x_{ml} & \dots & x_{mn} & \Rightarrow & \bar x_m\,.
\end{array} \tag{9.42}$$
Then, (9.41) turns into
$$\bar\beta_l = \sum_{i=1}^{m} w_i x_{il}\,; \qquad l = 1, \dots, n \tag{9.43}$$
so that
$$\bar\beta = \frac{1}{n} \sum_{l=1}^{n} \bar\beta_l\,. \tag{9.44}$$
Inserting
$$x_{il} = x_{0,i} + (x_{il} - \mu_i) + f_i\,; \qquad \bar x_i = x_{0,i} + (\bar x_i - \mu_i) + f_i\,,$$
(9.41) and (9.43) yield
$$\bar\beta_l = \sum_{i=1}^{m} w_i \left[ x_{0,i} + (x_{il} - \mu_i) + f_i \right] \quad \text{and} \quad \bar\beta = \sum_{i=1}^{m} w_i \left[ x_{0,i} + (\bar x_i - \mu_i) + f_i \right]. \tag{9.45}$$
Being free from systematic errors, the difference
$$\bar\beta_l - \bar\beta = \sum_{i=1}^{m} w_i \left( x_{il} - \bar x_i \right) \tag{9.46}$$
solely expresses the influence of random errors. Consequently, the empirical variance of the estimator $\bar\beta$ is given by
$$s_{\bar\beta}^2 = \frac{1}{n-1} \sum_{l=1}^{n} \left( \bar\beta_l - \bar\beta \right)^2 = \frac{1}{n-1} \sum_{l=1}^{n} \left[ \sum_{i=1}^{m} w_i \left( x_{il} - \bar x_i \right) \right] \left[ \sum_{j=1}^{m} w_j \left( x_{jl} - \bar x_j \right) \right] = \sum_{i,j}^{m} w_i w_j s_{ij} \tag{9.47}$$
where the
$$s_{ij} = \frac{1}{n-1} \sum_{l=1}^{n} \left( x_{il} - \bar x_i \right) \left( x_{jl} - \bar x_j \right); \qquad i, j = 1, \dots, m\,; \quad s_{ii} \equiv s_i^2 \tag{9.48}$$
denote the empirical variances and covariances of the input data (9.42). Again, the $s_{ij}$ shall define the empirical variance–covariance matrix s and the weights $w_i$ an auxiliary vector w,
$$s = \begin{pmatrix} s_{11} & s_{12} & \dots & s_{1m} \\ s_{21} & s_{22} & \dots & s_{2m} \\ \dots & \dots & \dots & \dots \\ s_{m1} & s_{m2} & \dots & s_{mm} \end{pmatrix}; \qquad \boldsymbol w = \left( w_1\; w_2\; \cdots\; w_m \right)^{\mathrm T}.$$

Hence, (9.47) takes the form
$$s_{\bar\beta}^2 = \boldsymbol w^{\mathrm T} s\, \boldsymbol w\,. \tag{9.49}$$
The propagated systematic errors follow from (9.45),
$$f_{\bar\beta} = \sum_{i=1}^{m} w_i f_i\,,$$
and its worst case estimation is
$$f_{s,\bar\beta} = \sum_{i=1}^{m} w_i f_{s,i}\,. \tag{9.50}$$
Let us formally assign an m-dimensional normal density to each of the n data sets
$$x_{1l}, x_{2l}, \dots, x_{ml}\,; \qquad l = 1, \dots, n \tag{9.51}$$
depicted as columns in the decomposition (9.42). But then the $\bar\beta_l$; $l = 1, \dots, n$ given in (9.43) are normally distributed – whether there are dependencies between the individuals $x_{1l}, x_{2l}, \dots, x_{ml}$, l fixed, remains immaterial. Moreover, as we consider the n successive sets (9.51) to be independent, the $\bar\beta_l$ are also independent.
Finally, the overall uncertainty of the weighted mean $\bar\beta$ is given by
$$u_{\bar\beta} = \frac{t_P}{\sqrt n} \sqrt{ \sum_{i,j}^{m} w_i w_j s_{ij} } + \sum_{i=1}^{m} w_i f_{s,i}\,. \tag{9.52}$$
It looks as if (9.50) would overestimate the influence of systematic errors, i.e. as if $f_{s,\bar\beta}$ might grow inadequately the greater the number m of input data. This is not, however, the case, as the following two examples suggest.
Given $u_{\bar x_1} = u_{\bar x_2} = u_{\bar x_3}$, we find $w_1 = w_2 = w_3 = 1/3$ so that
$$f_{s,\bar\beta} = \left( f_{s,1} + f_{s,2} + f_{s,3} \right)/3\,.$$
Given $u_{\bar x_1} = a$, $u_{\bar x_2} = 2a$, $u_{\bar x_3} = 3a$, we obtain $w_1 = 36/49$, $w_2 = 9/49$, $w_3 = 4/49$, i.e.
$$f_{s,\bar\beta} = \left( 36 f_{s,1} + 9 f_{s,2} + 4 f_{s,3} \right)/49\,.$$
Due to the weight factors $w_i$, the propagated systematic error preserves quite a reasonable order of magnitude even if the number of means entering the average should be large.
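The chain (9.37)–(9.52) lends itself to a compact numerical sketch. The following Python fragment is a hedged illustration with made-up data; the Student factor `t_P` is a placeholder for the appropriate quantile $t_P(n-1)$:

```python
import numpy as np

# Weighted grand mean of m arithmetic means and its overall uncertainty,
# following (9.41), (9.49), (9.50) and (9.52). All numbers are illustrative.
rng = np.random.default_rng(0)
m, n = 3, 10
x = rng.normal(1.0, 0.01, size=(m, n))     # m series of n repeat measurements
f_s = np.array([0.002, 0.001, 0.003])      # worst-case systematic error bounds
t_P = 2.26                                 # placeholder for t_P(n-1), P ~ 95%

xbar = x.mean(axis=1)
s_i = x.std(axis=1, ddof=1)
u_x = t_P / np.sqrt(n) * s_i + f_s         # uncertainties of the means, (9.37)

g = 1.0 / u_x                              # weights per (9.40)
w = g**2 / np.sum(g**2)                    # (9.41)

beta = w @ xbar                            # weighted grand mean, (9.41)
s = np.cov(x)                              # empirical variance-covariance, (9.48)
s2_beta = w @ s @ w                        # (9.49)
f_s_beta = w @ f_s                         # worst-case systematic part, (9.50)
u_beta = t_P / np.sqrt(n) * np.sqrt(s2_beta) + f_s_beta   # (9.52)
```

Note that the systematic part `f_s_beta` is a weighted average of the $f_{s,i}$ and therefore never exceeds the largest of them, in line with the two examples above.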
Let us summarize: we conclude that the Guide's recommendation to assign distribution densities to unknown systematic errors is scarcely tenable.
Over the years, appreciable efforts have been undertaken to develop a formalism allowing experimenters to randomize detected unknown systematic errors and to somehow repair the bothersome observation (8.57).
Ultimately, the inconsistencies in treating the influence of unknown systematic errors had condensed into a sheerly inextricable clew of different procedures competing with one another, similar to a Gordian knot. In connection with the adjustment of the fundamental constants of physics, even the abolition of the method of least squares was discussed [13].
Arguably, the idea of treating unknown systematic errors on a probabilistic basis was inspired by the objective of designing the lowest possible measurement uncertainties. The analysis of variance disclosed, however, a first inconsistency, Sect. 4.3. The computer simulations testing the reliability of the Guide's measurement uncertainties showed that there are situations in which even so-called 2σ uncertainties turn out to be too small, Sect. 6.4. The task of calculating the grand mean of a group of means accentuated anew the need to base data evaluations on true values, Sect. 9.3. Finally, as (8.57) underscored, (8.56) seems to reflect the actual metrological situation more realistically than (8.55) could.
On the other hand, we are in a position to undo the Gordian knot as soon as we are willing to abstain from randomizing systematic errors. Then we understand why the standard tools of data evaluation no longer produce reasonable results when applied to data of empirical origin. This suggests that we either revise those classical concepts according to the prevailing metrological situation or, if necessary, abolish them. Only then shall we be in a position to interlace our measurement results within a reliable and traceable metrological net of data.
Though a rigorous revision of the error calculus might appear uninviting, above all the aim is to extract a maximum of reliable, objectively justifiable information out of the measurements which, as is well known, require considerable efforts even in standard cases and virtually huge efforts in special cases.²
9.4 Uncertainty of a Function of Estimators
According to Sect. 6.5, we shall quote the uncertainty $u_\phi$ of a given function $\phi$ of the components $\bar\beta_1, \bar\beta_2, \dots, \bar\beta_r$ of the least squares estimator $\bar{\boldsymbol\beta}$, i.e. we shall look for
$$\bar\beta_1 \pm u_{\bar\beta_1},\; \bar\beta_2 \pm u_{\bar\beta_2},\; \dots,\; \bar\beta_r \pm u_{\bar\beta_r} \;\Rightarrow\; \phi\left( \bar\beta_1, \bar\beta_2, \dots, \bar\beta_r \right) \pm u_\phi\,. \tag{9.53}$$
For simplicity, we confine ourselves to a function of two estimators $\bar\beta_1, \bar\beta_2$ and, as usual, neglect the errors of the linearization of $\phi$. Inserting
$$\bar x_i = x_{0,i} + (\bar x_i - \mu_i) + f_i\,; \qquad i = 1, \dots, m$$
into
$$\bar\beta_1 = \sum_{i=1}^{m} b_{i1} \bar x_i \quad \text{and} \quad \bar\beta_2 = \sum_{i=1}^{m} b_{i2} \bar x_i$$
yields
$$\bar\beta_k = \beta_{0,k} + \sum_{i=1}^{m} b_{ik} (\bar x_i - \mu_i) + \sum_{i=1}^{m} b_{ik} f_i\,; \qquad k = 1, 2$$
so that

² The vacuum tube of the Large Hadron Collider (LHC) of the European Organization for Nuclear Research (CERN), Geneva, has a circumference of 27 km, and its magnets work near absolute zero. Among the results physicists are waiting for is the proof of the Higgs boson.
$$\phi\left( \bar\beta_1, \bar\beta_2 \right) = \phi\left( \beta_{0,1}, \beta_{0,2} \right) + \frac{\partial\phi}{\partial\bar\beta_1} \left[ \sum_{i=1}^{m} b_{i1} (\bar x_i - \mu_i) \right] + \frac{\partial\phi}{\partial\bar\beta_2} \left[ \sum_{i=1}^{m} b_{i2} (\bar x_i - \mu_i) \right] + \frac{\partial\phi}{\partial\bar\beta_1} \left[ \sum_{i=1}^{m} b_{i1} f_i \right] + \frac{\partial\phi}{\partial\bar\beta_2} \left[ \sum_{i=1}^{m} b_{i2} f_i \right]$$

or

$$\phi\left( \bar\beta_1, \bar\beta_2 \right) = \phi\left( \beta_{0,1}, \beta_{0,2} \right) + \sum_{i=1}^{m} \left( \frac{\partial\phi}{\partial\bar\beta_1} b_{i1} + \frac{\partial\phi}{\partial\bar\beta_2} b_{i2} \right) (\bar x_i - \mu_i) + \sum_{i=1}^{m} \left( \frac{\partial\phi}{\partial\bar\beta_1} b_{i1} + \frac{\partial\phi}{\partial\bar\beta_2} b_{i2} \right) f_i\,. \tag{9.54}$$
To abbreviate, we put
$$\chi_i = \frac{\partial\phi}{\partial\bar\beta_1} b_{i1} + \frac{\partial\phi}{\partial\bar\beta_2} b_{i2}\,; \qquad i = 1, \dots, m\,. \tag{9.55}$$
Then, (9.54) turns into
$$\phi\left( \bar\beta_1, \bar\beta_2 \right) = \phi\left( \beta_{0,1}, \beta_{0,2} \right) + \sum_{i=1}^{m} \chi_i (\bar x_i - \mu_i) + \sum_{i=1}^{m} \chi_i f_i\,. \tag{9.56}$$
For convenience, we also assemble the $\chi_i$; $i = 1, \dots, m$ into an (m × 1) column vector
$$\boldsymbol\chi = \left( \chi_1\; \chi_2\; \dots\; \chi_m \right)^{\mathrm T}. \tag{9.57}$$
First, we assess the second term on the right-hand side of (9.56), which, apparently, is due to random errors. With reference to the notations (9.2), (9.3) and (9.4),
$$\bar\beta_k = \sum_{i=1}^{m} b_{ik} \bar x_i = \frac{1}{n} \sum_{l=1}^{n} \bar\beta_{kl}\,; \qquad \bar\beta_{kl} = \sum_{i=1}^{m} b_{ik} x_{il}\,; \qquad k = 1, 2\,,$$
we expand $\phi(\bar\beta_{1l}, \bar\beta_{2l})$ throughout a neighborhood of $\bar\beta_1, \bar\beta_2$,
$$\begin{aligned}
\phi\left( \bar\beta_{1l}, \bar\beta_{2l} \right) &= \phi\left( \bar\beta_1, \bar\beta_2 \right) + \frac{\partial\phi}{\partial\bar\beta_1} \left( \bar\beta_{1l} - \bar\beta_1 \right) + \frac{\partial\phi}{\partial\bar\beta_2} \left( \bar\beta_{2l} - \bar\beta_2 \right) \\
&= \phi\left( \bar\beta_1, \bar\beta_2 \right) + \sum_{i=1}^{m} \left( \frac{\partial\phi}{\partial\bar\beta_1} b_{i1} + \frac{\partial\phi}{\partial\bar\beta_2} b_{i2} \right) (x_{il} - \bar x_i) \\
&= \phi\left( \bar\beta_1, \bar\beta_2 \right) + \sum_{i=1}^{m} \chi_i (x_{il} - \bar x_i)\,.
\end{aligned}$$
The difference $\phi(\bar\beta_{1l}, \bar\beta_{2l}) - \phi(\bar\beta_1, \bar\beta_2)$ leads us to the empirical variance
$$s_\phi^2 = \frac{1}{n-1} \sum_{l=1}^{n} \left[ \phi\left( \bar\beta_{1l}, \bar\beta_{2l} \right) - \phi\left( \bar\beta_1, \bar\beta_2 \right) \right]^2 = \boldsymbol\chi^{\mathrm T} s\, \boldsymbol\chi\,. \tag{9.58}$$
Here, s designates the empirical variance–covariance matrix of the input data (9.9) and the vector $\boldsymbol\chi$ has been defined in (9.57).
The third term on the right of (9.56),
$$f_\phi = \sum_{i=1}^{m} \chi_i f_i\,, \tag{9.59}$$
is due to the propagation of systematic errors. Its worst case estimation yields
$$f_{s,\phi} = \sum_{i=1}^{m} |\chi_i| f_{s,i}\,. \tag{9.60}$$
Finally, the overall uncertainty of the estimator $\phi(\bar\beta_1, \bar\beta_2)$ turns out to be
$$u_\phi = \frac{t_P(n-1)}{\sqrt n} \sqrt{ \boldsymbol\chi^{\mathrm T} s\, \boldsymbol\chi } + \sum_{i=1}^{m} |\chi_i| f_{s,i}\,. \tag{9.61}$$
Before any particular worst case estimate can be carried out, we have to merge in brackets all terms relating to the same $f_i$; otherwise the triangle inequality would overestimate the share of the overall uncertainty caused by systematic errors.
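Steps (9.55)–(9.61) can be sketched numerically. The fragment below is a hedged illustration with made-up data and an arbitrarily chosen function $\phi(\bar\beta_1, \bar\beta_2) = \bar\beta_1 \bar\beta_2$; the matrix `B` of least squares coefficients and the factor `t_P` are placeholders:

```python
import numpy as np

# Overall uncertainty of a function phi(beta1, beta2) of two least squares
# estimators, per (9.55)-(9.61). All inputs are illustrative placeholders.
rng = np.random.default_rng(1)
m, n = 4, 12
x = rng.normal([1.0, 2.0, 3.0, 4.0], 0.02, size=(n, m)).T   # repeat data
B = rng.normal(size=(m, 2))        # columns b_{i1}, b_{i2} of the adjustment
f_s = np.full(m, 0.005)            # worst-case systematic bounds
t_P = 2.20                         # placeholder for t_P(n-1)

xbar = x.mean(axis=1)
beta = B.T @ xbar                  # estimators beta1, beta2

# phi(beta1, beta2) = beta1 * beta2 (illustrative choice); its gradient:
dphi = np.array([beta[1], beta[0]])

chi = B @ dphi                     # chi_i per (9.55)
s = np.cov(x)                      # empirical variance-covariance matrix
s2_phi = chi @ s @ chi             # (9.58)
f_s_phi = np.abs(chi) @ f_s        # worst-case systematic part, (9.60)
u_phi = t_P / np.sqrt(n) * np.sqrt(s2_phi) + f_s_phi   # (9.61)
```

The absolute values in `f_s_phi` implement exactly the worst case estimation (9.60): the coefficients of each $f_i$ are merged first, then bounded.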
9.5 Uncertainty Spaces
The preceding section dwelled on the problem of how to estimate the uncertainty $u_\phi$ of a given function $\phi$ linking some or all of the least squares estimators. Now we turn toward the problem of spanning so-called uncertainty spaces by means of the estimators themselves.
Let us illustrate this notion by conceiving a least squares adjustment to a circle. Should it suffice to estimate just the radius r, we can confine ourselves to a result of the form $\bar r \pm u_{\bar r}$. However, if we additionally want to know all about where we may insert the tip of the compasses and how wide to open its legs, we have to revert to the couplings between the radius $\bar r$ and the coordinates $\bar x_M, \bar y_M$ of its center hidden within the adjustment's quoted results
$$\bar x_M \pm u_{\bar x_M}\,, \quad \bar y_M \pm u_{\bar y_M}\,; \quad \bar r \pm u_{\bar r}\,.$$
These intervals obviously imply all circles which are compatible with the input data and the error model. We expect the latter to provide a small circular region localizing the true center $x_0, y_0$ of the circle and a large annular region enclosing the true arc $x_0, y_0; r_0$. We shall treat this problem in Sect. 11.4.
The origin of the couplings between the components of the estimator $\bar{\boldsymbol\beta}$ is obvious. As each of the r components
$$\bar\beta_k = \sum_{i=1}^{m} b_{ik} \bar x_i = \beta_{0,k} + \sum_{i=1}^{m} b_{ik} (\bar x_i - \mu_i) + \sum_{i=1}^{m} b_{ik} f_i\,; \qquad k = 1, \dots, r \tag{9.62}$$
relies on one and the same set of input data, the $\bar\beta_k$ are necessarily dependent. The second term on the right-hand side expresses the couplings due to random errors. As the $\bar x_i$ imply the same number of repeat measurements, we gain access to the complete empirical variance–covariance matrix of the input data. Hence, we are in a position to formalize the said dependencies by Hotelling's density. Remarkably enough, the confidence ellipsoids provided by this density differ substantially from those referred to in the conventional error calculus. As is known, the conventional error calculus obtains its confidence ellipsoids from the exponent of the multidimensional normal probability density, so that they are based on unknown theoretical variances and covariances, thus hampering metrological interpretations.
Finally, the third term on the right-hand side of (9.62) formalizes the couplings due to systematic errors, i.e. to influences entirely ignored on the part of the Gaussian error calculus.
9.5.1 Confidence Ellipsoids
Let us reconsider the decomposition (9.3),
$$\bar\beta_k = \frac{1}{n} \sum_{l=1}^{n} \bar\beta_{kl}\,; \qquad k = 1, \dots, r$$
where, as (9.4) shows,
$$\bar\beta_{kl} = \sum_{i=1}^{m} b_{ik} x_{il}\,; \qquad k = 1, \dots, r\,, \quad l = 1, \dots, n\,.$$
Assuming normally distributed measured values $x_{il}$, the $\bar\beta_{kl}$ are also normally distributed; furthermore, as has already been discussed, they are independent. The empirical variance–covariance matrix of the $\bar\beta_k$ has been stated in (9.12),
$$s_{\bar\beta} = B^{\mathrm T} s\, B\,.$$
To establish Hotelling's ellipsoid, we refer to (3.73).³ Assembling the expectations
$$\mu_{\bar\beta_k} = E\left\{ \bar\beta_k \right\} = \beta_{0,k} + \sum_{i=1}^{m} b_{ik} f_i\,; \qquad k = 1, \dots, r \tag{9.63}$$
within a column vector $\boldsymbol\mu_{\bar\beta}$ as we did in (8.8), we have
$$t_P^2(r, n) = n \left( \bar{\boldsymbol\beta} - \boldsymbol\mu_{\bar\beta} \right)^{\mathrm T} \left( B^{\mathrm T} s B \right)^{-1} \left( \bar{\boldsymbol\beta} - \boldsymbol\mu_{\bar\beta} \right). \tag{9.64}$$

³ I am grateful to Dr. Wöger for calling my attention to this analogy.
The ellipsoid is centered in $\boldsymbol\mu_{\bar\beta}$; the probability of holding any realization of $\bar{\boldsymbol\beta}$ is given by
$$P = \int_0^{t_P(r,n)} p_T(t; r, n)\, \mathrm dt\,; \tag{9.65}$$
$p_T(t; r, n)$ has been stated in (3.75). Experimentally speaking, the vector $\boldsymbol\mu_{\bar\beta}$ cannot be known. On the other hand, we may refer (9.64) to $\bar{\boldsymbol\beta}$ and substitute an auxiliary vector $\boldsymbol\beta$ for $\boldsymbol\mu_{\bar\beta}$. The so-defined confidence ellipsoid
$$\frac{t_P^2(r, n)}{n} = \left( \boldsymbol\beta - \bar{\boldsymbol\beta} \right)^{\mathrm T} \left( B^{\mathrm T} s B \right)^{-1} \left( \boldsymbol\beta - \bar{\boldsymbol\beta} \right) \tag{9.66}$$
localizes the vector $\boldsymbol\mu_{\bar\beta}$ with probability P as given in (9.65). In principle, this procedure is similar to that we have used to define one-dimensional confidence intervals in Sect. 5.1.
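Testing whether a trial vector lies inside the confidence ellipsoid (9.66) amounts to evaluating one quadratic form. The sketch below is a hedged illustration; `B`, the data and the quantile `t_P` are made-up placeholders, not values from the text:

```python
import numpy as np

# Membership test for the Hotelling confidence ellipsoid (9.66):
# n * (beta - betabar)^T (B^T s B)^{-1} (beta - betabar) <= t_P^2.
rng = np.random.default_rng(2)
m, r, n = 5, 2, 8
B = rng.normal(size=(m, r))                  # least squares coefficients
x = rng.normal(0.0, 0.05, size=(m, n))       # m series of n repeat measurements

betabar = B.T @ x.mean(axis=1)
s = np.cov(x)                                # empirical variance-covariance
s_beta = B.T @ s @ B                         # (9.12)

def inside_ellipsoid(beta, t_P=3.0):
    """True if beta lies within the ellipsoid (9.66); t_P is a placeholder
    for the Hotelling quantile t_P(r, n)."""
    d = beta - betabar
    return n * d @ np.linalg.solve(s_beta, d) <= t_P**2
```

The center $\bar{\boldsymbol\beta}$ always satisfies the test, while points far from it fail, which is a quick sanity check on the sign and scaling of the quadratic form.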
9.5.2 The New Solids
The propagated systematic errors define a geometric solid as well. According to (9.62), it is constituted by the components
$$f_{\bar\beta_k} = \sum_{i=1}^{m} b_{ik} f_i\,; \qquad -f_{s,i} \le f_i \le f_{s,i}\,; \qquad k = 1, \dots, r \tag{9.67}$$
of the vector
$$\boldsymbol f_{\bar\beta} = B^{\mathrm T} \boldsymbol f = \left( f_{\bar\beta_1}\; f_{\bar\beta_2}\; \cdots\; f_{\bar\beta_r} \right)^{\mathrm T}. \tag{9.68}$$
In a sense, the solid comes into being by letting the vector
$$\boldsymbol f = \left( f_1\; f_2\; \cdots\; f_m \right)^{\mathrm T} \tag{9.69}$$
"scan" the set of points defined by the m-dimensional hypercuboid
$$-f_{s,i} \le f_i \le f_{s,i}\,; \qquad i = 1, 2, \dots, m \tag{9.70}$$
and lying within and on its boundary surfaces. It goes without saying that all vectors $\boldsymbol f$ of the null space $N(B^{\mathrm T})$ of the matrix $B^{\mathrm T}$ are mapped into the null vector $\boldsymbol f_{\bar\beta} = \boldsymbol 0$ and will not be of interest to us.
To begin with, let us focus on some fundamental properties of the linear singular mapping (9.67). The projected solid is

– enclosed within an r-dimensional hypercuboid of extensions
$$-f_{s,\bar\beta_k} \le f_{\bar\beta_k} \le f_{s,\bar\beta_k}\,; \qquad f_{s,\bar\beta_k} = \sum_{i=1}^{m} |b_{ik}|\, f_{s,i}\,; \qquad k = 1, 2, \dots, r \tag{9.71}$$

– convex, as the mapping is linear and the hypercuboid (9.70) is convex.⁴ For any $\tau \in [0, 1]$ we have
$$\boldsymbol f_{\bar\beta} = \boldsymbol f_{\bar\beta}^{(1)} + \tau \left( \boldsymbol f_{\bar\beta}^{(2)} - \boldsymbol f_{\bar\beta}^{(1)} \right) = B^{\mathrm T} \left[ \boldsymbol f_1 + \tau \left( \boldsymbol f_2 - \boldsymbol f_1 \right) \right];$$
as every point on the line connecting any two vectors $\boldsymbol f_1$ and $\boldsymbol f_2$ is a point of the hypercuboid (9.70), all points on the line connecting the vectors $\boldsymbol f_{\bar\beta}^{(1)}$ and $\boldsymbol f_{\bar\beta}^{(2)}$ are points of the (yet unknown) solid (9.67)

– point-symmetric, as together with $\boldsymbol f$ the vector $-\boldsymbol f$ also belongs to the hypercuboid, so that with $\boldsymbol f_{\bar\beta} = B^{\mathrm T} \boldsymbol f$ the point $-\boldsymbol f_{\bar\beta} = B^{\mathrm T}(-\boldsymbol f)$ likewise belongs to the solid.
As we shall see, the objects in question are polygons in the case of two measurands, polyhedra in the case of three and abstract polytopes if there are more than three measurands. Moreover, we suggest adding the prefix security, i.e. speaking of security polygons, security polyhedra and security polytopes. In a sense, this naming corresponds to the terms confidence ellipses and confidence ellipsoids.
Security Polygons
In the first instance, we shall confine ourselves to couplings between just two propagated systematic errors, $f_{\bar\beta_k}$; $k = 1, 2$. Assuming m = 5, we have
$$\begin{aligned}
f_{\bar\beta_1} &= b_{11} f_1 + b_{21} f_2 + b_{31} f_3 + b_{41} f_4 + b_{51} f_5 \\
f_{\bar\beta_2} &= b_{12} f_1 + b_{22} f_2 + b_{32} f_3 + b_{42} f_4 + b_{52} f_5\,,
\end{aligned} \tag{9.72}$$
where
$$-f_{s,i} \le f_i \le f_{s,i}\,; \qquad i = 1, \dots, 5\,. \tag{9.73}$$
Let us refer to a rectangular $f_{\bar\beta_1}, f_{\bar\beta_2}$ coordinate system. According to (9.71), the geometrical figure in question will be enclosed within a rectangle
$$-f_{s,\bar\beta_1}, \dots, f_{s,\bar\beta_1}\,; \qquad -f_{s,\bar\beta_2}, \dots, f_{s,\bar\beta_2}\,.$$
Let $b_{11} \neq 0$. Eliminating $f_1$, we arrive at a straight line,
$$f_{\bar\beta_2} = \frac{b_{12}}{b_{11}} f_{\bar\beta_1} + c_1\,, \tag{9.74}$$
⁴ Imagine a straight line connecting any two points of a given solid. If every point on the line is a point of the solid, the latter is called convex. There will then be neither hollows inside the solid nor folds in its hull.
where
$$c_1 = h_{12} f_2 + h_{13} f_3 + h_{14} f_4 + h_{15} f_5$$
and
$$\begin{aligned}
h_{12} &= \frac{1}{b_{11}} \left( b_{11} b_{22} - b_{12} b_{21} \right), & h_{13} &= \frac{1}{b_{11}} \left( b_{11} b_{32} - b_{12} b_{31} \right), \\
h_{14} &= \frac{1}{b_{11}} \left( b_{11} b_{42} - b_{12} b_{41} \right), & h_{15} &= \frac{1}{b_{11}} \left( b_{11} b_{52} - b_{12} b_{51} \right).
\end{aligned}$$
Any variation of $c_1$ shifts the line (9.74) parallel to itself; $c_1$ is maximal if
$$\begin{aligned}
f_2 &= f_2^* = \operatorname{sign}(h_{12})\, f_{s,2}\,, & f_3 &= f_3^* = \operatorname{sign}(h_{13})\, f_{s,3}\,, \\
f_4 &= f_4^* = \operatorname{sign}(h_{14})\, f_{s,4}\,, & f_5 &= f_5^* = \operatorname{sign}(h_{15})\, f_{s,5}\,.
\end{aligned} \tag{9.75}$$
But then
$$c_{s,1} = h_{12} f_2^* + h_{13} f_3^* + h_{14} f_4^* + h_{15} f_5^*$$
and
$$-c_{s,1} \le c_1 \le c_{s,1}\,. \tag{9.76}$$
Of all the lines specified in (9.74), the lines
$$f_{\bar\beta_2} = \frac{b_{12}}{b_{11}} f_{\bar\beta_1} + c_{s,1}\,, \qquad f_{\bar\beta_2} = \frac{b_{12}}{b_{11}} f_{\bar\beta_1} - c_{s,1} \tag{9.77}$$
hold maximum distance. Shifting $f_1$ within its interval $-f_{s,1}, \dots, f_{s,1}$, the coordinates $(f_{\bar\beta_1}, f_{\bar\beta_2})$, as defined through (9.77), slide along two of the edges of the geometrical figure to be constructed. Obviously, the pertaining vertices follow from (9.72) and (9.75),
$$\begin{aligned}
f_{\bar\beta_1} &= b_{11} f_1 + b_{21} f_2^* + b_{31} f_3^* + b_{41} f_4^* + b_{51} f_5^*\,, \\
f_{\bar\beta_2} &= b_{12} f_1 + b_{22} f_2^* + b_{32} f_3^* + b_{42} f_4^* + b_{52} f_5^*\,.
\end{aligned} \tag{9.78}$$
We finally have found two of the polygon edges and the associated vertices. Elimination of the other variables $f_2, f_3, \dots, f_m$ (m = 5 here) reveals that there will be at most m pairs of parallel edges and, in particular, that the figure in question will be a convex polygon.
Should $b_{11}$, for example, vanish, it would not be possible to eliminate $f_1$. Instead, (9.72) would yield two abscissas
$$f_{\bar\beta_1} = \pm \left[ \operatorname{sign}(b_{21})\, b_{21} f_{s,2} + \operatorname{sign}(b_{31})\, b_{31} f_{s,3} + \operatorname{sign}(b_{41})\, b_{41} f_{s,4} + \operatorname{sign}(b_{51})\, b_{51} f_{s,5} \right], \tag{9.79}$$
through each of which we may draw a vertical line parallel to the $f_{\bar\beta_2}$-axis. Along these vertical lines the coordinate $f_{\bar\beta_2}$ slides according to
$$f_{\bar\beta_2} = b_{12} f_1 \pm \left[ \operatorname{sign}(b_{21})\, b_{22} f_{s,2} + \operatorname{sign}(b_{31})\, b_{32} f_{s,3} + \operatorname{sign}(b_{41})\, b_{42} f_{s,4} + \operatorname{sign}(b_{51})\, b_{52} f_{s,5} \right]. \tag{9.80}$$
Fig. 9.3. Convex, point-symmetric polygon, m = 4
Example
We shall find the polygon defined by
$$B^{\mathrm T} = \begin{bmatrix} 1 & 2 & 3 & -1 \\ -1 & 3 & 2 & 1 \end{bmatrix} \quad \text{and} \quad -1 \le f_i \le 1\,; \quad i = 1, \dots, 4\,.$$
From
$$f_{\bar\beta_1} = f_1 + 2 f_2 + 3 f_3 - f_4\,; \qquad f_{\bar\beta_2} = -f_1 + 3 f_2 + 2 f_3 + f_4$$
we conclude that the polygon is bounded by a rectangle of size
$$-7 \le f_{\bar\beta_1} \le 7\,; \qquad -7 \le f_{\bar\beta_2} \le 7\,.$$
Furthermore, we see that there are, instead of four, just three pairs of straight lines, namely
$$\begin{aligned}
(g_1, g_2):\quad f_{\bar\beta_2} &= -f_{\bar\beta_1} \pm 10\,, \\
(g_3, g_4):\quad f_{\bar\beta_2} &= \tfrac{3}{2} f_{\bar\beta_1} \pm \tfrac{15}{2}\,, \\
(g_5, g_6):\quad f_{\bar\beta_2} &= \tfrac{2}{3} f_{\bar\beta_1} \pm 5\,.
\end{aligned}$$
Figure 9.3 depicts the polygon.
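Since the mapping (9.67) is linear, the security polygon is simply the convex hull of the images of the hypercube corners under $B^{\mathrm T}$. The following sketch (a hedged cross-check of the example, using Andrew's monotone chain rather than the author's edge-elimination procedure) reproduces the bounding rectangle and the six vertices of the three edge pairs:

```python
from itertools import product

# Map all corners of the hypercube |f_i| <= 1 through B^T; the convex hull
# of the images is the security polygon of the example.
BT = [[1, 2, 3, -1],
      [-1, 3, 2, 1]]

points = sorted({(sum(BT[0][i] * f[i] for i in range(4)),
                  sum(BT[1][i] * f[i] for i in range(4)))
                 for f in product((-1, 1), repeat=4)})

def cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(pts):
    """Andrew's monotone chain; strict turns drop points lying on edges."""
    lower, upper = [], []
    for p in pts:
        while len(lower) > 1 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) > 1 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

hull = convex_hull(points)
print(len(hull))                       # 6 vertices: three pairs of parallel edges
print(max(abs(x) for x, _ in hull))    # 7, the bounding rectangle of (9.71)
```

The hull has six vertices, in agreement with the three pairs of parallel edges $(g_1, g_2), (g_3, g_4), (g_5, g_6)$ found above, and both coordinates stay within ±7.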
Remark: In least squares, treating r unknowns, we may happen to be interested in the couplings between r′ < r estimators, wanting to assign to the remaining r − r′ any fixed values. Considering such a situation, we add to the relations for $f_{\bar\beta_1}$ and $f_{\bar\beta_2}$ some third relation for $f_{\bar\beta_3}$. Let the new system be given by
$$\begin{aligned}
f_{\bar\beta_1} &= f_1 + 2 f_2 + 3 f_3 - f_4 \\
f_{\bar\beta_2} &= -f_1 + 3 f_2 + 2 f_3 + f_4 \\
f_{\bar\beta_3} &= 2 f_1 - f_2 + 3 f_3 + 3 f_4\,,
\end{aligned}$$
and let us look for the contour line $f_{\bar\beta_3} = 0$. Obviously, we have to consider the so-enlarged system as a whole, which, of course, leads us to a result quite different from that we got by treating the first two relations alone. Indeed, as Fig. 9.3 illustrates, the inner polygon depicting the new result encloses a smaller area than the previous outer one.
Security Polyhedra
We shall now investigate the geometrical properties of the solids established by three propagated systematic errors. Exemplifying the case m = 5, we consider
$$\begin{aligned}
f_{\bar\beta_1} &= b_{11} f_1 + b_{21} f_2 + b_{31} f_3 + b_{41} f_4 + b_{51} f_5 \\
f_{\bar\beta_2} &= b_{12} f_1 + b_{22} f_2 + b_{32} f_3 + b_{42} f_4 + b_{52} f_5 \\
f_{\bar\beta_3} &= b_{13} f_1 + b_{23} f_2 + b_{33} f_3 + b_{43} f_4 + b_{53} f_5\,,
\end{aligned} \tag{9.81}$$
where
$$-f_{s,i} \le f_i \le f_{s,i}\,; \qquad i = 1, \dots, 5 \tag{9.82}$$
is assumed. With respect to a rectangular $f_{\bar\beta_1}, f_{\bar\beta_2}, f_{\bar\beta_3}$ coordinate system, the solid in question is enclosed within a cuboid as given by (9.71). Writing (9.81) in the form
$$\begin{aligned}
b_{11} f_1 + b_{21} f_2 + b_{31} f_3 &= f_{\bar\beta_1} - b_{41} f_4 - b_{51} f_5 = \xi \\
b_{12} f_1 + b_{22} f_2 + b_{32} f_3 &= f_{\bar\beta_2} - b_{42} f_4 - b_{52} f_5 = \eta \\
b_{13} f_1 + b_{23} f_2 + b_{33} f_3 &= f_{\bar\beta_3} - b_{43} f_4 - b_{53} f_5 = \zeta
\end{aligned} \tag{9.83}$$
we may solve for $f_3$,
$$f_3 \begin{vmatrix} b_{11} & b_{21} & b_{31} \\ b_{12} & b_{22} & b_{32} \\ b_{13} & b_{23} & b_{33} \end{vmatrix} = \begin{vmatrix} b_{11} & b_{21} & \xi \\ b_{12} & b_{22} & \eta \\ b_{13} & b_{23} & \zeta \end{vmatrix}.$$
Defining
$$\lambda_{12} = b_{12} b_{23} - b_{22} b_{13}\,, \quad \mu_{12} = b_{21} b_{13} - b_{11} b_{23}\,, \quad \nu_{12} = b_{11} b_{22} - b_{21} b_{12}$$
we have
$$\lambda_{12} f_{\bar\beta_1} + \mu_{12} f_{\bar\beta_2} + \nu_{12} f_{\bar\beta_3} = c_{12}\,, \tag{9.84}$$
where
$$c_{12} = h_{12,3} f_3 + h_{12,4} f_4 + h_{12,5} f_5 \tag{9.85}$$
and
$$\begin{aligned}
h_{12,3} &= \lambda_{12} b_{31} + \mu_{12} b_{32} + \nu_{12} b_{33} \\
h_{12,4} &= \lambda_{12} b_{41} + \mu_{12} b_{42} + \nu_{12} b_{43} \\
h_{12,5} &= \lambda_{12} b_{51} + \mu_{12} b_{52} + \nu_{12} b_{53}\,.
\end{aligned}$$
Solving (9.83) for $f_3$, the variables $f_1$ and $f_2$ vanish. Varying $c_{12}$, which is possible through $f_3, f_4, f_5$, the plane (9.84) gets shifted parallel to itself. The choices
$$f_3 = f_3^* = \operatorname{sign}(h_{12,3})\, f_{s,3}\,, \quad f_4 = f_4^* = \operatorname{sign}(h_{12,4})\, f_{s,4}\,, \quad f_5 = f_5^* = \operatorname{sign}(h_{12,5})\, f_{s,5}$$
assign a maximum value to $c_{12}$,
$$c_{s,12} = h_{12,3} f_3^* + h_{12,4} f_4^* + h_{12,5} f_5^*$$
so that
$$-c_{s,12} \le c_{12} \le c_{s,12}\,.$$
Finally, we have found two of the faces of the solid, namely
$$\begin{aligned}
\lambda_{12} f_{\bar\beta_1} + \mu_{12} f_{\bar\beta_2} + \nu_{12} f_{\bar\beta_3} &= -c_{s,12} & (F_1) \\
\lambda_{12} f_{\bar\beta_1} + \mu_{12} f_{\bar\beta_2} + \nu_{12} f_{\bar\beta_3} &= c_{s,12}\,, & (F_2)
\end{aligned} \tag{9.86}$$
where, for convenience, the faces have been designated by $F_1$ and $F_2$. Within the vector
$$\boldsymbol f = \left( f_1\; f_2\; f_3\; f_4\; f_5 \right)^{\mathrm T},$$
the variables $f_1, f_2$ are free, while the variables $f_3, f_4, f_5$ are fixed. If $f_1, f_2$ vary, the vector $\boldsymbol f_{\bar\beta} = B^{\mathrm T} \boldsymbol f$ slides along the faces (9.86). Cyclically swapping the variables, we successively find all the faces of the solid, which, as we now see, is a convex polyhedron. As Table 9.4 indicates, the maximum number of faces is given by m(m − 1). Table 9.5 visualizes the swapping procedure.
If in (9.81), e.g., $b_{11}$ and $b_{12}$ vanish, we cannot eliminate $f_1$. Moreover, as $\nu_{12}$ becomes zero, (9.86) represents a pair of planes parallel to the $f_{\bar\beta_3}$-axis,
$$\lambda_{12} f_{\bar\beta_1} + \mu_{12} f_{\bar\beta_2} = \pm c_{s,12}\,, \qquad \nu_{12} = 0\,.$$
The cases $\lambda_{12} = 0$ and $\mu_{12} = 0$ may be treated correspondingly.
Table 9.4. Maximum number of faces of the polyhedron
× f1f2 f1f3 f1f4 f1f5
× × f2f3 f2f4 f2f5
× × × f3f4 f3f5
× × × × f4f5
× × × × ×
Table 9.5. Cyclic elimination, r = 3 and m = 5
          eliminated variables     remaining variables within the system
1st step f1 f2 ⇒ f3 f4 f5
2nd step f1 f3 ⇒ f4 f5 f2
3rd step f1 f4 ⇒ f5 f2 f3
4th step f1 f5 ⇒ f2 f3 f4
5th step f2 f3 ⇒ f4 f5 f1
6th step f2 f4 ⇒ f5 f1 f3
7th step f2 f5 ⇒ f1 f3 f4
8th step f3 f4 ⇒ f5 f1 f2
9th step f3 f5 ⇒ f1 f2 f4
10th step f4 f5 ⇒ f1 f2 f3
Vertices
The vertices on the faces (9.86) emerge as follows. Given $h_{12,i} \neq 0$; $i = 3, 4, 5$, we consider, as $f_3$, $f_4$ and $f_5$ are fixed, two sets of f-vectors referring to $F_1$ and $F_2$, namely
$$\begin{matrix}
(-f_{s,1} & -f_{s,2} & -f_3^* & -f_4^* & -f_5^*)^{\mathrm T} \\
(\;\;\,f_{s,1} & -f_{s,2} & -f_3^* & -f_4^* & -f_5^*)^{\mathrm T} \\
(-f_{s,1} & \;\;\,f_{s,2} & -f_3^* & -f_4^* & -f_5^*)^{\mathrm T} \\
(\;\;\,f_{s,1} & \;\;\,f_{s,2} & -f_3^* & -f_4^* & -f_5^*)^{\mathrm T}
\end{matrix} \qquad (F_1)$$
and
$$\begin{matrix}
(\;\;\,f_{s,1} & \;\;\,f_{s,2} & f_3^* & f_4^* & f_5^*)^{\mathrm T} \\
(-f_{s,1} & \;\;\,f_{s,2} & f_3^* & f_4^* & f_5^*)^{\mathrm T} \\
(\;\;\,f_{s,1} & -f_{s,2} & f_3^* & f_4^* & f_5^*)^{\mathrm T} \\
(-f_{s,1} & -f_{s,2} & f_3^* & f_4^* & f_5^*)^{\mathrm T}
\end{matrix} \qquad (F_2)$$
On each of the planes, these vectors mark exactly four vertices via $\boldsymbol f_{\bar\beta} = B^{\mathrm T} \boldsymbol f$. If in (9.85), e.g., $h_{12,5}$ vanished, we would encounter eight vertices instead of four on each of the faces (9.86). Should $h_{12,4}$ also be zero, there would be 16 vertices on each face. However, at least one of the coefficients $h_{12,i}$, $i = 3, 4, 5$ must be unequal to zero, as otherwise the plane (9.84) would pass through the origin.
Contour Lines
Let us assume that none of the coefficients of the variables $f_{\bar\beta_k}$; $k = 1, 2, 3$ in (9.84) vanishes. Solving (9.86) for $f_{\bar\beta_2}$ yields
$$\begin{aligned}
f_{\bar\beta_2} &= -\frac{\lambda_{12}}{\mu_{12}} f_{\bar\beta_1} - \frac{c_{s,12}}{\mu_{12}} - \frac{\nu_{12}}{\mu_{12}} f_{\bar\beta_3} & (F_1) \\
f_{\bar\beta_2} &= -\frac{\lambda_{12}}{\mu_{12}} f_{\bar\beta_1} + \frac{c_{s,12}}{\mu_{12}} - \frac{\nu_{12}}{\mu_{12}} f_{\bar\beta_3}\,. & (F_2)
\end{aligned} \tag{9.87}$$
Obviously, any fixed $f_{\bar\beta_3}$ according to
$$\begin{aligned}
f_{\bar\beta_3} &= b_{13} f_1 + b_{23} f_2 - b_{33} f_3^* - b_{43} f_4^* - b_{53} f_5^* = \text{const.} & (F_1) \\
f_{\bar\beta_3} &= b_{13} f_1 + b_{23} f_2 + b_{33} f_3^* + b_{43} f_4^* + b_{53} f_5^* = \text{const.} & (F_2)
\end{aligned} \tag{9.88}$$
yields an $f_{\bar\beta_1}, f_{\bar\beta_2}$ contour line along the pertaining face of the polyhedron. For $F_2$ we have
$$\begin{aligned}
f_{\bar\beta_{3,2}} &= |b_{13}|\, f_{s,1} + |b_{23}|\, f_{s,2} + b_{33} f_3^* + b_{43} f_4^* + b_{53} f_5^* \\
f_{\bar\beta_{3,1}} &= -|b_{13}|\, f_{s,1} - |b_{23}|\, f_{s,2} + b_{33} f_3^* + b_{43} f_4^* + b_{53} f_5^*
\end{aligned}$$
so that the interval
$$f_{\bar\beta_{3,1}} \le f_{\bar\beta_3} \le f_{\bar\beta_{3,2}} \qquad (F_2) \tag{9.89}$$
constitutes the range of admissible $f_{\bar\beta_3}$ values for contour lines alongside $F_2$. Analogously, for $F_1$ we find
$$-f_{\bar\beta_{3,2}} \le f_{\bar\beta_3} \le -f_{\bar\beta_{3,1}}\,. \qquad (F_1) \tag{9.90}$$
We may repeat this procedure with each of the m(m − 1)/2 available pairs of parallel planes of the kind (9.86). Then, choosing any contour level $f_{\bar\beta_3} = \text{const.}$ within the interval
$$-f_{s,\bar\beta_3}, \dots, f_{s,\bar\beta_3}$$
as defined in (9.71), we combine all straight lines of the kind (9.87) which possess an $f_{\bar\beta_3}$-value coping with this preset level. The search for the set of straight lines which actually constitutes a specific contour line $f_{\bar\beta_3} = \text{const.}$ is based on intervals of the kind (9.89) and (9.90). We then find the coordinates of the endpoints of the segments of the straight lines and string the segments
together, thus building a polygon, namely that which is defined by the intersection of a horizontal plane having height $f_{\bar\beta_3} = \text{const.}$ with the polyhedron. Stacking the contour lines on top of each other maps the polyhedron.
It might appear more advantageous to bring the polyhedron into being via its faces and not by its contour lines. On the other hand, we should bear in mind that the number of faces rises rapidly with m, thus provoking situations which might appear scarcely visually controllable. In the opinion of the author, it therefore seems simpler to refer to contour lines. As they appear stacked, the polyhedron's hull should be smoothed by means of a suitable image filter.⁵
Example
We shall find the surfaces and contour lines of the polyhedron
$$\boldsymbol f_{\bar\beta} = B^{\mathrm T} \boldsymbol f\,; \qquad B^{\mathrm T} = \begin{pmatrix} 1 & 2 & 3 & -1 & 2 \\ -1 & 3 & 2 & 2 & -1 \\ 2 & -1 & -3 & 3 & -3 \end{pmatrix}; \qquad -1 \le f_i \le 1\,; \quad i = 1, \dots, 5\,.$$
As the linear system
$$\begin{aligned}
f_{\bar\beta_1} &= f_1 + 2 f_2 + 3 f_3 - f_4 + 2 f_5 \\
f_{\bar\beta_2} &= -f_1 + 3 f_2 + 2 f_3 + 2 f_4 - f_5 \\
f_{\bar\beta_3} &= 2 f_1 - f_2 - 3 f_3 + 3 f_4 - 3 f_5
\end{aligned}$$
shows, the polyhedron is enclosed within the cuboid
$$-9 \le f_{\bar\beta_1} \le 9\,; \qquad -9 \le f_{\bar\beta_2} \le 9\,; \qquad -12 \le f_{\bar\beta_3} \le 12\,.$$
Cyclic elimination, Table 9.6, yields the coefficients λ, µ, ν and the constants $\pm c_s$ for (9.86) and (9.87). Figure 9.4 depicts the polyhedron.
Table 9.6. Cyclic elimination, m = 5, r = 3

     eliminated variables     λ     µ     ν    c_s   f_{β3,1}   f_{β3,2}
 1   f1, f2                  −5    −5     5    80       6         12
 2   f1, f3                  −1    −9     5    80       0         10
 3   f1, f4                  −7     5     1    76       2         12
 4   f1, f5                   5    −7     1    68      −6          4
 5   f2, f3                  −7    −3    −5    24      −6          2
 6   f2, f4                  11     5     7    38      −8          0
 7   f2, f5                 −10    −4    −8    38     −12         −4
 8   f3, f4                  12     6     8    42      −6          6
 9   f3, f5                  −9    −3    −7    34     −10          2
10   f4, f5                  −3    −3    −3    24     −12          0
⁵ I am grateful to my son Niels for designing and implementing the filter.
Fig. 9.4. Exemplary merging of a confidence ellipsoid and a security polyhedron into an EPC hull
Security Polytopes
Polyhedra in more than three dimensions, called polytopes, may be represented as a sequence of three-dimensional intersections, each defining an $f_{\bar\beta_1}, f_{\bar\beta_2}, f_{\bar\beta_3}$ polyhedron. Let, for example, r = 4. It would then be possible to vary one of the propagated systematic errors, say $f_{\bar\beta_4}$, in discrete steps and display the associated sequence of polyhedra. Thus, the couplings would be mapped by an ensemble of polyhedra.
9.5.3 EP Boundaries and EPC Hulls
In the following, confidence ellipsoids and security polytopes will be merged into overall uncertainty spaces. For r = 2 we have to combine confidence ellipses and security polygons, for r = 3 confidence ellipsoids and security polyhedra, and for r > 3 abstract r-dimensional ellipsoids and polytopes.
The boundary of a two-dimensional overall uncertainty region turns out to be convex. As we shall see, the boundary consists of the arc segments of a contiguous decomposition of the circumference of the Ellipse and, additionally, of the edges of the Polygon, i.e. arcs and edges alternately succeed one another. We therefore call these lines EP boundaries. In a sense, they resemble "convex slices of potatoes".
The boundary of a three-dimensional overall uncertainty space is also convex and comprises the segments of a contiguous decomposition of the Ellipsoid's "skin", the faces of the Polyhedron and, finally, certain parts of elliptical Cylinders providing smooth transitions between flat and curved sections of the surface. So-established boundaries will be called EPC hulls. A comparison with "convex potatoes" appears suggestive.
EP Boundaries
We refer to (9.62),
$$\begin{aligned}
\bar\beta_1 &= \beta_{0,1} + \left( \bar\beta_1 - \mu_{\bar\beta_1} \right) + f_{\bar\beta_1} \\
\bar\beta_2 &= \beta_{0,2} + \left( \bar\beta_2 - \mu_{\bar\beta_2} \right) + f_{\bar\beta_2}\,,
\end{aligned} \tag{9.91}$$
and resort to Fig. 9.5 to tackle the task of designing a two-dimensional uncertainty region within which to find the true values $\beta_{0,1}, \beta_{0,2}$ of the estimators $\bar\beta_1, \bar\beta_2$. With respect to the influence of random errors, we refer to the confidence ellipsoid (9.66), as the parameters $\mu_{\bar\beta_1}, \mu_{\bar\beta_2}$ happen to be unknown. Because of
$$s_{\bar\beta}^{-1} = \left( B^{\mathrm T} s B \right)^{-1} = \frac{1}{\left| s_{\bar\beta} \right|} \begin{pmatrix} s_{\bar\beta_2\bar\beta_2} & -s_{\bar\beta_1\bar\beta_2} \\ -s_{\bar\beta_2\bar\beta_1} & s_{\bar\beta_1\bar\beta_1} \end{pmatrix},$$
where $\left| s_{\bar\beta} \right| = s_{\bar\beta_1\bar\beta_1} s_{\bar\beta_2\bar\beta_2} - s_{\bar\beta_1\bar\beta_2}^2$, we arrive at
$$s_{\bar\beta_2\bar\beta_2} \left( \beta_1 - \bar\beta_1 \right)^2 - 2 s_{\bar\beta_1\bar\beta_2} \left( \beta_1 - \bar\beta_1 \right) \left( \beta_2 - \bar\beta_2 \right) + s_{\bar\beta_1\bar\beta_1} \left( \beta_2 - \bar\beta_2 \right)^2 = \left| s_{\bar\beta} \right| \frac{t_H^2(2, n)}{n}\,. \tag{9.92}$$
To simplify the notation, we put
$$x = \left( \beta_1 - \bar\beta_1 \right), \quad y = \left( \beta_2 - \bar\beta_2 \right),$$
$$a = s_{\bar\beta_2\bar\beta_2}\,, \quad b = -s_{\bar\beta_1\bar\beta_2}\,, \quad c = s_{\bar\beta_1\bar\beta_1}\,, \quad d = \left| s_{\bar\beta} \right| \frac{t_H^2(2, n)}{n}$$
so that
$$a x^2 + 2 b x y + c y^2 = d\,. \tag{9.93}$$
Transforming the equation of the tangent to the ellipse (9.93) at the point $x = x_E$, $y = y_E$ into Hesse's normal form, we may easily find that point of the security polygon which maximizes the two-dimensional overall uncertainty region "outwardly". The slope of the tangent at $(x_E, y_E)$,
$$y'_E = -\frac{a x_E + b y_E}{b x_E + c y_E}\,, \tag{9.94}$$
leads to the equation
$$y = y_E + y'_E (x - x_E) \tag{9.95}$$
of the tangent, the Hesse form of which is
$$-\frac{y'_E}{H} x + \frac{1}{H} y + \frac{y'_E x_E - y_E}{H} = 0\,, \tag{9.96}$$
Fig. 9.5. EP boundary localizing the true values β0,1, β0,2
where
$$H = -\operatorname{sign}\left( y'_E x_E - y_E \right) \sqrt{1 + {y'_E}^2}\,.$$
Let the $(\xi_P, \eta_P)$; $P = 1, 2, \dots$ denote the vertices of the polygon with respect to a rectangular, body-fixed $f_{\bar\beta_1}, f_{\bar\beta_2}$ coordinate system. For a given point $(x_E, y_E)$, we successively insert the coordinates of the vertices $(\xi_P, \eta_P)$; $P = 1, 2, \dots$ in (9.96) and select that vertex which produces the largest positive distance according to
$$-\frac{y'_E}{H} \left( x_E + \xi_P \right) + \frac{1}{H} \left( y_E + \eta_P \right) + \frac{y'_E x_E - y_E}{H} > 0\,; \qquad P = 1, 2, \dots\,. \tag{9.97}$$
Obviously, the coordinates
$$x_{EP} = x_E + \xi_P\,; \qquad y_{EP} = y_E + \eta_P \tag{9.98}$$
define a point of the boundary we are looking for. Now, all we have to do is move $(x_E, y_E)$ in discrete steps along the circumference of the ellipse and restart procedure (9.97), thus bringing into being, bit by bit, the boundary of the two-dimensional uncertainty region sought after. According to the error model, this line localizes the true values $\beta_{0,1}, \beta_{0,2}$ "quasi-certainly".
If the tangent runs parallel to one of the edges of the polygon, that edge as a whole becomes part of the boundary. Consider two successive points $(x_E, y_E)$, $(x_{E'}, y_{E'})$, the tangents of which are parallel to polygon edges. Then the arc segment lying in-between becomes, as shifted "outwardly", part of the boundary. Hence, the circumference of the ellipse gets contiguously decomposed, and each of its segments becomes part of the boundary of the two-dimensional uncertainty region. Finally, this line is made up of Elliptical arcs and Polygon edges. From this the term EP boundary has been derived.
Fig. 9.6. Exemplary merging of a confidence interval and a security polygon
Example
We shall merge the confidence ellipse
$$s_{\bar\beta} = \begin{pmatrix} 500 & 70 \\ 70 & 200 \end{pmatrix}; \qquad t_P(2, n)/\sqrt n = 1$$
and the security polygon
$$B^{\mathrm T} = \begin{pmatrix} 1 & 2 & 3 & -1 & 5 & -1 & 7 & 0 & -3 & 8 \\ -1 & 3 & 2 & 1 & -1 & 2 & 0 & 2 & 7 & -1 \end{pmatrix}; \qquad -1 \le f_i \le 1\,; \quad i = 1, \dots, 10$$
into an overall uncertainty region or EP boundary.

According to Sect. 3.6, the confidence ellipse is bounded by a rectangle with sides
$$\pm\sqrt{s_{\bar\beta_1\bar\beta_1}} = \pm\sqrt{500} = \pm 22.4\,, \qquad \pm\sqrt{s_{\bar\beta_2\bar\beta_2}} = \pm\sqrt{200} = \pm 14.1\,.$$
The rectangle which encloses the security polygon has dimensions
$$\pm 31 \quad \text{and} \quad \pm 20\,.$$
But then the EP boundary is contained in a rectangle with sides
$$\pm 53.4 \quad \text{and} \quad \pm 34.1\,.$$
Figure 9.6 illustrates the computer-aided putting together of the two figures.
EPC Hulls
Adding the component $\bar\beta_3$ according to (9.62),
$$\begin{aligned}
\bar\beta_1 &= \beta_{0,1} + \left( \bar\beta_1 - \mu_{\bar\beta_1} \right) + f_{\bar\beta_1} \\
\bar\beta_2 &= \beta_{0,2} + \left( \bar\beta_2 - \mu_{\bar\beta_2} \right) + f_{\bar\beta_2} \\
\bar\beta_3 &= \beta_{0,3} + \left( \bar\beta_3 - \mu_{\bar\beta_3} \right) + f_{\bar\beta_3}\,,
\end{aligned} \tag{9.99}$$
we may proceed as for two-dimensional uncertainty regions. Let us consider the simplest case: we place the center of the security polyhedron at some point $(x_E, y_E, z_E)$ of the ellipsoid's "skin" and shift the tangent plane at this point outwardly until it passes through that vertex of the polyhedron which maximizes the uncertainty space in question. The associated vertex $(\xi_P, \eta_P, \zeta_P)$ is found by trial and error. Finally, the coordinates
$$x_{EPZ} = x_E + \xi_P\,, \qquad y_{EPZ} = y_E + \eta_P\,, \qquad z_{EPZ} = z_E + \zeta_P \tag{9.100}$$
provide a point of the EPC hull to be constructed.

To begin with, we shall find a suitable representation of the tangent plane to the confidence ellipsoid. For convenience, we denote the inverse of the empirical variance–covariance matrix entering (9.66) merely symbolically, i.e. in
$$\frac{t_P^2(3, n)}{n} = \left( \boldsymbol\beta - \bar{\boldsymbol\beta} \right)^{\mathrm T} \left( B^{\mathrm T} s B \right)^{-1} \left( \boldsymbol\beta - \bar{\boldsymbol\beta} \right)$$
we set
$$\left( B^{\mathrm T} s B \right)^{-1} = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} \end{pmatrix}.$$
Defining
$$x = \left( \beta_1 - \bar\beta_1 \right), \quad y = \left( \beta_2 - \bar\beta_2 \right), \quad z = \left( \beta_3 - \bar\beta_3 \right), \quad d = t_P^2(3, n)/n\,,$$
the confidence ellipsoid turns into
$$\gamma_{11} x^2 + \gamma_{22} y^2 + \gamma_{33} z^2 + 2\gamma_{12} x y + 2\gamma_{23} y z + 2\gamma_{31} z x = d\,. \tag{9.101}$$
We now intersect the ellipsoid by a set of sufficiently close-spaced planes $z_E = \text{const.}$, where

$$ -\frac{t_P(r,n)}{\sqrt{n}}\sqrt{s_{\bar\beta_3\bar\beta_3}} \;\le\; z_E \;\le\; \frac{t_P(r,n)}{\sqrt{n}}\sqrt{s_{\bar\beta_3\bar\beta_3}}\,, \tag{9.102} $$
and shift the center of the polyhedron in discrete steps along any particular contour line $z_E = \text{const.}$ of the ellipsoid's "skin". During this slide, the spatial orientation of the polyhedron remains fixed. At each stop $(x_E, y_E, z_E)$ we construct the tangent plane
$$ n_x(x - x_E) + n_y(y - y_E) + n_z(z - z_E) = 0\,, \tag{9.103} $$

where $n_x, n_y, n_z$ denote the components of the normal

$$ \boldsymbol n = \begin{pmatrix} n_x \\ n_y \\ n_z \end{pmatrix} = \begin{pmatrix} \gamma_{11}x_E + \gamma_{12}y_E + \gamma_{31}z_E \\ \gamma_{12}x_E + \gamma_{22}y_E + \gamma_{23}z_E \\ \gamma_{31}x_E + \gamma_{23}y_E + \gamma_{33}z_E \end{pmatrix} \tag{9.104} $$
at $(x_E, y_E, z_E)$. By means of Hesse's normal form

$$ \frac{n_x}{H}x + \frac{n_y}{H}y + \frac{n_z}{H}z - \frac{n_x x_E + n_y y_E + n_z z_E}{H} = 0\,, \tag{9.105} $$

where

$$ H = \operatorname{sign}(n_x x_E + n_y y_E + n_z z_E)\sqrt{n_x^2 + n_y^2 + n_z^2}\,, $$
we are in a position to select that vertex of the polyhedron which maximizes the size of the three-dimensional uncertainty region "outwardly". This is easily done by successively inserting the vertices $(\xi_P, \eta_P, \zeta_P)$; $P = 1, 2, \dots$ of the polyhedron into (9.105), i.e. by testing the expressions

$$ \frac{n_x}{H}(x_E + \xi_P) + \frac{n_y}{H}(y_E + \eta_P) + \frac{n_z}{H}(z_E + \zeta_P) - \frac{n_x x_E + n_y y_E + n_z z_E}{H} > 0\,; \quad P = 1, 2, \dots \tag{9.106} $$

one after another. It goes without saying that the points $(x_E, y_E, z_E = \text{const.})$ of a given contour line of the ellipsoid will not produce a contour line of the three-dimensional uncertainty region in question. After all, this region is established through a dense system of curved, non-intersecting lines (9.100). To improve the graphical appearance of the stripy surface pattern we resort to a smoothing filter algorithm⁶ and a suitable "illumination" procedure.
As can be shown, the construction procedure implies a contiguous decomposition of the Ellipsoid's "skin". These segments, shifted "outwardly", become integral parts of the hull. The same applies to each of the faces of the Polyhedron. Finally, polyhedron edges, when shifted appropriately along the ellipsoid's skin, produce segments of elliptically curved Cylinders.

The itemized geometrical constituents, adjoining smoothly, have led to the term EPC hull.
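The vertex-selection step (9.103)–(9.106) can be sketched in a few lines: for a point $E$ on the ellipsoid's skin, the normal follows from the $\gamma$ matrix, and the polyhedron vertex maximizing the signed distance in Hesse's normal form is picked by trial and error. The matrices and vertices below are purely illustrative:

```python
import math

def normal(gamma, E):
    # n = gamma * E, cf. (9.104), for an ellipsoid centered at the origin
    return [sum(gamma[i][j] * E[j] for j in range(3)) for i in range(3)]

def select_vertex(gamma, E, vertices):
    n = normal(gamma, E)
    nE = sum(ni * Ei for ni, Ei in zip(n, E))
    # H carries the sign of n . E, cf. (9.105)
    H = math.copysign(math.sqrt(sum(ni * ni for ni in n)), nE)

    # test expression (9.106): it reduces to n . P / H for a vertex P
    def offset(P):
        return sum(ni * pi for ni, pi in zip(n, P)) / H

    return max(vertices, key=offset)

# hypothetical unit sphere (gamma = identity) and a small test polyhedron
gamma = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
E = [1.0, 0.0, 0.0]                       # point on the "skin"
vertices = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0.5, 0.5, 0.5)]
P = select_vertex(gamma, E, vertices)
hull_point = [e + p for e, p in zip(E, P)]   # cf. (9.100)
print(P, hull_point)  # (1, 0, 0) [2.0, 0.0, 0.0]
```

For the unit sphere, the outward-most vertex in the direction of the surface normal is selected, and adding it to $E$ gives one point of the hull.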
Example
We shall merge the confidence ellipsoid

$$ s_{\bar\beta} = \begin{pmatrix} 121 & -70 & -80 \\ -70 & 100 & 50 \\ -80 & 50 & 81 \end{pmatrix}; \quad t_P(3,n)/\sqrt{n} = 1 $$

and the security polyhedron

$$ B^{\mathrm T} = \begin{pmatrix} 1 & 2 & 3 & -1 & 2 \\ -1 & 3 & 2 & 2 & -1 \\ 2 & -1 & -3 & 3 & -3 \end{pmatrix}; \quad -1 \le f_i \le 1 $$

into an EPC hull.

The ellipsoid, polyhedron and EPC hull are enclosed within cuboids the side lengths of which are given by

$$ \pm 11, \pm 10, \pm 9\,; \qquad \pm 9, \pm 9, \pm 12 \qquad \text{and} \qquad \pm 20, \pm 19, \pm 21\,, $$

respectively.
⁶ I am grateful to my son Niels for designing and implementing the filter.
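The cuboid side lengths of this example can again be verified directly. A minimal sketch, assuming (as in the example) that $t_P(3,n)/\sqrt{n} = 1$ so that the ellipsoid's half-widths reduce to the square roots of the diagonal elements:

```python
import math

# Ellipsoid matrix and polyhedron matrix B^T from the example above.
s_beta = [[121, -70, -80], [-70, 100, 50], [-80, 50, 81]]
BT = [[1, 2, 3, -1, 2], [-1, 3, 2, 2, -1], [2, -1, -3, 3, -3]]

ellipsoid = [math.sqrt(s_beta[k][k]) for k in range(3)]
polyhedron = [sum(abs(e) for e in row) for row in BT]  # worst case |f_i| <= 1
epc = [a + b for a, b in zip(ellipsoid, polyhedron)]

print(ellipsoid)    # [11.0, 10.0, 9.0]
print(polyhedron)   # [9, 9, 12]
print(epc)          # [20.0, 19.0, 21.0]
```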
Part III
Special Linear and Linearized Systems
10 Systems with Two Parameters
10.1 Straight Lines
When fitting a straight line through m given data pairs, we distinguish three metrologically different situations. In the simplest case, we assume, by definition, the abscissas to be error-free and the ordinates to be individual, erroneous measurements
(x0,1, y1) , (x0,2, y2) , . . . , (x0,m, ym) . (10.1)
Next, again considering error-free abscissas, we assume the ordinates to be arithmetic means

$$ (x_{0,1}, \bar y_1)\,, (x_{0,2}, \bar y_2)\,, \dots, (x_{0,m}, \bar y_m)\,. \tag{10.2} $$

In general, however, both coordinates will exhibit the structure of arithmetic means

$$ (\bar x_1, \bar y_1)\,, (\bar x_2, \bar y_2)\,, \dots, (\bar x_m, \bar y_m)\,. \tag{10.3} $$
Let the latter be based on $n$ repeat measurements $x_{il}, y_{il}$, $l = 1, \dots, n$,

$$ \bar x_i = \frac{1}{n}\sum_{l=1}^{n} x_{il} = x_{0,i} + (\bar x_i - \mu_{x_i}) + f_{x_i}\,; \quad -f_{s,x_i} \le f_{x_i} \le f_{s,x_i} $$

$$ \bar y_i = \frac{1}{n}\sum_{l=1}^{n} y_{il} = y_{0,i} + (\bar y_i - \mu_{y_i}) + f_{y_i}\,; \quad -f_{s,y_i} \le f_{y_i} \le f_{s,y_i}\,. $$
The data pairs (10.2) imply error bars

$$ \bar y_i \pm u_{\bar y_i}\,; \quad i = 1, \dots, m\,, $$

and the pairs (10.3) error rectangles

$$ \bar x_i \pm u_{\bar x_i}\,, \quad \bar y_i \pm u_{\bar y_i}\,; \quad i = 1, \dots, m\,. $$
Obviously, the data given in (10.1) should also exhibit error bars. However, the experimenter is not in a position to quote them when finishing his measurements. But, as will be shown, error bars turn out to be assessable afterwards, namely when a straight line has been fitted through the measured data.
Given that the measurements are based on a linear physical model, the true values $(x_{0,i}, y_{0,i})$; $i = 1, \dots, m$ of the coordinate pairs should establish a true straight line
y0(x) = β0,1 + β0,2x , (10.4)
where $\beta_{0,1}$ and $\beta_{0,2}$ denote the true y-intercept and the true slope. A necessary condition for this to occur comes from the error bars (and error rectangles). If the latter prove reliable, they will localize the true values $(x_{0,i}, y_{0,i})$; $i = 1, \dots, m$. Consequently, it should be possible to draw a straight line either through all error bars (and error rectangles) or, at least, through as many as possible. Error bars (or rectangles) lying so far apart that they cannot be "hit" by the straight line do not fit the intersected ones. In the following we shall assume the linear model to be valid and, in the first step, assess the parameters
$$ \bar\beta_k \pm u_{\bar\beta_k}\,; \quad k = 1, 2 \tag{10.5} $$

of the fitted straight line

$$ y(x) = \bar\beta_1 + \bar\beta_2 x\,. \tag{10.6} $$

In the second step we shall find a so-called uncertainty band

$$ y(x) \pm u_{y(x)} \tag{10.7} $$
which we demand to enclose the true straight line (10.4).

Let us formalize a least squares adjustment based on the data pairs (10.3).
The erroneous data give rise to the linear system

$$ \bar y_i \approx \beta_1 + \beta_2 \bar x_i\,; \quad i = 1, \dots, m\,. \tag{10.8} $$

Introducing the vectors

$$ \bar{\boldsymbol y} = (\bar y_1\ \bar y_2\ \cdots\ \bar y_m)^{\mathrm T}\,, \quad \boldsymbol\beta = (\beta_1\ \beta_2)^{\mathrm T} \tag{10.9} $$

and the design matrix

$$ A = \begin{pmatrix} 1 & \bar x_1 \\ 1 & \bar x_2 \\ \cdots & \cdots \\ 1 & \bar x_m \end{pmatrix}, \tag{10.10} $$

(10.8) becomes

$$ A\boldsymbol\beta \approx \bar{\boldsymbol y}\,. \tag{10.11} $$
Obviously, the design matrix $A$ includes erroneous quantities. Though this aspect remains irrelevant with respect to the orthogonal projection of the vector $\bar{\boldsymbol y}$ onto the column space of the matrix $A$, it will complicate the estimation of measurement uncertainties. The orthogonal projection through $P = A(A^{\mathrm T}A)^{-1}A^{\mathrm T}$ as defined in (7.13) turns (10.11) into
$$ A\bar{\boldsymbol\beta} = P\,\bar{\boldsymbol y} \tag{10.12} $$

so that

$$ \bar{\boldsymbol\beta} = B^{\mathrm T}\bar{\boldsymbol y}\,; \quad B = A\left(A^{\mathrm T}A\right)^{-1}. \tag{10.13} $$

Denoting

$$ B = (b_{ik})\,; \quad i = 1, \dots, m\,,\ k = 1, 2\,, $$

the components $\bar\beta_1, \bar\beta_2$ of the solution vector $\bar{\boldsymbol\beta}$ take the form

$$ \bar\beta_1 = \sum_{i=1}^{m} b_{i1}\bar y_i\,, \quad \bar\beta_2 = \sum_{i=1}^{m} b_{i2}\bar y_i\,. \tag{10.14} $$
Finally, the least squares adjustment has confronted the true straight line (10.4) with the approximation formally introduced in (10.6). Let us present the quantities $b_{ik}$ in detail. The inversion of the matrix
$$ A^{\mathrm T}A = \begin{pmatrix} m & \sum_{j=1}^{m}\bar x_j \\ \sum_{j=1}^{m}\bar x_j & \sum_{j=1}^{m}\bar x_j^2 \end{pmatrix} \tag{10.15} $$

yields

$$ \left(A^{\mathrm T}A\right)^{-1} = \frac{1}{D}\begin{pmatrix} \sum_{j=1}^{m}\bar x_j^2 & -\sum_{j=1}^{m}\bar x_j \\ -\sum_{j=1}^{m}\bar x_j & m \end{pmatrix}; \quad D = m\sum_{j=1}^{m}\bar x_j^2 - \Bigl[\sum_{j=1}^{m}\bar x_j\Bigr]^2 \tag{10.16} $$
so that

$$ B = \frac{1}{D}\begin{bmatrix} \sum_{j=1}^{m}\bar x_j^2 - \bar x_1\sum_{j=1}^{m}\bar x_j & \;-\sum_{j=1}^{m}\bar x_j + m\bar x_1 \\ \sum_{j=1}^{m}\bar x_j^2 - \bar x_2\sum_{j=1}^{m}\bar x_j & \;-\sum_{j=1}^{m}\bar x_j + m\bar x_2 \\ \cdots & \cdots \\ \sum_{j=1}^{m}\bar x_j^2 - \bar x_m\sum_{j=1}^{m}\bar x_j & \;-\sum_{j=1}^{m}\bar x_j + m\bar x_m \end{bmatrix}, \tag{10.17} $$

and, letting $i = 1, \dots, m$,

$$ b_{i1} = \frac{1}{D}\Bigl[\sum_{j=1}^{m}\bar x_j^2 - \bar x_i\sum_{j=1}^{m}\bar x_j\Bigr]\,; \quad b_{i2} = \frac{1}{D}\Bigl[-\sum_{j=1}^{m}\bar x_j + m\bar x_i\Bigr]\,. \tag{10.18} $$
In the special case of error-free abscissas, we merely need to substitute the true values $x_{0,j}$ for the means $\bar x_j$,

$$ b_{i1} = \frac{1}{D}\Bigl[\sum_{j=1}^{m}x_{0,j}^2 - x_{0,i}\sum_{j=1}^{m}x_{0,j}\Bigr]\,; \quad b_{i2} = \frac{1}{D}\Bigl[-\sum_{j=1}^{m}x_{0,j} + m x_{0,i}\Bigr] \tag{10.19} $$

where

$$ D = m\sum_{j=1}^{m}x_{0,j}^2 - \Bigl[\sum_{j=1}^{m}x_{0,j}\Bigr]^2. $$
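The coefficients (10.18)/(10.19) are easy to compute directly. The sketch below, with illustrative numbers, also checks the column sums that the text uses later in (10.32):

```python
# Coefficients b_i1, b_i2 of a straight-line fit, computed from the abscissas.
def line_coefficients(x):
    m = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    D = m * sxx - sx * sx
    b1 = [(sxx - xi * sx) / D for xi in x]   # b_i1, cf. (10.18)
    b2 = [(-sx + m * xi) / D for xi in x]    # b_i2, cf. (10.18)
    return b1, b2

x = [0.0, 1.0, 2.0, 3.0, 4.0]
b1, b2 = line_coefficients(x)

# Column sums as used later in (10.32): sum b_i1 = 1, sum b_i2 = 0.
# On exact data y = 1 + 2x the estimators reproduce intercept and slope.
y = [1 + 2 * xi for xi in x]
beta1 = sum(b * yi for b, yi in zip(b1, y))
beta2 = sum(b * yi for b, yi in zip(b2, y))
print(round(sum(b1), 12), round(sum(b2), 12))  # 1.0 0.0
print(round(beta1, 12), round(beta2, 12))      # 1.0 2.0
```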
10.1.1 Error-free Abscissas, Erroneous Ordinates
Though in most cases both the abscissas and the ordinates will be affected by measurement errors, the standard literature confines itself to the case of error-free abscissas. To begin with, we will follow this tradition; the subsequent section will, however, deal with the more general situation.
No Repeat Measurements
In the simplest case there is just one measured ordinate for each abscissa. Let us further assume that to each of the ordinates we may attribute one and the same systematic error; otherwise, under the present conditions, we would not be in a position to estimate uncertainties, see Sect. 8.2.
Summarizing, our premises are as follows: We assume

– the true values $(x_{0,i}, y_{0,i})$; $i = 1, \dots, m$ to satisfy the linear model (10.4),
– the abscissas to be error-free,
– the random errors, superposed on the true values of the ordinates, to be independent and to stem from one and the same parent distribution,
– each ordinate to carry one and the same systematic error $f_{y_i} = f_y$; $i = 1, \dots, m$.
The input data

$$ \boldsymbol y = (y_1\ y_2\ \dots\ y_m)^{\mathrm T} $$
$$ y_i = y_{0,i} + \varepsilon_i + f_y = y_{0,i} + (y_i - \mu_{y_i}) + f_y\,; \quad -f_{s,y} \le f_y \le f_{s,y} \tag{10.20} $$

and the design matrix

$$ A = \begin{pmatrix} 1 & x_{0,1} \\ 1 & x_{0,2} \\ \cdots & \cdots \\ 1 & x_{0,m} \end{pmatrix} \tag{10.21} $$
define the inconsistent system

$$ A\boldsymbol\beta \approx \boldsymbol y \tag{10.22} $$

to be submitted to least squares. The coefficients $b_{i1}$ and $b_{i2}$ of the estimators

$$ \bar\beta_1 = \sum_{i=1}^{m} b_{i1}y_i \quad \text{and} \quad \bar\beta_2 = \sum_{i=1}^{m} b_{i2}y_i \tag{10.23} $$

have been quoted in (10.19). For convenience, let us collect the quantities defining the true linear system

$$ A\boldsymbol\beta_0 = \boldsymbol y_0\,, \quad \boldsymbol\beta_0 = (\beta_{0,1}\ \beta_{0,2})^{\mathrm T}\,, \quad \boldsymbol y_0 = (y_{0,1}\ y_{0,2}\ \dots\ y_{0,m})^{\mathrm T}. \tag{10.24} $$
Here, $\boldsymbol\beta_0$ designates the true solution vector and $\boldsymbol y_0$ the vector of the true values of the observations.
Next we prove that, given the above quoted conditions apply, the term $\boldsymbol f^{\mathrm T}\boldsymbol f - \boldsymbol f^{\mathrm T}P\boldsymbol f$ appearing in (8.17) vanishes. To this end we refer to an auxiliary vector $\boldsymbol\gamma_0$. As on each of the ordinates one and the same systematic error $f_y$ is superimposed, on account of

$$ A\boldsymbol\gamma_0 = \boldsymbol y_0 + \boldsymbol f\,; \quad \boldsymbol f = f_y (1\ 1\ \dots\ 1)^{\mathrm T} \tag{10.25} $$

$\boldsymbol\gamma_0$ has the components

$$ \boldsymbol\gamma_0 = \begin{pmatrix} \gamma_{0,1} \\ \gamma_{0,2} \end{pmatrix} = \begin{pmatrix} \beta_{0,1} + f_y \\ \beta_{0,2} \end{pmatrix} \tag{10.26} $$

as, evidently, the slope $\beta_{0,2}$ is not subject to change. Combining (10.24) and (10.25) yields

$$ A\boldsymbol\gamma_0 = A\boldsymbol\beta_0 + A\begin{pmatrix} f_y \\ 0 \end{pmatrix} = A\boldsymbol\beta_0 + \boldsymbol f\,, $$

i.e.

$$ A\begin{pmatrix} f_y \\ 0 \end{pmatrix} = \boldsymbol f\,, \quad P\boldsymbol f = \boldsymbol f \;\Rightarrow\; \boldsymbol f^{\mathrm T}\boldsymbol f - \boldsymbol f^{\mathrm T}P\boldsymbol f = 0 \tag{10.27} $$
with $P = A(A^{\mathrm T}A)^{-1}A^{\mathrm T}$. Consequently, we may refer to (8.11) and estimate the unknown theoretical variance

$$ \sigma^2 = E\bigl\{(Y_i - \mu_{y_i})^2\bigr\}\,; \quad \mu_{y_i} = E\{Y_i\}\,; \quad i = 1, \dots, m $$

of the input data¹ through

$$ s^2 = \frac{Q_{\text{unweighted}}}{m - 2} \tag{10.28} $$

so that, as shown in Sect. 8.2,

$$ \frac{(m-2)\,s^2}{\sigma^2} = \frac{Q_{\text{unweighted}}}{\sigma^2} = \chi^2(m-2)\,. \tag{10.29} $$
Finally, we are in a position to assign uncertainties to the individual measurements $y_i$, $i = 1, \dots, m$. Referring to (3.49), Student's

$$ T = \frac{(Y_i - \mu_{y_i})/\sigma}{S/\sigma} = \frac{(Y_i - \mu_{y_i})/\sigma}{\sqrt{\bigl[\chi^2(m-2)\bigr]/(m-2)}}\,; \quad i = 1, \dots, m $$

enables us to define the confidence intervals

$$ y_i - t_P(m-2)\,s \;\le\; \mu_{y_i} \;\le\; y_i + t_P(m-2)\,s\,; \quad i = 1, \dots, m $$

for the expectations $\mu_{y_i} = E\{Y_i\}$. We then associate overall uncertainties of the kind

$$ y_i \pm u_{y_i}\,; \quad u_{y_i} = t_P(m-2)\,s + f_{s,y}\,; \quad i = 1, \dots, m \tag{10.30} $$

with the individual measurements $y_i$; $i = 1, \dots, m$.
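The procedure (10.28)–(10.30) can be sketched numerically. The data, the quantile $t_P(m-2)$, and the bound $f_{s,y}$ below are illustrative; in practice $t_P$ would be taken from a Student's t table for the chosen confidence level:

```python
import math

# Error-free abscissas, one erroneous ordinate per abscissa.
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y  = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
m = len(x0)

sx, sxx = sum(x0), sum(v * v for v in x0)
D = m * sxx - sx * sx
b1 = [(sxx - xi * sx) / D for xi in x0]       # b_i1, cf. (10.19)
b2 = [(-sx + m * xi) / D for xi in x0]        # b_i2, cf. (10.19)
beta1 = sum(b * yi for b, yi in zip(b1, y))
beta2 = sum(b * yi for b, yi in zip(b2, y))

# Minimized sum of squared residuals Q and the estimate s^2 = Q/(m-2), (10.28)
Q = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x0, y))
s = math.sqrt(Q / (m - 2))

t_P, f_s_y = 2.776, 0.05                      # assumed quantile and bound
u_y = t_P * s + f_s_y                         # overall uncertainty (10.30)
print(round(beta1, 3), round(beta2, 3), u_y > 0)  # 1.057 1.977 True
```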
Uncertainties of the Components of the Solution Vector
Though, at present, there are no repeat measurements, we shall still denote estimators by a bar on top. Inserting (10.20) into (10.14) yields
$$ \bar\beta_k = \sum_{i=1}^{m} b_{ik}\bigl[y_{0,i} + (y_i - \mu_{y_i}) + f_y\bigr] = \beta_{0,k} + \sum_{i=1}^{m} b_{ik}(y_i - \mu_{y_i}) + f_y\sum_{i=1}^{m} b_{ik}\,; \quad k = 1, 2\,. \tag{10.31} $$
The respective sums of the coefficients $b_{i1}$ and $b_{i2}$, as defined in (10.19), are given by

$$ \sum_{i=1}^{m} b_{i1} = 1 \quad \text{and} \quad \sum_{i=1}^{m} b_{i2} = 0 \tag{10.32} $$

so that the third term on the right-hand side of (10.31) gives the propagated systematic errors

$$ f_{\bar\beta_1} = f_y \quad \text{and} \quad f_{\bar\beta_2} = 0\,. \tag{10.33} $$

¹If, in contrast to our premise, the systematic errors vary along the sequence of measured ordinates, or should the fitted line deviate too much at its left and right "ends" from the empirical data pairs, implying that the latter do not fulfil the linear model (10.4), the estimator (10.28) will, of course, break down.
The empirical variance–covariance matrix of the solution vector

$$ s_{\bar\beta} = \begin{pmatrix} s_{\bar\beta_1\bar\beta_1} & s_{\bar\beta_1\bar\beta_2} \\ s_{\bar\beta_2\bar\beta_1} & s_{\bar\beta_2\bar\beta_2} \end{pmatrix} \tag{10.34} $$
can only be stated in analogy to the way the conventional error calculus proceeds, i.e. we first define a theoretical variance–covariance matrix and subsequently substitute empirical quantities for the inaccessible theoretical quantities. Denoting the expectations of the components $\bar\beta_1, \bar\beta_2$ of the solution vector as

$$ \mu_{\bar\beta_1} = \sum_{i=1}^{m} b_{i1}\mu_{y_i}\,, \quad \mu_{\bar\beta_2} = \sum_{i=1}^{m} b_{i2}\mu_{y_i}\,; \quad \mu_{y_i} = E\{Y_i\}\,, \tag{10.35} $$
the theoretical variances and the theoretical covariance turn out to be

$$ \sigma_{\bar\beta_1}^2 \equiv \sigma_{\bar\beta_1\bar\beta_1} = E\bigl\{(\bar\beta_1 - \mu_{\bar\beta_1})^2\bigr\} = E\Bigl\{\Bigl[\sum_{i=1}^{m} b_{i1}(Y_i - \mu_{y_i})\Bigr]^2\Bigr\} = \sigma^2\sum_{i=1}^{m} b_{i1}^2 $$

$$ \sigma_{\bar\beta_2}^2 \equiv \sigma_{\bar\beta_2\bar\beta_2} = \sigma^2\sum_{i=1}^{m} b_{i2}^2 \tag{10.36} $$

and

$$ \sigma_{\bar\beta_1\bar\beta_2} = \sigma^2\sum_{i=1}^{m} b_{i1}b_{i2}\,, \tag{10.37} $$

respectively. Obviously, their empirical counterparts are

$$ s_{\bar\beta_1\bar\beta_1} = s_{\bar\beta_1}^2 = s^2\sum_{i=1}^{m} b_{i1}^2\,, \quad s_{\bar\beta_2\bar\beta_2} = s_{\bar\beta_2}^2 = s^2\sum_{i=1}^{m} b_{i2}^2\,, \quad s_{\bar\beta_1\bar\beta_2} = s^2\sum_{i=1}^{m} b_{i1}b_{i2}\,. \tag{10.38} $$
According to (10.29) and (3.49), Student's t pertaining to $\bar\beta_1$ is given by

$$ T(m-2) = \frac{\bar\beta_1 - \mu_{\bar\beta_1}}{S\sqrt{\sum_{i=1}^{m} b_{i1}^2}} = \left(\frac{\bar\beta_1 - \mu_{\bar\beta_1}}{\sigma\sqrt{\sum_{i=1}^{m} b_{i1}^2}}\right)\left(\frac{S\sqrt{\sum_{i=1}^{m} b_{i1}^2}}{\sigma\sqrt{\sum_{i=1}^{m} b_{i1}^2}}\right)^{-1} = \frac{\bigl(\bar\beta_1 - \mu_{\bar\beta_1}\bigr)\Big/\Bigl(\sigma\sqrt{\sum_{i=1}^{m} b_{i1}^2}\Bigr)}{\sqrt{\dfrac{\chi^2(m-2)}{m-2}}}\,, \tag{10.39} $$

where, as usual, $S$ denotes the random variable formally associated with the empirical standard deviation $s$. Finally, we assign the uncertainties
$$ u_{\bar\beta_1} = t_P(m-2)\,s\sqrt{\sum_{i=1}^{m} b_{i1}^2} + f_{s,y}\,; \quad u_{\bar\beta_2} = t_P(m-2)\,s\sqrt{\sum_{i=1}^{m} b_{i2}^2} \tag{10.40} $$

to the components $\bar\beta_1, \bar\beta_2$ of the solution vector $\bar{\boldsymbol\beta}$.
Uncertainty Band
Inserting (10.31) into (10.6) yields

$$ y(x) = \bar\beta_1 + \bar\beta_2 x = (\beta_{0,1} + \beta_{0,2}x) + \sum_{i=1}^{m}(b_{i1} + b_{i2}x)(y_i - \mu_{y_i}) + f_y\,. \tag{10.41} $$

For any fixed $x$, we subtract the expectation

$$ \mu_{y(x)} = E\{Y(x)\} = (\beta_{0,1} + \beta_{0,2}x) + f_y \tag{10.42} $$

from (10.41) so that

$$ \sigma_{y(x)}^2 = E\bigl\{\bigl(Y(x) - \mu_{y(x)}\bigr)^2\bigr\} = \sigma^2\sum_{i=1}^{m}(b_{i1} + b_{i2}x)^2\,. \tag{10.43} $$
Obviously, an estimator for $\sigma_{y(x)}^2$ is

$$ s_{y(x)}^2 = s^2\sum_{i=1}^{m}(b_{i1} + b_{i2}x)^2\,. \tag{10.44} $$

Referring to

$$ T(m-2) = \frac{Y(x) - \mu_{y(x)}}{S\sqrt{\sum_{i=1}^{m}(b_{i1} + b_{i2}x)^2}}\,, \tag{10.45} $$

the uncertainty band in question turns out to be

$$ y(x) \pm u_{y(x)}\,; \quad u_{y(x)} = t_P(m-2)\,s\sqrt{\sum_{i=1}^{m}(b_{i1} + b_{i2}x)^2} + f_{s,y}\,. \tag{10.46} $$
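A sketch of the band (10.44)–(10.46), reusing the coefficients of (10.19); the data, quantile and systematic bound are again illustrative:

```python
import math

def band(x0, y, t_P, f_s_y, x):
    """Fit value and uncertainty u_y(x) per (10.46) at abscissa x."""
    m = len(x0)
    sx, sxx = sum(x0), sum(v * v for v in x0)
    D = m * sxx - sx * sx
    b1 = [(sxx - xi * sx) / D for xi in x0]
    b2 = [(-sx + m * xi) / D for xi in x0]
    beta1 = sum(b * yi for b, yi in zip(b1, y))
    beta2 = sum(b * yi for b, yi in zip(b2, y))
    Q = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x0, y))
    s = math.sqrt(Q / (m - 2))
    s_yx = s * math.sqrt(sum((a + b * x) ** 2 for a, b in zip(b1, b2)))
    return beta1 + beta2 * x, t_P * s_yx + f_s_y

x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y  = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
center, u = band(x0, y, t_P=2.776, f_s_y=0.05, x=2.5)
# At the mean abscissa the fitted line passes through (x_mean, y_mean).
print(round(center, 3), u > 0)  # 6.0 True
```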
Repeat Measurements
Repeat measurements are optional if one and the same systematic error is superimposed on each of the ordinates; however, they are indispensable if the ordinates are biased by different systematic errors. The components $\bar\beta_1$ and $\bar\beta_2$ of the solution vector $\bar{\boldsymbol\beta}$ are given in (10.14), the associated coefficients $b_{i1}$ and $b_{i2}$ in (10.19).
Uncertainties of the Components of the Solution Vector
We first quote the empirical variance–covariance matrix of the solution vector and the systematic errors of its components. With a view to Fig. 10.1 and the error equations

$$ \bar y_i = y_{0,i} + (\bar y_i - \mu_{y_i}) + f_{y_i}\,; \quad -f_{s,y_i} \le f_{y_i} \le f_{s,y_i}\,; \quad i = 1, \dots, m\,, $$

we "expand" the estimators $\bar\beta_k$, $k = 1, 2$ throughout a neighborhood of the point $(y_{0,1}, \dots, y_{0,m})$,

$$ \bar\beta_k(\bar y_1, \dots, \bar y_m) = \bar\beta_k(y_{0,1}, \dots, y_{0,m}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_k}{\partial y_{0,i}}(\bar y_i - y_{0,i})\,; \quad k = 1, 2\,. $$

To abbreviate we set

$$ \bar\beta_k(y_{0,1}, \dots, y_{0,m}) = \beta_{0,k} \quad \text{and} \quad \frac{\partial\bar\beta_k}{\partial y_{0,i}} = b_{ik}\,. $$
Fig. 10.1. Mean value $\bar y_i$, true value $y_{0,i}$, systematic error $f_{y_i}$ and expectation $\mu_{y_i} = E\{Y_i\}$ of the $i$-th measured ordinate
On the other hand, we could have directly inserted the error equations for $\bar y_i$ into (10.14) to find

$$ \bar\beta_k(\bar y_1, \dots, \bar y_m) = \beta_{0,k} + \sum_{i=1}^{m} b_{ik}(\bar y_i - \mu_{y_i}) + \sum_{i=1}^{m} b_{ik}f_{y_i}\,; \quad k = 1, 2\,. \tag{10.47} $$
To assess the second term on the right-hand side, which expresses random errors, we refer to the individual measurements $y_{il}$ via

$$ \bar y_i = \frac{1}{n}\sum_{l=1}^{n} y_{il}\,; \quad i = 1, \dots, m\,. $$
Hence, (10.14) turns into

$$ \bar\beta_k = \sum_{i=1}^{m} b_{ik}\Bigl[\frac{1}{n}\sum_{l=1}^{n} y_{il}\Bigr] = \frac{1}{n}\sum_{l=1}^{n}\Bigl[\sum_{i=1}^{m} b_{ik}y_{il}\Bigr] = \frac{1}{n}\sum_{l=1}^{n}\beta_{kl}\,; \quad k = 1, 2\,, \tag{10.48} $$

where

$$ \beta_{kl} = \sum_{i=1}^{m} b_{ik}y_{il}\,. \tag{10.49} $$
From (10.14) and (10.49) we obtain

$$ \beta_{kl} - \bar\beta_k = \sum_{i=1}^{m} b_{ik}(y_{il} - \bar y_i)\,. \tag{10.50} $$
Thus, the elements of the empirical variance–covariance matrix of the solution vector take the form

$$ s_{\bar\beta_k\bar\beta_{k'}} = \frac{1}{n-1}\sum_{l=1}^{n}\bigl(\beta_{kl} - \bar\beta_k\bigr)\bigl(\beta_{k'l} - \bar\beta_{k'}\bigr) = \frac{1}{n-1}\sum_{l=1}^{n}\Bigl[\sum_{i=1}^{m} b_{ik}(y_{il} - \bar y_i)\Bigr]\Bigl[\sum_{j=1}^{m} b_{jk'}(y_{jl} - \bar y_j)\Bigr] = \sum_{i,j=1}^{m} b_{ik}b_{jk'}s_{ij}\,; \quad k, k' = 1, 2\,, \tag{10.51} $$

in which we recognize the elements

$$ s_{ij} = \frac{1}{n-1}\sum_{l=1}^{n}(y_{il} - \bar y_i)(y_{jl} - \bar y_j)\,; \quad i, j = 1, \dots, m \tag{10.52} $$

of the empirical variance–covariance matrix $s$ of the input data. Finally, we denote

$$ s_{\bar\beta} = \begin{pmatrix} s_{\bar\beta_1\bar\beta_1} & s_{\bar\beta_1\bar\beta_2} \\ s_{\bar\beta_2\bar\beta_1} & s_{\bar\beta_2\bar\beta_2} \end{pmatrix} = B^{\mathrm T}s\,B\,; \quad s_{\bar\beta_k\bar\beta_k} \equiv s_{\bar\beta_k}^2\,. \tag{10.53} $$
According to (10.48) and (10.49), the $\bar\beta_k$ are the arithmetic means of $n$ independent, normally distributed quantities $\beta_{kl}$. Consequently, we may define Student's confidence intervals

$$ \bar\beta_k - \frac{t_P(n-1)}{\sqrt{n}}s_{\bar\beta_k} \;\le\; \mu_{\bar\beta_k} \;\le\; \bar\beta_k + \frac{t_P(n-1)}{\sqrt{n}}s_{\bar\beta_k}\,; \quad k = 1, 2 \tag{10.54} $$

for the expectations $E\{\bar\beta_k\} = \mu_{\bar\beta_k}$; $k = 1, 2$. The propagated systematic errors follow from (10.47). Their worst case estimations turn out to be

$$ f_{s,\bar\beta_k} = \sum_{i=1}^{m}|b_{ik}|\,f_{s,y_i}\,; \quad k = 1, 2\,. \tag{10.55} $$

Finally, the overall uncertainties $u_{\bar\beta_k}$ of the estimators $\bar\beta_k \pm u_{\bar\beta_k}$, $k = 1, 2$ are given by

$$ u_{\bar\beta_k} = \frac{t_P(n-1)}{\sqrt{n}}s_{\bar\beta_k} + f_{s,\bar\beta_k} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{m} b_{ik}b_{jk}s_{ij}} + \sum_{i=1}^{m}|b_{ik}|\,f_{s,y_i}\,; \quad k = 1, 2\,. \tag{10.56} $$
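The chain (10.52)–(10.56) translates directly into code. A minimal sketch with illustrative repeat measurements, an assumed $t$ quantile, and assumed bounds $f_{s,y_i}$:

```python
import math

x0 = [0.0, 1.0, 2.0]
Y = [[0.9, 1.1, 1.0],      # n = 3 repeat measurements of ordinate 1
     [3.1, 2.8, 3.0],      # ordinate 2
     [5.0, 5.2, 4.9]]      # ordinate 3
m, n = len(x0), len(Y[0])
ybar = [sum(row) / n for row in Y]

sx, sxx = sum(x0), sum(v * v for v in x0)
D = m * sxx - sx * sx
b = [[(sxx - xi * sx) / D, (-sx + m * xi) / D] for xi in x0]  # b_i1, b_i2

# Empirical variances/covariances s_ij of the input data, (10.52)
s = [[sum((Y[i][l] - ybar[i]) * (Y[j][l] - ybar[j]) for l in range(n)) / (n - 1)
      for j in range(m)] for i in range(m)]

# s_beta = B^T s B, cf. (10.51)/(10.53)
s_beta = [[sum(b[i][k] * b[j][kp] * s[i][j]
               for i in range(m) for j in range(m))
           for kp in range(2)] for k in range(2)]

t_P, f_s_y = 4.303, [0.05, 0.05, 0.05]   # assumed quantile and error bounds
u = [t_P / math.sqrt(n) * math.sqrt(s_beta[k][k])
     + sum(abs(b[i][k]) * f_s_y[i] for i in range(m)) for k in range(2)]
print(u[0] > 0 and u[1] > 0)  # True
```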
Equal Systematic Errors
Let $f_{y_i} = f_y$, $f_{s,y_i} = f_{s,y}$; $i = 1, \dots, m$. From (10.47) we then deduce

$$ f_{\bar\beta_k} = \sum_{i=1}^{m} b_{ik}f_{y_i} = f_y\sum_{i=1}^{m} b_{ik}\,. \tag{10.57} $$

Because of (10.32) we have

$$ f_{\bar\beta_1} = f_y\,; \quad f_{\bar\beta_2} = 0 \tag{10.58} $$

so that

$$ u_{\bar\beta_1} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{m} b_{i1}b_{j1}s_{ij}} + f_{s,y} $$

$$ u_{\bar\beta_2} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{m} b_{i2}b_{j2}s_{ij}}\,. \tag{10.59} $$
Uncertainty Band
As disclosed through (10.14) and (10.56), it clearly is not only a single straight line which is compatible with the input data and the error model. Rather, in addition to (10.6), arbitrarily many least squares lines would apply just as well. Speaking vividly, what we expect is something like the "bow net" shaped uncertainty band similar to (10.46). Inserting (10.47),

$$ \bar\beta_k(\bar y_1, \dots, \bar y_m) = \beta_{0,k} + \sum_{i=1}^{m} b_{ik}(\bar y_i - \mu_{y_i}) + \sum_{i=1}^{m} b_{ik}f_{y_i}\,; \quad k = 1, 2\,, $$

into (10.6),

$$ y(x) = \bar\beta_1 + \bar\beta_2 x\,, $$

yields

$$ y(x) = \beta_{0,1} + \beta_{0,2}x + \sum_{i=1}^{m}(b_{i1} + b_{i2}x)(\bar y_i - \mu_{y_i}) + \sum_{i=1}^{m}(b_{i1} + b_{i2}x)f_{y_i}\,. \tag{10.60} $$
We first estimate the contribution due to random errors. To this end, we insert (10.48),

$$ \bar\beta_k = \frac{1}{n}\sum_{l=1}^{n}\beta_{kl}\,; \quad k = 1, 2\,, $$

into (10.6):

$$ y(x) = \frac{1}{n}\sum_{l=1}^{n}\beta_{1l} + \frac{1}{n}\sum_{l=1}^{n}\beta_{2l}x = \frac{1}{n}\sum_{l=1}^{n}\bigl(\beta_{1l} + \beta_{2l}x\bigr) = \frac{1}{n}\sum_{l=1}^{n} y_l(x)\,. \tag{10.61} $$
Obviously,

$$ y_l(x) = \beta_{1l} + \beta_{2l}x\,; \quad l = 1, \dots, n \tag{10.62} $$

designates a fitted line which we would get if at every $i$-th measuring point we inserted just the $l$-th measured value $y_{il}$. Referring to (10.49) and to (10.14), we have

$$ \beta_{kl} = \sum_{i=1}^{m} b_{ik}y_{il}\,, \quad l = 1, \dots, n \quad \text{and} \quad \bar\beta_k = \sum_{i=1}^{m} b_{ik}\bar y_i\,. $$
Fig. 10.2. Measuring points, measurement uncertainties, least squares fit and "bow net" shaped uncertainty band
Then, for any fixed $x$, the difference

$$ y_l(x) - y(x) = \bigl(\beta_{1l} - \bar\beta_1\bigr) + \bigl(\beta_{2l} - \bar\beta_2\bigr)x = \sum_{i=1}^{m}(b_{i1} + b_{i2}x)(y_{il} - \bar y_i) \tag{10.63} $$

gives rise to an empirical variance

$$ s_{y(x)}^2 = \frac{1}{n-1}\sum_{l=1}^{n}\bigl[y_l(x) - y(x)\bigr]^2 = \boldsymbol b^{\mathrm T}s\,\boldsymbol b\,, \tag{10.64} $$

where for convenience the auxiliary vector

$$ \boldsymbol b = (b_{11} + b_{12}x\ \ b_{21} + b_{22}x\ \dots\ b_{m1} + b_{m2}x)^{\mathrm T} \tag{10.65} $$

has been introduced. According to (10.60), the propagated systematic error is given by
$$ f_{y(x)} = \sum_{i=1}^{m}(b_{i1} + b_{i2}x)f_{y_i}\,, \tag{10.66} $$

and its worst case estimate is

$$ f_{s,y(x)} = \sum_{i=1}^{m}|b_{i1} + b_{i2}x|\,f_{s,y_i}\,. \tag{10.67} $$
As shown in Fig. 10.2, for any fixed $x$, the uncertainty $u_{y(x)}$ of the least squares fit (10.6),

$$ y(x) = \bar\beta_1 + \bar\beta_2 x\,, $$

takes the form

$$ u_{y(x)} = \frac{t_P(n-1)}{\sqrt{n}}s_{y(x)} + f_{s,y(x)} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\boldsymbol b^{\mathrm T}s\,\boldsymbol b} + \sum_{i=1}^{m}|b_{i1} + b_{i2}x|\,f_{s,y_i}\,. \tag{10.68} $$
Hence, the true straight line (10.4),
y0(x) = β0,1 + β0,2x ,
should lie within the uncertainty band
y(x) ± uy(x) . (10.69)
Equal Systematic Errors
Let $f_{y_i} = f_y$, $f_{s,y_i} = f_{s,y}$; $i = 1, \dots, m$. Because of (10.32), the relationship (10.66) turns into

$$ f_{y(x)} = f_y\sum_{i=1}^{m}(b_{i1} + b_{i2}x) = f_y\,, \tag{10.70} $$

i.e.

$$ u_{y(x)} = \frac{t_P(n-1)}{\sqrt{n}}s_{y(x)} + f_{s,y} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\boldsymbol b^{\mathrm T}s\,\boldsymbol b} + f_{s,y}\,. \tag{10.71} $$

To check this, we compare (10.56) with (10.68) for the special case $y(0) = \bar\beta_1$. The uncertainty $u_{\bar\beta_1}$ should be equal to the uncertainty $u_{y(x)}$ at $x = 0$. If $x = 0$, the vector $\boldsymbol b$ as defined in (10.65) turns out to be

$$ \boldsymbol b_1 = (b_{11}\ b_{21}\ \dots\ b_{m1})^{\mathrm T}\,; $$

hence

$$ s_{y(0)}^2 = s_{\bar\beta_1\bar\beta_1} \equiv s_{\bar\beta_1}^2 \tag{10.72} $$

so that

$$ u_{y(0)} = \frac{t_P(n-1)}{\sqrt{n}}s_{\bar\beta_1} + \sum_{i=1}^{m}|b_{i1}|\,f_{s,y_i} \tag{10.73} $$

which agrees with (10.56).
Fig. 10.3. Measuring points $(\bar x_i, \bar y_i)$, expectations $(\mu_{x_i}, \mu_{y_i})$ and true values $(x_{0,i}, y_{0,i})$
10.1.2 Erroneous Abscissas, Erroneous Ordinates
It would be more realistic to consider both coordinates, the abscissas and the ordinates, to be affected by measurement errors, Fig. 10.3.
Uncertainties of the Components of the Solution Vector
Let us expand the elements $b_{ik}$ of the matrix $B$ as defined in (10.17). Indeed, under the conditions considered, these elements are erroneous so that, in contrast to (10.47), there is now no other choice:

$$ \bar\beta_k(\bar x_1, \dots, \bar y_m) = \bar\beta_k(x_{0,1}, \dots, y_{0,m}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_k}{\partial x_{0,i}}(\bar x_i - x_{0,i}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_k}{\partial y_{0,i}}(\bar y_i - y_{0,i}) + \cdots \tag{10.74} $$
Linearizing the expansion, approximating the derivatives in $(x_{0,1}, \dots, y_{0,m})$ through derivatives in $(\bar x_1, \dots, \bar y_m)$ and inserting the error equations

$$ \bar x_i = x_{0,i} + (\bar x_i - \mu_{x_i}) + f_{x_i}\,; \quad -f_{s,x_i} \le f_{x_i} \le f_{s,x_i} $$
$$ \bar y_i = y_{0,i} + (\bar y_i - \mu_{y_i}) + f_{y_i}\,; \quad -f_{s,y_i} \le f_{y_i} \le f_{s,y_i} $$

yields, as $\bar\beta_k(x_{0,1}, \dots, y_{0,m}) = \beta_{0,k}$,

$$ \bar\beta_1(\bar x_1, \dots, \bar y_m) = \beta_{0,1} + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar x_i}(\bar x_i - \mu_{x_i}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar y_i}(\bar y_i - \mu_{y_i}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar x_i}f_{x_i} + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar y_i}f_{y_i} \tag{10.75} $$

and

$$ \bar\beta_2(\bar x_1, \dots, \bar y_m) = \beta_{0,2} + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar x_i}(\bar x_i - \mu_{x_i}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar y_i}(\bar y_i - \mu_{y_i}) + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar x_i}f_{x_i} + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar y_i}f_{y_i}\,. \tag{10.76} $$
To establish the partial derivatives, we insert (10.18) into (10.14),

$$ \bar\beta_1 = \sum_{i=1}^{m} b_{i1}\bar y_i = \frac{1}{D}\sum_{i=1}^{m}\Bigl[\sum_{j=1}^{m}\bar x_j^2 - \bar x_i\sum_{j=1}^{m}\bar x_j\Bigr]\bar y_i = \frac{1}{D}\Bigl[\sum_{j=1}^{m}\bar y_j\sum_{j=1}^{m}\bar x_j^2 - \sum_{j=1}^{m}\bar x_j\bar y_j\sum_{j=1}^{m}\bar x_j\Bigr] $$

so that

$$ \frac{\partial\bar\beta_1}{\partial\bar x_i} = \frac{1}{D}\Bigl[2\bar x_i\sum_{j=1}^{m}\bar y_j - \bar y_i\sum_{j=1}^{m}\bar x_j - \sum_{j=1}^{m}\bar x_j\bar y_j\Bigr] - \frac{2\bar\beta_1}{D}\Bigl[m\bar x_i - \sum_{j=1}^{m}\bar x_j\Bigr] \tag{10.77} $$

and

$$ \frac{\partial\bar\beta_1}{\partial\bar y_i} = b_{i1} = \frac{1}{D}\Bigl[\sum_{j=1}^{m}\bar x_j^2 - \bar x_i\sum_{j=1}^{m}\bar x_j\Bigr]\,. \tag{10.78} $$
Similarly, from

$$ \bar\beta_2 = \sum_{i=1}^{m} b_{i2}\bar y_i = \frac{1}{D}\sum_{i=1}^{m}\Bigl[-\sum_{j=1}^{m}\bar x_j + m\bar x_i\Bigr]\bar y_i = \frac{1}{D}\Bigl[-\sum_{j=1}^{m}\bar x_j\sum_{j=1}^{m}\bar y_j + m\sum_{j=1}^{m}\bar x_j\bar y_j\Bigr] $$

we obtain

$$ \frac{\partial\bar\beta_2}{\partial\bar x_i} = \frac{1}{D}\Bigl[-\sum_{j=1}^{m}\bar y_j + m\bar y_i\Bigr] - \frac{2\bar\beta_2}{D}\Bigl[m\bar x_i - \sum_{j=1}^{m}\bar x_j\Bigr] \tag{10.79} $$

and

$$ \frac{\partial\bar\beta_2}{\partial\bar y_i} = b_{i2} = \frac{1}{D}\Bigl[-\sum_{j=1}^{m}\bar x_j + m\bar x_i\Bigr]\,. \tag{10.80} $$
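The analytic derivatives (10.77) and (10.79) can be validated against a central finite difference, which is a useful sanity check when implementing them. The data below are illustrative:

```python
def fit(x, y):
    """Least squares intercept and slope, cf. (10.14)-(10.18)."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    D = m * sxx - sx * sx
    return (sy * sxx - sxy * sx) / D, (m * sxy - sx * sy) / D

def dbeta_dxi(x, y, i):
    """Analytic derivatives (10.77) and (10.79) w.r.t. abscissa x_i."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    D = m * sum(v * v for v in x) - sx * sx
    b1, b2 = fit(x, y)
    d1 = (2 * x[i] * sy - y[i] * sx - sxy) / D - 2 * b1 / D * (m * x[i] - sx)
    d2 = (-sy + m * y[i]) / D - 2 * b2 / D * (m * x[i] - sx)
    return d1, d2

x = [0.1, 1.2, 2.0, 2.9, 4.1]
y = [0.8, 3.1, 4.9, 7.2, 9.0]
i, h = 2, 1e-6
xp = x[:]; xp[i] += h
xm = x[:]; xm[i] -= h
num1 = (fit(xp, y)[0] - fit(xm, y)[0]) / (2 * h)   # numeric d beta1 / d x_i
num2 = (fit(xp, y)[1] - fit(xm, y)[1]) / (2 * h)   # numeric d beta2 / d x_i
ana1, ana2 = dbeta_dxi(x, y, i)
print(abs(ana1 - num1) < 1e-5, abs(ana2 - num2) < 1e-5)  # True True
```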
In (10.75) and (10.76), random and systematic errors are each expressed through two terms. Let us repeat the expansions of $\bar\beta_1$ and $\bar\beta_2$, this time, however, throughout a neighborhood of the point

$$ (\bar x_1, \dots, \bar x_m; \bar y_1, \dots, \bar y_m) $$

and with respect to the $l$-th repeat measurements of the measuring points $i = 1, \dots, m$, i.e. with respect to the data set

$$ (x_{1l}, \dots, x_{ml}; y_{1l}, \dots, y_{ml})\,; \quad l = 1, \dots, n\,. $$
We then find

$$ \beta_{1l}(x_{1l}, \dots, x_{ml}; y_{1l}, \dots, y_{ml}) = \bar\beta_1(\bar x_1, \dots, \bar x_m; \bar y_1, \dots, \bar y_m) + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar x_i}(x_{il} - \bar x_i) + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar y_i}(y_{il} - \bar y_i) + \cdots \tag{10.81} $$

and

$$ \beta_{2l}(x_{1l}, \dots, x_{ml}; y_{1l}, \dots, y_{ml}) = \bar\beta_2(\bar x_1, \dots, \bar x_m; \bar y_1, \dots, \bar y_m) + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar x_i}(x_{il} - \bar x_i) + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar y_i}(y_{il} - \bar y_i) + \cdots\,. \tag{10.82} $$
Obviously, the linearizations imply

$$ \bar\beta_k = \frac{1}{n}\sum_{l=1}^{n}\beta_{kl}\,; \quad k = 1, 2\,. $$
For convenience, we simplify the nomenclature by introducing the definitions

$$ v_{il} = x_{il}\,, \quad \bar v_i = \bar x_i\,, \quad \mu_i = \mu_{x_i}\,, \quad f_i = f_{x_i}\,, \quad f_{s,i} = f_{s,x_i} $$
$$ v_{i+m,l} = y_{il}\,, \quad \bar v_{i+m} = \bar y_i\,, \quad \mu_{i+m} = \mu_{y_i}\,, \quad f_{i+m} = f_{y_i}\,, \quad f_{s,i+m} = f_{s,y_i} \tag{10.83} $$
and

$$ c_{i1} = \frac{\partial\bar\beta_1}{\partial\bar x_i}\,, \quad c_{i+m,1} = \frac{\partial\bar\beta_1}{\partial\bar y_i}\,; \qquad c_{i2} = \frac{\partial\bar\beta_2}{\partial\bar x_i}\,, \quad c_{i+m,2} = \frac{\partial\bar\beta_2}{\partial\bar y_i}\,. \tag{10.84} $$
Applying this notation, the expansions (10.75), (10.76), (10.81) and (10.82) turn into

$$ \bar\beta_k = \beta_{0,k} + \sum_{i=1}^{2m} c_{ik}(\bar v_i - \mu_i) + \sum_{i=1}^{2m} c_{ik}f_i\,; \quad k = 1, 2 \tag{10.85} $$

and

$$ \beta_{kl} - \bar\beta_k = \sum_{i=1}^{2m} c_{ik}(v_{il} - \bar v_i)\,; \quad k = 1, 2\,. \tag{10.86} $$
From (10.86) we deduce the elements

$$ s_{\bar\beta_k\bar\beta_{k'}} = \frac{1}{n-1}\sum_{l=1}^{n}\bigl(\beta_{kl} - \bar\beta_k\bigr)\bigl(\beta_{k'l} - \bar\beta_{k'}\bigr) = \frac{1}{n-1}\sum_{l=1}^{n}\Bigl[\sum_{i=1}^{2m} c_{ik}(v_{il} - \bar v_i)\Bigr]\Bigl[\sum_{j=1}^{2m} c_{jk'}(v_{jl} - \bar v_j)\Bigr] = \sum_{i,j=1}^{2m} c_{ik}c_{jk'}s_{ij} \tag{10.87} $$

of the empirical variance–covariance matrix $s_{\bar\beta}$ of the solution vector, in which the

$$ s_{ij} = \frac{1}{n-1}\sum_{l=1}^{n}(v_{il} - \bar v_i)(v_{jl} - \bar v_j)\,; \quad i, j = 1, \dots, 2m $$

designate the elements of the (expanded) empirical variance–covariance matrix

$$ s = (s_{ij}) $$
of the input data. We condense the coefficients $c_{i1}$ and $c_{i2}$ into the auxiliary matrix

$$ C^{\mathrm T} = \begin{pmatrix} c_{11} & c_{21} & \cdots & c_{2m,1} \\ c_{12} & c_{22} & \cdots & c_{2m,2} \end{pmatrix} $$

and arrive, in analogy to (10.53), at

$$ s_{\bar\beta} = \begin{pmatrix} s_{\bar\beta_1\bar\beta_1} & s_{\bar\beta_1\bar\beta_2} \\ s_{\bar\beta_2\bar\beta_1} & s_{\bar\beta_2\bar\beta_2} \end{pmatrix} = C^{\mathrm T}s\,C\,; \quad s_{\bar\beta_k\bar\beta_k} \equiv s_{\bar\beta_k}^2\,. \tag{10.88} $$
Finally, in view of (10.85) and (10.87), we may assign the uncertainties

$$ u_{\bar\beta_k} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{2m} c_{ik}c_{jk}s_{ij}} + \sum_{i=1}^{2m}|c_{ik}|\,f_{s,i}\,; \quad k = 1, 2 \tag{10.89} $$

to the estimators $\bar\beta_k \pm u_{\bar\beta_k}$, $k = 1, 2$.
Equal Systematic Errors
Let $f_{x_i} = f_x$, $f_{s,x_i} = f_{s,x}$; $f_{y_i} = f_y$, $f_{s,y_i} = f_{s,y}$; $i = 1, \dots, m$. From (10.75) we obtain

$$ \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar x_i}f_{x_i} + \sum_{i=1}^{m}\frac{\partial\bar\beta_1}{\partial\bar y_i}f_{y_i} = f_x\sum_{i=1}^{m} c_{i1} + f_y\sum_{i=1}^{m} c_{i+m,1} = -f_x\bar\beta_2 + f_y $$

and from (10.76)

$$ \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar x_i}f_{x_i} + \sum_{i=1}^{m}\frac{\partial\bar\beta_2}{\partial\bar y_i}f_{y_i} = f_x\sum_{i=1}^{m} c_{i2} + f_y\sum_{i=1}^{m} c_{i+m,2} = 0\,. $$

Hence, (10.89) turns into

$$ u_{\bar\beta_1} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{2m} c_{i1}c_{j1}s_{ij}} + f_{s,x}\bigl|\bar\beta_2\bigr| + f_{s,y} \tag{10.90} $$

and

$$ u_{\bar\beta_2} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i,j=1}^{2m} c_{i2}c_{j2}s_{ij}}\,. \tag{10.91} $$
Uncertainty Band
Inserting (10.85) into the fitted line $y(x) = \bar\beta_1 + \bar\beta_2 x$, we obtain

$$ y(x) = \beta_{0,1} + \beta_{0,2}x + \sum_{i=1}^{2m}(c_{i1} + c_{i2}x)(\bar v_i - \mu_i) + \sum_{i=1}^{2m}(c_{i1} + c_{i2}x)f_i\,. \tag{10.92} $$

To estimate the contribution due to random errors we refer to the $n$ least squares lines of the repeat measurements,

$$ y_l(x) = \beta_{1l} + \beta_{2l}x\,; \quad l = 1, \dots, n\,. $$

Upon insertion of (10.86), the difference

$$ y_l(x) - y(x) = \bigl(\beta_{1l} - \bar\beta_1\bigr) + \bigl(\beta_{2l} - \bar\beta_2\bigr)x\,; \quad l = 1, \dots, n \tag{10.93} $$

changes into

$$ y_l(x) - y(x) = \sum_{i=1}^{2m}(c_{i1} + c_{i2}x)(v_{il} - \bar v_i)\,; \quad l = 1, \dots, n \tag{10.94} $$

so that

$$ s_{y(x)}^2 = \boldsymbol c^{\mathrm T}s\,\boldsymbol c\,, \tag{10.95} $$
Fig. 10.4. Measuring points, uncertainty rectangles, fitted straight line, "bow net" shaped uncertainty band and true straight line
where for convenience the auxiliary vector

$$ \boldsymbol c = (c_{11} + c_{12}x\ \ c_{21} + c_{22}x\ \cdots\ c_{2m,1} + c_{2m,2}x)^{\mathrm T} $$

has been introduced. As (10.92) shows, the propagated systematic error is given by

$$ f_{y(x)} = \sum_{i=1}^{2m}(c_{i1} + c_{i2}x)f_i\,. \tag{10.96} $$

Its worst case estimation yields

$$ f_{s,y(x)} = \sum_{i=1}^{2m}|c_{i1} + c_{i2}x|\,f_{s,i}\,. \tag{10.97} $$

As illustrated in Fig. 10.4, the "bow net" shaped uncertainty band takes the form

$$ y(x) \pm u_{y(x)}\,, \tag{10.98} $$

$$ u_{y(x)} = \frac{t_P(n-1)}{\sqrt{n}}s_{y(x)} + f_{s,y(x)} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\boldsymbol c^{\mathrm T}s\,\boldsymbol c} + \sum_{i=1}^{2m}|c_{i1} + c_{i2}x|\,f_{s,i}\,. $$
Equal Systematic Errors
Let $f_{x_i} = f_x$, $f_{s,x_i} = f_{s,x}$; $f_{y_i} = f_y$, $f_{s,y_i} = f_{s,y}$; $i = 1, \dots, m$. Then, from (10.96) we deduce

$$ f_{y(x)} = -f_x\bar\beta_2 + f_y\,, \tag{10.99} $$

the worst case estimation being

$$ f_{s,y(x)} = f_{s,x}\bigl|\bar\beta_2\bigr| + f_{s,y}\,. \tag{10.100} $$

The modified overall uncertainty is given by

$$ u_{y(x)} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\boldsymbol c^{\mathrm T}s\,\boldsymbol c} + f_{s,x}\bigl|\bar\beta_2\bigr| + f_{s,y}\,. \tag{10.101} $$
Again, we may convince ourselves that uy(0) = uβ1.
Example
Measurement of the linear coefficient of expansion. The length L(t) of a rod is to be measured as a function of the temperature t. Let α denote the linear coefficient of expansion. Then we have
L(t) = Lr [1 + α (t − tr)] = Lr + αLr (t − tr) .
Here, $L_r$ denotes the length of the rod at some reference temperature $t_r$. Setting $x = t - t_r$, $y = L(t)$, $\beta_1 = L_r$, and $\beta_2 = \alpha L_r$, we obtain
y(x) = β1 + β2x .
The uncertainties $u_{\bar\beta_1}, u_{\bar\beta_2}$ have already been quoted in (10.89). Consequently, we may confine ourselves to assessing the uncertainty $u_{\bar\alpha}$ of the coefficient

$$ \bar\alpha = \phi\bigl(\bar\beta_1, \bar\beta_2\bigr) = \bar\beta_2/\bar\beta_1\,, $$
see Sect. 9.4. Inserting (10.85),

$$ \bar\beta_k = \beta_{0,k} + \sum_{i=1}^{2m} c_{ik}(\bar v_i - \mu_i) + \sum_{i=1}^{2m} c_{ik}f_i\,; \quad k = 1, 2\,, $$

into the expansion

$$ \phi\bigl(\bar\beta_1, \bar\beta_2\bigr) = \phi\bigl(\beta_{0,1}, \beta_{0,2}\bigr) + \frac{\partial\phi}{\partial\beta_{0,1}}\bigl(\bar\beta_1 - \beta_{0,1}\bigr) + \frac{\partial\phi}{\partial\beta_{0,2}}\bigl(\bar\beta_2 - \beta_{0,2}\bigr) + \cdots $$

yields

$$ \phi\bigl(\bar\beta_1, \bar\beta_2\bigr) = \phi\bigl(\beta_{0,1}, \beta_{0,2}\bigr) + \sum_{i=1}^{2m}\Bigl(\frac{\partial\phi}{\partial\bar\beta_1}c_{i1} + \frac{\partial\phi}{\partial\bar\beta_2}c_{i2}\Bigr)(\bar v_i - \mu_i) + \sum_{i=1}^{2m}\Bigl(\frac{\partial\phi}{\partial\bar\beta_1}c_{i1} + \frac{\partial\phi}{\partial\bar\beta_2}c_{i2}\Bigr)f_i\,, $$

where the partial derivatives are given by

$$ \frac{\partial\phi}{\partial\bar\beta_1} \approx -\frac{\bar\alpha}{\bar\beta_1}\,; \qquad \frac{\partial\phi}{\partial\bar\beta_2} \approx \frac{1}{\bar\beta_1}\,. $$
Further expansion with respect to $\beta_{1l}$ and $\beta_{2l}$ leads to

$$ \phi\bigl(\beta_{1l}, \beta_{2l}\bigr) = \phi\bigl(\bar\beta_1, \bar\beta_2\bigr) + \frac{\partial\phi}{\partial\bar\beta_1}\bigl(\beta_{1l} - \bar\beta_1\bigr) + \frac{\partial\phi}{\partial\bar\beta_2}\bigl(\beta_{2l} - \bar\beta_2\bigr) $$

so that

$$ s_{\phi}^2 = \frac{1}{n-1}\sum_{l=1}^{n}\bigl[\phi\bigl(\beta_{1l}, \beta_{2l}\bigr) - \phi\bigl(\bar\beta_1, \bar\beta_2\bigr)\bigr]^2 = \Bigl(\frac{\partial\phi}{\partial\bar\beta_1}\Bigr)^2 s_{\bar\beta_1\bar\beta_1} + 2\Bigl(\frac{\partial\phi}{\partial\bar\beta_1}\frac{\partial\phi}{\partial\bar\beta_2}\Bigr)s_{\bar\beta_1\bar\beta_2} + \Bigl(\frac{\partial\phi}{\partial\bar\beta_2}\Bigr)^2 s_{\bar\beta_2\bar\beta_2}\,. $$
Hence, the uncertainty $u_{\bar\alpha}$ of the result $\bar\alpha \pm u_{\bar\alpha}$ follows from

$$ u_{\bar\alpha} = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\Bigl(\frac{\partial\phi}{\partial\bar\beta_1}\Bigr)^2 s_{\bar\beta_1\bar\beta_1} + 2\Bigl(\frac{\partial\phi}{\partial\bar\beta_1}\frac{\partial\phi}{\partial\bar\beta_2}\Bigr)s_{\bar\beta_1\bar\beta_2} + \Bigl(\frac{\partial\phi}{\partial\bar\beta_2}\Bigr)^2 s_{\bar\beta_2\bar\beta_2}} + \sum_{i=1}^{2m}\Bigl|\frac{\partial\phi}{\partial\bar\beta_1}c_{i1} + \frac{\partial\phi}{\partial\bar\beta_2}c_{i2}\Bigr|\,f_{s,i}\,. $$
10.2 Linearization
Let $\phi(x, y; a, b) = 0$ be some non-linear function whose parameters $a$ and $b$ are to be estimated by means of $m$ measured data pairs $(x_i, y_i)$; $i = 1, \dots, m$. In order to linearize the function through series expansion and subsequent truncation, we have to assume that estimates or starting values $a^*, b^*$ exist. The series expansion yields

$$ \phi(x, y; a, b) = \phi(x, y; a^*, b^*) + \frac{\partial\phi(x, y; a^*, b^*)}{\partial a^*}(a - a^*) + \frac{\partial\phi(x, y; a^*, b^*)}{\partial b^*}(b - b^*) + \cdots \tag{10.102} $$

Clearly, the closer the estimates $a^*, b^*$ are to the true values $a_0, b_0$, the smaller the linearization errors. Considering the differences

$$ \beta_1 = a - a^* \quad \text{and} \quad \beta_2 = b - b^* \tag{10.103} $$

as unknowns and setting

$$ u_i = \frac{\partial\phi(x_i, y_i; a^*, b^*)/\partial b^*}{\partial\phi(x_i, y_i; a^*, b^*)/\partial a^*}\,, \qquad v_i = -\frac{\phi(x_i, y_i; a^*, b^*)}{\partial\phi(x_i, y_i; a^*, b^*)/\partial a^*}\,, \tag{10.104} $$
relation (10.102) turns into

$$ \beta_1 + \beta_2 u_i \approx v_i\,; \quad i = 1, \dots, m\,. \tag{10.105} $$
The least squares estimators $\bar\beta_1$ and $\bar\beta_2$ lead to the quantities

$$ \bar a = a^* + \bar\beta_1 \quad \text{and} \quad \bar b = b^* + \bar\beta_2 \tag{10.106} $$

which, given the procedure converges, will be closer to the true values $a_0, b_0$ than the starting values $a^*, b^*$. In this case we shall cyclically repeat the adjustment until

$$ \bar\beta_1 \approx 0 \quad \text{and} \quad \bar\beta_2 \approx 0\,. \tag{10.107} $$

If the input data were error-free, the cycles would reduce the estimators $\bar\beta_1, \bar\beta_2$ down to the numerical uncertainty of the computer. However, as the input data are erroneous, after a certain number of cycles, the numerical values of the estimators $\bar\beta_1, \bar\beta_2$ will start to oscillate around some residuals which are different from zero and reflect the degree of accuracy of the input data.
The uncertainties of the estimators $\bar\beta_1, \bar\beta_2$ pass to the uncertainties of the parameters $\bar a$ and $\bar b$ according to

$$ u_{\bar\beta_1} = u_{\bar a}\,; \quad u_{\bar\beta_2} = u_{\bar b} \tag{10.108} $$

as the quantities $a^*, b^*$ enter error-free. The estimation of the uncertainties $u_{\bar\beta_1}, u_{\bar\beta_2}$ has been explained in Sect. 10.1.
10.3 Transformation
Assume that a given non-linear relationship may be linearized by means of coordinate transformations. Then, initially, there will be no linearization errors which otherwise would be superimposed on the measurement errors. On the other hand, when transferring the uncertainties of the estimators of the linearized model to the estimators of the primary, non-linear model, we have to linearize as well. Which of the two approaches – linearization or transformation of a given non-linear function – turns out to be more favourable seems to depend on the strategy aimed at.
Example
Given m measured data pairs $(x_i, y_i)$; $i = 1, \dots, m$ and a non-linear relationship $\phi(x, y; a, b) = 0$, we shall estimate the parameters $a$ and $b$. Let
$$ \phi(x, y; a, b) = y - ax^b\,. \tag{10.109} $$
Taking logarithms yields

$$ \ln y = \ln a + b\ln x\,. \tag{10.110} $$
Setting v = ln y, u = ln x, β1 = ln a, and β2 = b changes (10.110) into
β1 + β2u = v . (10.111)
From (10.14) and (10.89), we find the parameters and their uncertainties, say

$$ \bar\beta_k \pm u_{\bar\beta_k}\,, \quad k = 1, 2\,. \tag{10.112} $$
To estimate the uncertainty $u_{\bar a}$ of the parameter $\bar a$, we linearize

$$ \bar a = e^{\bar\beta_1} \tag{10.113} $$

with respect to the points $\bar a$ and $a_0$,

$$ a_l = \bar a + e^{\bar\beta_1}\bigl(\beta_{1,l} - \bar\beta_1\bigr)\,; \quad l = 1, \dots, n\,, $$
$$ \bar a = a_0 + e^{\bar\beta_1}\bigl(\bar\beta_1 - \beta_{0,1}\bigr)\,. \tag{10.114} $$

From the first expansion we obtain the uncertainty due to random errors and from the second one the uncertainty due to propagated systematic errors, so that

$$ \bar a \pm u_{\bar a}\,; \quad u_{\bar a} = e^{\bar\beta_1}u_{\bar\beta_1}\,. \tag{10.115} $$
Finally, as $\bar b = \bar\beta_2$, we have

$$ \bar b \pm u_{\bar b}\,; \quad u_{\bar b} = u_{\bar\beta_2}\,. \tag{10.116} $$
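The transformation route can be sketched end to end: fit a straight line in the coordinates $u = \ln x$, $v = \ln y$, then map $\bar\beta_1 = \ln \bar a$ back via $\bar a = e^{\bar\beta_1}$ and propagate the uncertainty per (10.115). The data and the value of $u_{\bar\beta_1}$ are illustrative:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * xi ** 1.5 for xi in x]        # exact power law: a0 = 3, b0 = 1.5

u = [math.log(xi) for xi in x]           # u = ln x
v = [math.log(yi) for yi in y]           # v = ln y
m = len(u)
su, sv = sum(u), sum(v)
suu = sum(t * t for t in u)
suv = sum(p * q for p, q in zip(u, v))
D = m * suu - su * su
beta1 = (sv * suu - suv * su) / D        # estimate of ln a
beta2 = (m * suv - su * sv) / D          # estimate of b

a = math.exp(beta1)                      # back-transform (10.113)
u_beta1 = 0.01                           # assumed uncertainty of beta1
u_a = math.exp(beta1) * u_beta1          # propagated to a, cf. (10.115)
print(round(a, 6), round(beta2, 6))      # 3.0 1.5
```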
11 Systems with Three Parameters
11.1 Planes
The adjustment of three-parametric relationships calls for coordinate triples

$$ (\bar x_i, \bar y_i, \bar z_i)\,; \quad i = 1, \dots, m\,. $$
As usual, we assign to each of the measurands $\bar x_i, \bar y_i, \bar z_i$ an error equation:

$$ \bar x_i = \frac{1}{n}\sum_{l=1}^{n} x_{il} = x_{0,i} + (\bar x_i - \mu_{x_i}) + f_{x_i}\,; \quad -f_{s,x_i} \le f_{x_i} \le f_{s,x_i} $$
$$ \bar y_i = \frac{1}{n}\sum_{l=1}^{n} y_{il} = y_{0,i} + (\bar y_i - \mu_{y_i}) + f_{y_i}\,; \quad -f_{s,y_i} \le f_{y_i} \le f_{s,y_i} $$
$$ \bar z_i = \frac{1}{n}\sum_{l=1}^{n} z_{il} = z_{0,i} + (\bar z_i - \mu_{z_i}) + f_{z_i}\,; \quad -f_{s,z_i} \le f_{z_i} \le f_{s,z_i}\,. $$
Let us now consider the least squares adjustment of planes, see Fig. 11.1.
Solution Vector
Inserting $m > 4$ erroneous coordinate triples into the equation of the plane

$$ z = \beta_1 + \beta_2 x + \beta_3 y \tag{11.1} $$

yields an inconsistent linear system of the kind

$$ \beta_1 + \beta_2\bar x_i + \beta_3\bar y_i \approx \bar z_i\,; \quad i = 1, \dots, m\,. \tag{11.2} $$

For convenience, we define the column vectors

$$ \bar{\boldsymbol z} = (\bar z_1\ \bar z_2\ \dots\ \bar z_m)^{\mathrm T}\,, \quad \boldsymbol\beta = (\beta_1\ \beta_2\ \beta_3)^{\mathrm T} \tag{11.3} $$

and a design matrix

$$ A = \begin{pmatrix} 1 & \bar x_1 & \bar y_1 \\ 1 & \bar x_2 & \bar y_2 \\ \cdots & \cdots & \cdots \\ 1 & \bar x_m & \bar y_m \end{pmatrix} $$
Fig. 11.1. Least squares plane, measured and true data triples
so that (11.2) turns into

$$ A\boldsymbol\beta \approx \bar{\boldsymbol z}\,. \tag{11.4} $$

The solution vector is

$$ \bar{\boldsymbol\beta} = B^{\mathrm T}\bar{\boldsymbol z}\,; \quad B = A\left(A^{\mathrm T}A\right)^{-1}. \tag{11.5} $$
Uncertainties of the Parameters
We immediately see the difficulties incurred in estimating the measurement uncertainties and the uncertainty "bowls" which should enclose the true plane: firstly, a $(3 \times 3)$ matrix has to be inverted algebraically, and, secondly, the components of the solution vector $\bar{\boldsymbol\beta}$ have to be linearized. However, not every user needs to bother about these procedures, as there will be software packages implementing the formalism we are looking for.
Expanding the components of the solution vector (11.5) throughout a neighborhood of the point $(x_{0,1}, \dots, x_{0,m}; y_{0,1}, \dots, y_{0,m}; z_{0,1}, \dots, z_{0,m})$ with respect to the means $\bar x_1, \dots, \bar x_m; \bar y_1, \dots, \bar y_m; \bar z_1, \dots, \bar z_m$ and making use of the common approximations yields, letting $k = 1, 2, 3$,

$$ \bar\beta_k(\bar x_1, \dots, \bar x_m; \bar y_1, \dots, \bar y_m; \bar z_1, \dots, \bar z_m) = \bar\beta_k(x_{0,1}, \dots, x_{0,m}; y_{0,1}, \dots, y_{0,m}; z_{0,1}, \dots, z_{0,m}) + \sum_{j=1}^{m}\frac{\partial\bar\beta_k}{\partial\bar x_j}(\bar x_j - x_{0,j}) + \sum_{j=1}^{m}\frac{\partial\bar\beta_k}{\partial\bar y_j}(\bar y_j - y_{0,j}) + \sum_{j=1}^{m}\frac{\partial\bar\beta_k}{\partial\bar z_j}(\bar z_j - z_{0,j})\,. \tag{11.6} $$
The β0,k = βk(x0,1 . . . , x0,m; y0,1, . . . , y0,m; z0,1, . . . , z0,m) designate the truevalues of the coefficients of the plane (11.1). The systematic errors of theestimators βk are given by
f_{β_k} = ∑_{j=1}^{m} (∂β_k/∂x_j) f_{x_j} + ∑_{j=1}^{m} (∂β_k/∂y_j) f_{y_j} + ∑_{j=1}^{m} (∂β_k/∂z_j) f_{z_j} ;   k = 1, 2, 3 .   (11.7)
To present the empirical variance–covariance matrix of the solution vector, we expand the β_k; k = 1, 2, 3 in

(x_1, …, x_m; y_1, …, y_m; z_1, …, z_m)

with respect to the l-th individual measurements

x_{1l}, …, x_{ml}; y_{1l}, …, y_{ml}; z_{1l}, …, z_{ml}

so that
β_{kl}(x_{1l}, …, x_{ml}; y_{1l}, …, y_{ml}; z_{1l}, …, z_{ml})
   = β_k(x_1, …, x_m; y_1, …, y_m; z_1, …, z_m)   (11.8)
   + ∑_{j=1}^{m} (∂β_k/∂x_j)(x_{jl} − x_j) + ∑_{j=1}^{m} (∂β_k/∂y_j)(y_{jl} − y_j) + ∑_{j=1}^{m} (∂β_k/∂z_j)(z_{jl} − z_j) ,

where l = 1, …, n. The partial derivatives

c_{ik} = ∂β_k/∂x_i ,   c_{i+m,k} = ∂β_k/∂y_i ,   c_{i+2m,k} = ∂β_k/∂z_i ;   i = 1, …, m   (11.9)
are given in Appendix F. To abbreviate, we introduce the notations
v_{il} = x_{il}        v_{i+m,l} = y_{il}        v_{i+2m,l} = z_{il}
v_i = x_i              v_{i+m} = y_i             v_{i+2m} = z_i
µ_i = µ_{x_i}          µ_{i+m} = µ_{y_i}         µ_{i+2m} = µ_{z_i}
f_i = f_{x_i}          f_{i+m} = f_{y_i}         f_{i+2m} = f_{z_i}
f_{s,i} = f_{s,x_i}    f_{s,i+m} = f_{s,y_i}     f_{s,i+2m} = f_{s,z_i}
so that (11.6) turns into
β_k = β_{0,k} + ∑_{j=1}^{3m} c_{jk}(v_j − µ_j) + ∑_{j=1}^{3m} c_{jk} f_j ;   k = 1, 2, 3   (11.10)

while (11.8) yields

β_{kl} − β_k = ∑_{j=1}^{3m} c_{jk}(v_{jl} − v_j) ;   k = 1, 2, 3 .   (11.11)
From this we find
s_{β_k β_{k′}} = (1/(n−1)) ∑_{l=1}^{n} [ ∑_{i=1}^{3m} c_{ik}(v_{il} − v_i) ] [ ∑_{j=1}^{3m} c_{jk′}(v_{jl} − v_j) ]

   = ∑_{i,j=1}^{3m} c_{ik} c_{jk′} [ (1/(n−1)) ∑_{l=1}^{n} (v_{il} − v_i)(v_{jl} − v_j) ]

   = ∑_{i,j=1}^{3m} c_{ik} c_{jk′} s_{ij}   (11.12)
where the
s_{ij} = (1/(n−1)) ∑_{l=1}^{n} (v_{il} − v_i)(v_{jl} − v_j) ;   i, j = 1, …, 3m   (11.13)

denote the empirical variances and covariances of the input data. Assembling the s_{ij} and the c_{ik} within matrices
s = (sij) (11.14)
and
C^T = ( c_{11} c_{21} ··· c_{3m,1}
        c_{12} c_{22} ··· c_{3m,2}
        c_{13} c_{23} ··· c_{3m,3} )   (11.15)

respectively, we may condense (11.12) to

s_β = C^T s C .   (11.16)
Furthermore, from (11.10) we infer
f_{β_k} = ∑_{j=1}^{3m} c_{jk} f_j ;   k = 1, 2, 3 .   (11.17)

Finally, the uncertainties of the parameters β_k; k = 1, 2, 3 of the fitted least squares plane
z(x, y) = β1 + β2x + β3y (11.18)
are given by
u_{β_k} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{3m} c_{ik} c_{jk} s_{ij} ) + ∑_{j=1}^{3m} |c_{jk}| f_{s,j} ;   k = 1, 2, 3 .   (11.19)
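A hedged numerical sketch of this machinery: instead of evaluating the derivatives c_{ik} of Appendix F analytically, one may exploit (11.11) and refit the plane to each l-th set of individual measurements, whose scatter about the fit to the means reproduces the covariance (11.12) to first order. The simulated data, the seed and the t-quantile stand-in below are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 12                                       # m triples, n repeats each
x0 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
y0 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
z0 = 1.0 + 2.0 * x0 - 0.5 * y0                     # assumed true plane

# simulated repeat measurements (random errors only, for brevity)
X = x0 + 0.01 * rng.standard_normal((n, m))
Y = y0 + 0.01 * rng.standard_normal((n, m))
Z = z0 + 0.01 * rng.standard_normal((n, m))

def fit_plane(x, y, z):
    A = np.column_stack([np.ones_like(x), x, y])
    return np.linalg.solve(A.T @ A, A.T @ z)

beta = fit_plane(X.mean(0), Y.mean(0), Z.mean(0))  # fit to the means
beta_l = np.array([fit_plane(X[l], Y[l], Z[l]) for l in range(n)])

# empirical covariance of the beta_kl about beta, cf. (11.12)
s_beta = (beta_l - beta).T @ (beta_l - beta) / (n - 1)
t = 2.2                                            # stand-in for t_P(n-1), assumed
u_random = t / np.sqrt(n) * np.sqrt(np.diag(s_beta))
print(beta, u_random)
```

The systematic component of (11.19) would be added on top of `u_random`; it is omitted here to keep the sketch short.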
Equal Systematic Errors
Let f_{x_i} = f_x; f_{y_i} = f_y; f_{z_i} = f_z; i = 1, …, m. Obviously, such perturbations shift the least squares plane parallel to itself. Splitting up the systematic errors (11.17),
f_{β_k} = ∑_{j=1}^{m} c_{jk} f_{x_j} + ∑_{j=1}^{m} c_{j+m,k} f_{y_j} + ∑_{j=1}^{m} c_{j+2m,k} f_{z_j}

   = f_x ∑_{j=1}^{m} c_{jk} + f_y ∑_{j=1}^{m} c_{j+m,k} + f_z ∑_{j=1}^{m} c_{j+2m,k} ,
and using
∑_{j=1}^{m} c_{j1} = −β_2 ,   ∑_{j=1}^{m} c_{j+m,1} = −β_3 ,   ∑_{j=1}^{m} c_{j+2m,1} = 1

∑_{j=1}^{m} c_{j2} = 0 ,     ∑_{j=1}^{m} c_{j+m,2} = 0 ,     ∑_{j=1}^{m} c_{j+2m,2} = 0

∑_{j=1}^{m} c_{j3} = 0 ,     ∑_{j=1}^{m} c_{j+m,3} = 0 ,     ∑_{j=1}^{m} c_{j+2m,3} = 0

we find

f_{β_1} = −β_2 f_x − β_3 f_y + f_z ;   f_{β_2} = 0 ;   f_{β_3} = 0   (11.20)
so that
u_{β_1} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{3m} c_{i1} c_{j1} s_{ij} ) + |β_2| f_{s,x} + |β_3| f_{s,y} + f_{s,z}

u_{β_2} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{3m} c_{i2} c_{j2} s_{ij} )

u_{β_3} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{3m} c_{i3} c_{j3} s_{ij} ) .   (11.21)
Uncertainty “Bowl”
Inserting (11.10) into (11.18) leads to
z(x, y) = β_{0,1} + β_{0,2} x + β_{0,3} y   (11.22)
   + ∑_{j=1}^{3m} (c_{j1} + c_{j2}x + c_{j3}y)(v_j − µ_j) + ∑_{j=1}^{3m} (c_{j1} + c_{j2}x + c_{j3}y) f_j .

From this we read the systematic error of z(x, y),

f_{z(x,y)} = ∑_{j=1}^{3m} (c_{j1} + c_{j2}x + c_{j3}y) f_j ,   (11.23)
and its worst case estimation
f_{s,z(x,y)} = ∑_{j=1}^{3m} |c_{j1} + c_{j2}x + c_{j3}y| f_{s,j} .   (11.24)
The least squares planes of the l-th individual measurements
zl = β1l + β2lx + β3ly ; l = 1, . . . , n ,
allow for the differences
z_l − z = (β_{1l} − β_1) + (β_{2l} − β_2)x + (β_{3l} − β_3)y   (11.25)
   = ∑_{j=1}^{3m} (c_{j1} + c_{j2}x + c_{j3}y)(v_{jl} − v_j) ;   l = 1, …, n .

For any fixed pair x, y, the empirical variance turns out to be

s²_{z(x,y)} = (1/(n−1)) ∑_{l=1}^{n} (z_l − z)² = c^T s c ,   (11.26)

where c designates the auxiliary vector

c = (c_{11} + c_{12}x + c_{13}y   c_{21} + c_{22}x + c_{23}y   …   c_{3m,1} + c_{3m,2}x + c_{3m,3}y)^T .   (11.27)
Hence the uncertainty “bowl” in which to find the true plane
z0(x, y) = β0,1 + β0,2x + β0,3y (11.28)
is of the form
z(x, y) ± u_{z(x,y)}

u_{z(x,y)} = (t_P(n−1)/√n) √(c^T s c) + ∑_{j=1}^{3m} |c_{j1} + c_{j2}x + c_{j3}y| f_{s,j} .   (11.29)
Equal Systematic Errors
Let fxi = fx; fyi = fy; fzi = fz. As (11.23) changes into
fz(x,y) = −β2fx − β3fy + fz ,
we have
z(x, y) ± u_{z(x,y)}

u_{z(x,y)} = (t_P(n−1)/√n) √(c^T s c) + |β_2| f_{s,x} + |β_3| f_{s,y} + f_{s,z} .   (11.30)
Finally, we may convince ourselves of uz(0,0) = uβ1.
11.2 Parabolas
Though second-order curves, parabolas are linear in their parameters. In this respect, the execution of least squares fits does not pose particular problems.
Given m measured, erroneous coordinate pairs
(x1, y1) , (x2, y2) , . . . , (xm, ym) , (11.31)
we assume the underlying true values (x_{0,i}, y_{0,i}); i = 1, …, m to define the true parabola

y_0(x) = β_{0,1} + β_{0,2}x + β_{0,3}x² .   (11.32)

Inserting the measured points into the equation y(x) = β_1 + β_2x + β_3x² leads to a linear, inconsistent system

β_1 + β_2 x_i + β_3 x_i² ≈ y_i ;   i = 1, …, m .   (11.33)
Let A, β and y denote the system matrix, the vector of the unknowns and the vector of the observations respectively,

A = ( 1 x_1 x_1²
      1 x_2 x_2²
      ··· ··· ···
      1 x_m x_m² ) ;   β = (β_1 β_2 β_3)^T ;   y = (y_1 y_2 … y_m)^T .   (11.34)
Using these definitions, we may compress (11.33) into
Aβ ≈ y .
The least squares solution vector is
β = BTy , B = A(ATA
)−1= (bik) . (11.35)
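In code, the parabola adjustment (11.34)–(11.35) only changes the columns of the design matrix. The points and coefficients below are invented and noise-free, so the fit reproduces the assumed coefficients exactly:

```python
import numpy as np

# Sketch of (11.34)-(11.35); illustrative data on y = 1 - 2x + 0.5 x^2 (assumed).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 - 2.0 * x + 0.5 * x**2

A = np.column_stack([np.ones_like(x), x, x**2])
B = A @ np.linalg.inv(A.T @ A)        # B = A (A^T A)^{-1} = (b_ik)
beta = B.T @ y                        # solution vector (11.35)
print(beta)                           # -> approximately [1.0, -2.0, 0.5]
```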
11.2.1 Error-free Abscissas, Erroneous Ordinates
To begin with, we shall follow the traditional approach and assume the abscissas to be error-free. Subsequently this restriction will be lifted.
No Repeat Measurements
In the simplest case, we consider the ordinates to represent individual, erroneous measurements. Furthermore, we shall imply that each of the latter is biased by one and the same systematic error f_y so that
y = (y1 y2 . . . ym)T = y0 + (y − µ) + f
yi = y0,i + (yi − µyi) + fy , i = 1, . . . , m , (11.36)
where
f = f_y (1 1 … 1)^T   (m components) ;   −f_{s,y} ≤ f_y ≤ f_{s,y} .

The random errors y_i − µ_{y_i} are assumed to be independent and normally distributed; furthermore, they shall relate to the same parent distribution.

In (11.33), substituting true values x_{0,i} for the abscissas x_i, the matrix B becomes error-free. The solution vector is
β = BTy . (11.37)
Inserting (11.36) into (11.37) yields
β = β0 + BT(y − µ) + f β , f β = BTf . (11.38)
Clearly, the components of the vector f β are given by
fβ1= fy , fβ2
= 0 , fβ3= 0 . (11.39)
Remarkably enough, in analogy to (10.26) and (10.27), we may prove f^T f − f^T P f = 0 so that, following (8.11), the minimized sum of squared residuals yields an estimator

s_y² = Q_unweighted / (m − 3)   (11.40)
for the theoretical variance
σ_y² = E{(Y_i − µ_{y_i})²} ;   i = 1, …, m   (11.41)

of the input data.¹
Uncertainties of the Components of the Solution Vector
From the components
β_k = ∑_{i=1}^{m} b_{ik} y_i ;   k = 1, 2, 3   (11.42)

of the solution vector and its expectations

E{β_k} = µ_{β_k} = ∑_{i=1}^{m} b_{ik} µ_{y_i} ;   k = 1, 2, 3 ,   (11.43)
¹If the existence of a true parabola is doubtful, (11.40), of course, is also questionable.
we obtain the differences
β_k − µ_{β_k} = ∑_{i=1}^{m} b_{ik}(y_i − µ_{y_i}) ;   k = 1, 2, 3
so that we can define the elements
σ_{β_k β_{k′}} = E{(β_k − µ_{β_k})(β_{k′} − µ_{β_{k′}})} = σ_y² ∑_{i=1}^{m} b_{ik} b_{ik′} ;   k, k′ = 1, 2, 3 ,

of the theoretical variance–covariance matrix of the solution vector β. According to (11.40), their empirical counterparts are given by
s_{β_k β_{k′}} = s_y² ∑_{i=1}^{m} b_{ik} b_{ik′} ;   k, k′ = 1, 2, 3 .   (11.44)
Finally, the uncertainties of the estimators βk; k = 1, 2, 3 turn out to be
u_{β_k} = t_P(m − 3) s_y √( ∑_{i=1}^{m} b_{ik} b_{ik} ) + f_{s,β_k} ;   k = 1, 2, 3   (11.45)
in which, according to (11.39),
f_{s,β_1} = f_{s,y} ,   f_{s,β_2} = 0 ,   f_{s,β_3} = 0 .   (11.46)
Uncertainty Band
Inserting the components of the solution vector (11.38),
β_k = β_{0,k} + ∑_{i=1}^{m} b_{ik}(y_i − µ_{y_i}) + f_{β_k} ;   k = 1, 2, 3 ,

into the fitted parabola y(x) = β_1 + β_2x + β_3x², we find

y(x) = y_0(x) + ∑_{i=1}^{m} (b_{i1} + b_{i2}x + b_{i3}x²)(y_i − µ_{y_i}) + f_y   (11.47)
in which
f_y = f_{β_1}   (11.48)

designates the systematic error. For any fixed x, the theoretical variance of Y(x) with respect to the expectation

µ_{Y(x)} = E{Y(x)} = y_0(x) + f_y   (11.49)
yields
σ²_{Y(x)} = E{(Y(x) − µ_{Y(x)})²} = σ_y² ∑_{i=1}^{m} (b_{i1} + b_{i2}x + b_{i3}x²)² .   (11.50)
Hence, we expect the uncertainty band
y(x) ± u_{y(x)}

u_{y(x)} = t_P(m − 3) s_y √( ∑_{i=1}^{m} (b_{i1} + b_{i2}x + b_{i3}x²)² ) + f_{s,y}   (11.51)
to enclose the true parabola (11.32).
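A minimal sketch of this case — error-free abscissas, one erroneous ordinate each: s_y is estimated from the residuals via (11.40), and the band (11.51) is evaluated at a single abscissa. The data, the noise level, the numerical stand-in for t_P(m − 3) and the zero systematic bound are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
m = 12
x = np.linspace(0.0, 3.0, m)               # error-free abscissas
y0 = 1.0 - 2.0 * x + 0.5 * x**2            # assumed true parabola
y = y0 + 0.05 * rng.standard_normal(m)     # one erroneous ordinate each

A = np.column_stack([np.ones(m), x, x**2])
B = A @ np.linalg.inv(A.T @ A)
beta = B.T @ y
resid = y - A @ beta
s_y = np.sqrt(resid @ resid / (m - 3))     # s_y^2 = Q/(m-3), cf. (11.40)

t = 2.26                                   # stand-in for t_P(m-3), assumed
fs_y = 0.0                                 # no systematic error in this sketch
xg = 1.5                                   # band evaluated at one point
g = np.array([1.0, xg, xg**2])
u = t * s_y * np.sqrt(np.sum((B @ g) ** 2)) + fs_y   # cf. (11.51)
print(beta, u)
```

The sum ∑(b_{i1} + b_{i2}x + b_{i3}x²)² is evaluated as the squared norm of B g with g = (1, x, x²)^T.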
Repeat Measurements
For convenience, we shall preserve the nomenclature hitherto used, i.e. we continue to denote the least squares parabola by

y(x) = β_1 + β_2x + β_3x² ,
though the ordinates will now enter in the form of arithmetic means
y = (y1 y2 . . . ym)T
yi = y0,i + (yi − µyi) + fyi ; −fs,yi ≤ fyi ≤ fs,yi . (11.52)
As usual, we assume each of the ordinates to include the same number, say n, of repeat measurements. At the same time, we dismiss the restriction that the means y_i; i = 1, …, m are biased by one and the same systematic error.
Uncertainties of the Components of the Solution Vector
As we assume the abscissas to be error-free, so are the elements b_{ik} of the matrix B. Inserting the error equations (11.52) into

β_k = ∑_{i=1}^{m} b_{ik} y_i ;   k = 1, 2, 3   (11.53)
yields
β_k(y_1, …, y_m) = β_{0,k} + ∑_{i=1}^{m} b_{ik}(y_i − µ_{y_i}) + ∑_{i=1}^{m} b_{ik} f_{y_i} .   (11.54)
Obviously, the
f_{β_k} = ∑_{i=1}^{m} b_{ik} f_{y_i} ,   f_{s,β_k} = ∑_{i=1}^{m} |b_{ik}| f_{s,y_i}   (11.55)
denote the propagated systematic errors of the components β_k; k = 1, 2, 3 of the solution vector. To assess the influence of random errors, we insert the means

y_i = (1/n) ∑_{l=1}^{n} y_{il} ,   i = 1, …, m
into (11.53),
β_k = ∑_{i=1}^{m} b_{ik} [ (1/n) ∑_{l=1}^{n} y_{il} ] = (1/n) ∑_{l=1}^{n} [ ∑_{i=1}^{m} b_{ik} y_{il} ] = (1/n) ∑_{l=1}^{n} β_{kl} ;   k = 1, 2, 3 .   (11.56)
Obviously, the quantities
β_{kl} = ∑_{i=1}^{m} b_{ik} y_{il} ;   k = 1, 2, 3 ;   l = 1, …, n   (11.57)

are the estimators of an adjustment which uses only every l-th measurement of the means y_i; i = 1, …, m. The differences

β_{kl} − β_k = ∑_{i=1}^{m} b_{ik}(y_{il} − y_i) ;   k = 1, 2, 3 ;   l = 1, …, n   (11.58)
lead us to the elements
s_{β_k β_{k′}} = (1/(n−1)) ∑_{l=1}^{n} (β_{kl} − β_k)(β_{k′l} − β_{k′})

   = (1/(n−1)) ∑_{l=1}^{n} [ ∑_{i=1}^{m} b_{ik}(y_{il} − y_i) ] [ ∑_{j=1}^{m} b_{jk′}(y_{jl} − y_j) ]

   = ∑_{i,j=1}^{m} b_{ik} b_{jk′} s_{ij} ;   k, k′ = 1, 2, 3   (11.59)

of the empirical variance–covariance matrix of the solution vector β. As to computer-assisted approaches, the matrix representation
s_β = ( s_{β_1β_1} s_{β_1β_2} s_{β_1β_3}
        s_{β_2β_1} s_{β_2β_2} s_{β_2β_3}
        s_{β_3β_1} s_{β_3β_2} s_{β_3β_3} ) = B^T s B ;   s_{β_kβ_k} ≡ s²_{β_k}   (11.60)

proves to be more convenient. The matrix

s = (s_{ij}) ,   s_{ij} = (1/(n−1)) ∑_{l=1}^{n} (y_{il} − y_i)(y_{jl} − y_j) ,   i, j = 1, …, m   (11.61)
designates the empirical variance–covariance matrix of the input data. Now we are in a position to establish Student's confidence intervals

β_k − (t_P(n−1)/√n) s_{β_k} ≤ µ_{β_k} ≤ β_k + (t_P(n−1)/√n) s_{β_k} ;   k = 1, 2, 3   (11.62)
for the expectations Eβk = µβk.
Finally, the overall uncertainties u_{β_k} of the results β_k ± u_{β_k}; k = 1, 2, 3 turn out to be

u_{β_k} = (t_P(n−1)/√n) s_{β_k} + f_{s,β_k}

   = (t_P(n−1)/√n) √( ∑_{i,j=1}^{m} b_{ik} b_{jk} s_{ij} ) + ∑_{i=1}^{m} |b_{ik}| f_{s,y_i} ;   k = 1, 2, 3 .   (11.63)
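The matrix forms (11.60)–(11.63) translate almost line-for-line into code. The following sketch simulates repeat ordinate measurements; the data, the t-quantile stand-in and the systematic bounds are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 8
x = np.linspace(-1.0, 2.0, m)                     # error-free abscissas
y0 = 1.0 - 2.0 * x + 0.5 * x**2                   # assumed true parabola
Y = y0 + 0.05 * rng.standard_normal((n, m))       # n repeats per ordinate
ybar = Y.mean(axis=0)

A = np.column_stack([np.ones(m), x, x**2])
B = A @ np.linalg.inv(A.T @ A)                    # (b_ik)
beta = B.T @ ybar

s = (Y - ybar).T @ (Y - ybar) / (n - 1)           # input covariance, cf. (11.61)
s_beta = B.T @ s @ B                              # cf. (11.60)

t = 2.36                                          # stand-in for t_P(n-1), assumed
fs_y = np.full(m, 0.01)                           # assumed systematic bounds f_s,yi
u = t / np.sqrt(n) * np.sqrt(np.diag(s_beta)) + np.abs(B).T @ fs_y   # (11.63)
print(beta, u)
```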
Equal Systematic Errors
Let f_{y_i} = f_y; i = 1, …, m. The expansions given in Appendix F (where we have to substitute true abscissas x_{0,j} for the means x_j quoted there) yield

∑_{j=1}^{m} b_{j1} = 1 ,   ∑_{j=1}^{m} b_{j2} = 0 ,   ∑_{j=1}^{m} b_{j3} = 0 ,   (11.64)
i.e.
f_{β_1} = f_y ,   f_{β_2} = 0 ,   f_{β_3} = 0   (11.65)
so that
u_{β_1} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{m} b_{i1} b_{j1} s_{ij} ) + f_{s,y}

u_{β_2} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{m} b_{i2} b_{j2} s_{ij} )

u_{β_3} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{m} b_{i3} b_{j3} s_{ij} ) .   (11.66)
Uncertainty Band
Insertion of (11.54) into the least squares parabola y(x) = β_1 + β_2x + β_3x² yields

y(x) = y_0(x) + ∑_{j=1}^{m} (b_{j1} + b_{j2}x + b_{j3}x²)(y_j − µ_{y_j}) + ∑_{j=1}^{m} (b_{j1} + b_{j2}x + b_{j3}x²) f_{y_j} .   (11.67)
Clearly, the x-dependent systematic error is given by
f_{y(x)} = ∑_{j=1}^{m} (b_{j1} + b_{j2}x + b_{j3}x²) f_{y_j}
and its worst case estimation yields
f_{s,y(x)} = ∑_{j=1}^{m} |b_{j1} + b_{j2}x + b_{j3}x²| f_{s,y_j} .   (11.68)
Furthermore, from
y_l(x) = β_{1l} + β_{2l}x + β_{3l}x²

y(x) = β_1 + β_2x + β_3x²
and (11.58) we find
y_l(x) − y(x) = (β_{1l} − β_1) + (β_{2l} − β_2)x + (β_{3l} − β_3)x²

   = ∑_{j=1}^{m} (b_{j1} + b_{j2}x + b_{j3}x²)(y_{jl} − y_j)   (11.69)
so that
s²_{y(x)} = (1/(n−1)) ∑_{l=1}^{n} [y_l(x) − y(x)]² = b^T s b ,   (11.70)
where the auxiliary vector
b = (b_{11} + b_{12}x + b_{13}x²   b_{21} + b_{22}x + b_{23}x²   …   b_{m1} + b_{m2}x + b_{m3}x²)^T   (11.71)

has been introduced. The matrix s was defined in (11.61). Ultimately, the true parabola (11.32) is expected to lie within an uncertainty band of the form
y(x) ± u_{y(x)}

u_{y(x)} = (t_P(n−1)/√n) √(b^T s b) + ∑_{j=1}^{m} |b_{j1} + b_{j2}x + b_{j3}x²| f_{s,y_j} .   (11.72)
Equal Systematic Errors
Let f_{y_i} = f_y; i = 1, …, m. According to (11.64), relation (11.68) turns into f_{s,y(x)} = f_{s,y}, i.e.

y(x) ± u_{y(x)} ,   u_{y(x)} = (t_P(n−1)/√n) √(b^T s b) + f_{s,y} .   (11.73)
11.2.2 Erroneous Abscissas, Erroneous Ordinates
If, as is common experimental practice, both the abscissas and the ordinates are erroneous quantities, the elements b_{ik} of the matrix B as defined in (11.35) are erroneous as well. Accordingly, we linearize the estimators β_k twofold, namely throughout the neighborhoods of the points
(x0,1, . . . , x0,m ; y0,1, . . . , y0,m) and (x1, . . . , xm ; y1, . . . , ym)
respectively. In the former case we have
β_k(x_1, …, x_m; y_1, …, y_m)

   = β_k(x_{0,1}, …, x_{0,m}; y_{0,1}, …, y_{0,m})

   + ∑_{j=1}^{m} (∂β_k/∂x_{0,j})(x_j − x_{0,j}) + ∑_{j=1}^{m} (∂β_k/∂y_{0,j})(y_j − y_{0,j}) + ···
The constants β_{0,k} = β_k(x_{0,1}, …, x_{0,m}; y_{0,1}, …, y_{0,m}); k = 1, 2, 3 denote the true values of the estimators. Furthermore, we have to relate the partial derivatives to the point (x_1, …, x_m; y_1, …, y_m),

∂β_k/∂x_{0,j} ≈ ∂β_k/∂x_j = c_{jk} ,   ∂β_k/∂y_{0,j} ≈ ∂β_k/∂y_j = c_{j+m,k} ;   j = 1, …, m .
The constants c_{jk}, c_{j+m,k} are quoted in Appendix F. With respect to the error equations

x_j = x_{0,j} + (x_j − µ_{x_j}) + f_{x_j} ;   −f_{s,x_j} ≤ f_{x_j} ≤ f_{s,x_j}

y_j = y_{0,j} + (y_j − µ_{y_j}) + f_{y_j} ;   −f_{s,y_j} ≤ f_{y_j} ≤ f_{s,y_j}
and the second expansion which is still due, we introduce the notations
vjl = xjl ; vj+m,l = yjl ; vj = xj ; vj+m = yj
µj = µxj ; µj+m = µyj ; fj = fxj ; fj+m = fyj .
Hence, we have
β_k = β_{0,k} + ∑_{j=1}^{2m} c_{jk}(v_j − µ_j) + ∑_{j=1}^{2m} c_{jk} f_j ;   k = 1, 2, 3 .   (11.74)

From this, we infer the propagated systematic errors f_{β_k} of the components β_k as

f_{β_k} = ∑_{j=1}^{2m} c_{jk} f_j ;   k = 1, 2, 3 .
The second expansion throughout a neighborhood of the point (x_1, …, x_m; y_1, …, y_m) with respect to every l-th repeat measurement yields

β_{kl}(x_{1l}, …, x_{ml}; y_{1l}, …, y_{ml}) = β_k(x_1, …, x_m; y_1, …, y_m)

   + ∑_{j=1}^{m} (∂β_k/∂x_j)(x_{jl} − x_j) + ∑_{j=1}^{m} (∂β_k/∂y_j)(y_{jl} − y_j) + ···
i.e.
β_{kl} − β_k = ∑_{j=1}^{2m} c_{jk}(v_{jl} − v_j) ;   k = 1, 2, 3 ;   l = 1, …, n .   (11.75)
These differences allow the elements
s_{β_k β_{k′}} = (1/(n−1)) ∑_{l=1}^{n} (β_{kl} − β_k)(β_{k′l} − β_{k′})

   = (1/(n−1)) ∑_{l=1}^{n} [ ∑_{i=1}^{2m} c_{ik}(v_{il} − v_i) ] [ ∑_{j=1}^{2m} c_{jk′}(v_{jl} − v_j) ]

   = ∑_{i,j=1}^{2m} c_{ik} c_{jk′} s_{ij} ;   k, k′ = 1, 2, 3   (11.76)

of the empirical variance–covariance matrix s_β of the solution vector β to be calculated. The s_{ij} designate the elements of the (expanded) empirical variance–covariance matrix of the input data,

s = (s_{ij}) ,   s_{ij} = (1/(n−1)) ∑_{l=1}^{n} (v_{il} − v_i)(v_{jl} − v_j) ;   i, j = 1, …, 2m .   (11.77)
Finally, with the coefficients cik being comprised within the auxiliary matrix
C^T = ( c_{11} c_{21} ··· c_{2m,1}
        c_{12} c_{22} ··· c_{2m,2}
        c_{13} c_{23} ··· c_{2m,3} ) ,   (11.78)
we arrive at
s_β = ( s_{β_1β_1} s_{β_1β_2} s_{β_1β_3}
        s_{β_2β_1} s_{β_2β_2} s_{β_2β_3}
        s_{β_3β_1} s_{β_3β_2} s_{β_3β_3} ) = C^T s C .   (11.79)
Thus, the uncertainties of the estimators βk; k = 1, 2, 3 turn out to be
u_{β_k} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{2m} c_{ik} c_{jk} s_{ij} ) + ∑_{j=1}^{2m} |c_{jk}| f_{s,j} ;   k = 1, 2, 3 .   (11.80)
Equal Systematic Errors
Let fxi = fx, fyi = fy; i = 1, . . . , m. As
∑_{j=1}^{m} c_{j1} = −β_2 ,   ∑_{j=1}^{m} c_{j+m,1} = 1 ,

∑_{j=1}^{m} c_{j2} = −2β_3 ,   ∑_{j=1}^{m} c_{j+m,2} = 0 ,

∑_{j=1}^{m} c_{j3} = 0 ,   ∑_{j=1}^{m} c_{j+m,3} = 0 ,   (11.81)
the propagated systematic errors change into
f_{β_k} = ∑_{j=1}^{2m} c_{jk} f_j = f_x ∑_{j=1}^{m} c_{jk} + f_y ∑_{j=1}^{m} c_{j+m,k} ;   k = 1, 2, 3

i.e.

f_{β_1} = −β_2 f_x + f_y ;   f_{β_2} = −2β_3 f_x ;   f_{β_3} = 0
so that
f_{s,β_1} = |β_2| f_{s,x} + f_{s,y} ;   f_{s,β_2} = 2|β_3| f_{s,x} ;   f_{s,β_3} = 0 .   (11.82)
Finally, (11.80) turns into
u_{β_1} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{2m} c_{i1} c_{j1} s_{ij} ) + |β_2| f_{s,x} + f_{s,y}

u_{β_2} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{2m} c_{i2} c_{j2} s_{ij} ) + 2|β_3| f_{s,x}

u_{β_3} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{2m} c_{i3} c_{j3} s_{ij} ) .   (11.83)
Uncertainty Band
Insertion of (11.74) into the least squares parabola y(x) = β_1 + β_2x + β_3x² yields

y(x) = y_0(x) + ∑_{j=1}^{2m} (c_{j1} + c_{j2}x + c_{j3}x²)(v_j − µ_j) + ∑_{j=1}^{2m} (c_{j1} + c_{j2}x + c_{j3}x²) f_j .   (11.84)
Thus, the systematic error of y(x) turns out as

f_{y(x)} = ∑_{j=1}^{2m} (c_{j1} + c_{j2}x + c_{j3}x²) f_j .   (11.85)
We then combine
y(x) = β_1 + β_2x + β_3x²

and

y_l(x) = β_{1l} + β_{2l}x + β_{3l}x² ;   l = 1, …, n

to define the n differences

y_l − y = (β_{1l} − β_1) + (β_{2l} − β_2)x + (β_{3l} − β_3)x² .   (11.86)
Inserting (11.75), we are in a position, for any fixed x, to quote the empirical variance of the y_l(x) with respect to y(x),

s²_{y(x)} = (1/(n−1)) ∑_{l=1}^{n} (y_l − y)² = c^T s c .   (11.87)

Here, s designates the (expanded) empirical variance–covariance matrix (11.77) of the input data and c the auxiliary vector

c = (c_{11} + c_{12}x + c_{13}x²   c_{21} + c_{22}x + c_{23}x²   …   c_{2m,1} + c_{2m,2}x + c_{2m,3}x²)^T .   (11.88)
Hence, we expect the true parabola (11.32) to lie within the uncertainty band
y(x) ± u_{y(x)}

u_{y(x)} = (t_P(n−1)/√n) √(c^T s c) + ∑_{j=1}^{2m} |c_{j1} + c_{j2}x + c_{j3}x²| f_{s,j} .   (11.89)
As may be shown, uy(0) = uβ1.
Equal Systematic Errors
Let fxi = fx, fyi = fy; i = 1, . . . , m. As
f_{y(x)} = f_x ∑_{j=1}^{m} (c_{j1} + c_{j2}x + c_{j3}x²) + f_y ∑_{j=1}^{m} (c_{j+m,1} + c_{j+m,2}x + c_{j+m,3}x²)

   = −β_2 f_x − 2β_3 x f_x + f_y
(11.89) changes into
u_{y(x)} = (t_P(n−1)/√n) √(c^T s c) + |β_2 + 2β_3 x| f_{s,x} + f_{s,y} .   (11.90)
Example
Conversion of erroneous data pairs (x_i, y_i) into data pairs (ξ_{0,i}, η_i) the abscissas ξ_{0,i} of which are error-free while the ordinates η_i carry the errors of the abscissas x_i and the ordinates y_i.

Let it suffice to base the conversion on, say, five neighboring data pairs (x_{i−2}, y_{i−2}), …, (x_{i+2}, y_{i+2}). Then, every so-considered group of data defines a least squares parabola y(x) and an associated uncertainty band u_{y(x)}. We simply put ξ_{0,i} = x_i and η_i = y(ξ_{0,i}). We may then define y_l(ξ_{0,i}); l = 1, …, n, so that
η_i = y(ξ_{0,i}) = (1/n) ∑_{l=1}^{n} y_l(ξ_{0,i}) .
The uncertainty uy(ξ0,i) has been quoted in (11.89).
11.3 Piecewise Smooth Approximation
Suppose we want to approximate an empirical function, defined through a sequence of data pairs (x_i, y_i); i = 1, …, by a string of consecutive parabolas the junctions of which shall be continuous and differentiable, Fig. 11.2. For simplicity, we shall restrict ourselves to two consecutive parabolas so that there is but one junction. We shall presume that there are sufficiently many measuring points within the regions of approximation, so that it is possible to substitute points (ξ_{0,i}, η_i) for the measuring points (x_i, y_i), the abscissas ξ_{0,i} of which may be considered error-free while the ordinates η_i include the errors of the abscissas x_i and the ordinates y_i. This idea has been outlined at the end of the preceding section. Let us assume that the transformations have already been done and that, for convenience, our common notation (x_{0,i}, y_i); i = 1, 2, … has been reintroduced.
Solution Vector
Let ξ2 designate the junction of parabolas I and II. We then have
y_I(x) = β_1 + β_2x + β_3x² ;   ξ_1 ≤ x < ξ_2

y_II(x) = β_4 + β_5x + β_6x² ;   ξ_2 ≤ x < ξ_3 .   (11.91)

Obviously, the parabolas have to satisfy the conditions

y_I(ξ_2) = y_II(ξ_2) ;   y′_I(ξ_2) = y′_II(ξ_2) .   (11.92)
Defining the column vector
β = (β_1 β_2 β_3 β_4 β_5 β_6)^T ,   (11.93)
Fig. 11.2. Consecutive parabolas with continuous and differentiable junctions for piecewise smooth approximation of an empirical function
according to (7.15), we require
Hβ = 0 , (11.94)
where
H =
(1 ξ2 ξ2
2 −1 −ξ2 −ξ22
0 1 2ξ2 0 −1 −2ξ2
). (11.95)
We stress that the β_k; k = 1, …, 6 do not define a common polynomial; nevertheless we shall adjust them in one single step. The parabolas defined in (11.91) give rise to two inconsistent systems

β_1 + β_2 x_{0,i} + β_3 x_{0,i}² ≈ y_i ;   i = 1, …, m_1

β_4 + β_5 x_{0,i} + β_6 x_{0,i}² ≈ y_i ;   i = m_1 + 1, …, m_1 + m_2 = m .   (11.96)
Let us assign the means
y_i = (1/n) ∑_{l=1}^{n} y_{il} ;   i = 1, …, m   (11.97)
to a column vector
y = (y1 y2 . . . ym)T (11.98)
and the coefficients 1, x_{0,i}, x_{0,i}² to a system matrix of the form

A = ( 1 x_{0,1} x_{0,1}²         |  0 0 0
      ··· ··· ···                |  ··· ··· ···
      1 x_{0,m_1} x_{0,m_1}²     |  0 0 0
      0 0 0  |  1 x_{0,m_1+1} x_{0,m_1+1}²
      0 0 0  |  ··· ··· ···
      0 0 0  |  1 x_{0,m} x_{0,m}² ) .
Then, (11.96) turns into
Aβ ≈ y . (11.99)
The solution vector follows from (7.16),
β = [ (A^T A)^{−1} A^T − (A^T A)^{−1} H^T [ H (A^T A)^{−1} H^T ]^{−1} H (A^T A)^{−1} A^T ] y .   (11.100)
To abbreviate, we assign a matrix DT to the outer square brackets, so that
β = D^T y ;   D = (d_{ik}) ,   (11.101)

( β_1 )   ( d_{11} d_{21} … d_{m1} )
( β_2 )   ( d_{12} d_{22} … d_{m2} )   ( y_1 )
( β_3 ) = ( d_{13} d_{23} … d_{m3} )   ( y_2 )
( β_4 )   ( d_{14} d_{24} … d_{m4} )   (  …  )
( β_5 )   ( d_{15} d_{25} … d_{m5} )   ( y_m )
( β_6 )   ( d_{16} d_{26} … d_{m6} )
in components,
β_k = ∑_{i=1}^{m} d_{ik} y_i ;   k = 1, …, 6 .   (11.102)
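The constrained adjustment can be sketched numerically: build the block design matrix, the constraint matrix H of (11.95), and project the unconstrained solution onto the subspace Hβ = 0, cf. (11.100). The empirical function, the junction position and the noise level below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
xi2 = 1.0                                   # junction of parabolas I and II
x1 = np.linspace(0.0, 1.0, 8)               # region I abscissas
x2 = np.linspace(1.0, 2.0, 8)               # region II abscissas
f = lambda x: np.sin(1.5 * x)               # assumed empirical function
y = np.concatenate([f(x1), f(x2)]) + 0.01 * rng.standard_normal(16)

def rows(x):
    return np.column_stack([np.ones_like(x), x, x**2])

A = np.block([[rows(x1), np.zeros((8, 3))],
              [np.zeros((8, 3)), rows(x2)]])
H = np.array([[1, xi2, xi2**2, -1, -xi2, -xi2**2],
              [0, 1, 2 * xi2, 0, -1, -2 * xi2]])

# constrained solution, cf. (11.100): correct the unconstrained fit so H beta = 0
G = np.linalg.inv(A.T @ A)
b_u = G @ A.T @ y
beta = b_u - G @ H.T @ np.linalg.inv(H @ G @ H.T) @ H @ b_u

yI = lambda x: beta[0] + beta[1] * x + beta[2] * x**2
yII = lambda x: beta[3] + beta[4] * x + beta[5] * x**2
print(yI(xi2) - yII(xi2))                   # -> ~0: continuous junction
```

By construction H β = 0 holds to machine precision, so any realization of the errors still produces a continuous and differentiable pair of parabolas.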
Uncertainty Band of a String of Parabolas
For simplicity, we confine ourselves to the uncertainty band of the fitted parabola

y_I(x) = β_1 + β_2x + β_3x² .   (11.103)
Obviously, this formally agrees with (11.72),
y_I(x) ± u_{y_I(x)}

u_{y_I(x)} = (t_P(n−1)/√n) √(d_I^T s d_I) + ∑_{i=1}^{m} |d_{i1} + d_{i2}x + d_{i3}x²| f_{s,y_i} ,   (11.104)
where the auxiliary vector
d_I = (d_{11} + d_{12}x + d_{13}x²   d_{21} + d_{22}x + d_{23}x²   …   d_{m1} + d_{m2}x + d_{m3}x²)^T   (11.105)
corresponds to (11.71) and the matrix s to (11.61). With respect to the fitted parabola y_II(x) we have just two modifications to make: in the uncertainty component due to random errors we have to substitute an auxiliary vector

d_II = (d_{14} + d_{15}x + d_{16}x²   d_{24} + d_{25}x + d_{26}x²   …   d_{m4} + d_{m5}x + d_{m6}x²)^T   (11.106)
for d_I, and in the uncertainty component due to systematic errors the matrix elements d_{i4}, d_{i5}, d_{i6} for the elements d_{i1}, d_{i2}, d_{i3}. Obviously, any error combination yields a continuous and differentiable string of parabolas.

The procedure may be generalized to more than two regions of approximation, e.g. to three regions with junctions ξ_1 and ξ_2, in which case the matrix H takes the form
H = ( 1 ξ_1 ξ_1²  |  −1 −ξ_1 −ξ_1²  |  0 0 0
      0 1  2ξ_1   |   0  −1  −2ξ_1  |  0 0 0
      0 0 0  |  1 ξ_2 ξ_2²  |  −1 −ξ_2 −ξ_2²
      0 0 0  |  0 1  2ξ_2   |   0  −1  −2ξ_2 ) .   (11.107)
11.4 Circles
In contrast to polynomials, the equation of a circle is non-linear in the parameters to be adjusted. Consequently, we first have to linearize. This can be done either by truncating a series expansion or through a transformation, namely a stereographic projection.² We will content ourselves here with series truncation.
Linearization
As has been discussed in Sect. 10.2, we expand the circle’s equation
φ(r, x_M, y_M; x, y) = r² − (x − x_M)² − (y − y_M)² = 0   (11.108)

throughout a neighborhood of some starting values r*, x*_M, y*_M,

φ(r, x_M, y_M; x, y) = φ(r*, x*_M, y*_M; x, y)   (11.109)

   + (∂φ/∂r*)(r − r*) + (∂φ/∂x*_M)(x_M − x*_M) + (∂φ/∂y*_M)(y_M − y*_M) + ··· = 0 .
As
∂φ/∂r* = 2r* ,   ∂φ/∂x*_M = 2(x − x*_M) ,   ∂φ/∂y*_M = 2(y − y*_M)
²E.g. Knopp, K., Elemente der Funktionentheorie, Sammlung Göschen, vol. 1109, Walter de Gruyter, Berlin 1963
we find, truncating non-linear terms,

r*² − (x − x*_M)² − (y − y*_M)²   (11.110)

   + 2r*(r − r*) + 2(x − x*_M)(x_M − x*_M) + 2(y − y*_M)(y_M − y*_M) ≈ 0 .
Introducing the abbreviations
β_1 = r − r* ,   β_2 = x_M − x*_M ,   β_3 = y_M − y*_M

u(x) = (x − x*_M)/r* ,   v(y) = (y − y*_M)/r*

w(x, y) = (1/(2r*)) [ (x − x*_M)² + (y − y*_M)² − r*² ]   (11.111)
the expansion (11.110) turns into
β1 + u(x)β2 + v(y)β3 ≈ w(x, y) . (11.112)
Obviously, in a formal sense, (11.112) is the equation of a plane. In order to reduce the linearization errors, we cyclically repeat the adjustment, each time replacing the “old” starting values with improved “new” ones.

Suppose there are m ≥ 4 measured data pairs (x_i, y_i); i = 1, …, m, lying in the vicinity of the circumference of the circle and having the structure
x_i = (1/n) ∑_{l=1}^{n} x_{il} = x_{0,i} + (x_i − µ_{x_i}) + f_{x_i} ,

y_i = (1/n) ∑_{l=1}^{n} y_{il} = y_{0,i} + (y_i − µ_{y_i}) + f_{y_i} .
Inserting the latter into (11.112) produces the linear, inconsistent system
β1 + u (xi) β2 + v (yi)β3 ≈ w (xi, yi) ; i = 1, . . . , m . (11.113)
The system matrix A, the column vector β of the unknowns and the column vector w of the right-hand sides,

A = ( 1 u_1 v_1
      1 u_2 v_2
      ··· ··· ···
      1 u_m v_m ) ;   β = (β_1 β_2 β_3)^T ;   w = (w_1 w_2 … w_m)^T ,
turn (11.113) into
Aβ ≈ w . (11.114)
Iteration of the Solution Vector
The least squares solution vector is
β = B^T w ,   B = A(A^T A)^{−1} .   (11.115)
Its components β1, β2, β3 define the parameters of the circle,
r = β_1 + r* ,   x_M = β_2 + x*_M ,   y_M = β_3 + y*_M .   (11.116)

Interpreting the results r, x_M, y_M as new, improved starting values r*, x*_M, y*_M, we are in a position to cyclically repeat the adjustment. As has been discussed in Sect. 10.2, we expect the estimators to tend to zero,
β1 → 0, β2 → 0, β3 → 0 .
This will, however, be the case only as far as allowed by the measurement errors (and, possibly, by the numerical accuracy of the machine). In

β_k = ∑_{j=1}^{m} b_{jk} w_j ;   k = 1, 2, 3   (11.117)
the coefficients bjk as well as the quantities wj prove to be erroneous.
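The iteration just described can be sketched compactly. The circle below (radius 2, centre (1, −1)) and the crude starting values are assumptions; the data are noise-free so that the cycle converges to the assumed parameters:

```python
import numpy as np

# Sketch of the cyclically repeated adjustment (11.111)-(11.116).
phi = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
x = 1.0 + 2.0 * np.cos(phi)
y = -1.0 + 2.0 * np.sin(phi)

r, xM, yM = 1.0, 0.0, 0.0                  # crude starting values r*, xM*, yM*
for _ in range(20):
    u = (x - xM) / r
    v = (y - yM) / r
    w = ((x - xM) ** 2 + (y - yM) ** 2 - r**2) / (2 * r)
    A = np.column_stack([np.ones_like(x), u, v])
    beta = np.linalg.solve(A.T @ A, A.T @ w)   # solution vector (11.115)
    r, xM, yM = r + beta[0], xM + beta[1], yM + beta[2]

print(r, xM, yM)                            # -> approximately 2, 1, -1
```

With erroneous data the estimators β_k would stop shrinking at the level set by the measurement errors, as stated in the text.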
Uncertainties of the Parameters of the Circle
In order to estimate the uncertainties, we carry out series expansions throughout the neighborhoods of the points
(x0,1, . . . , x0,m ; y0,1, . . . , y0,m) and (x1, . . . , xm ; y1, . . . , ym) .
The former leads us to the propagated systematic errors and the latter to the empirical variance–covariance matrix of the components β_k; k = 1, 2, 3 of the solution vector respectively. The expansion with respect to the true values yields
β_k(x_1, …, y_m) = β_k(x_{0,1}, …, y_{0,m})   (11.118)

   + ∑_{j=1}^{m} (∂β_k/∂x_{0,j})(x_j − x_{0,j}) + ∑_{j=1}^{m} (∂β_k/∂y_{0,j})(y_j − y_{0,j}) + ··· ;   k = 1, 2, 3 .
As usual, we linearize the expansion and approximate the partial derivatives according to

∂β_k/∂x_{0,j} ≈ ∂β_k/∂x_j = c_{jk} ,   ∂β_k/∂y_{0,j} ≈ ∂β_k/∂y_j = c_{j+m,k} ;   k = 1, 2, 3 ,   j = 1, …, m .
The coefficients c_{jk} and c_{j+m,k} are given in Appendix F. Furthermore, we introduce

f_j = f_{x_j} ,   f_{j+m} = f_{y_j} ;   j = 1, …, m .
Then, according to (11.118), the propagated systematic errors of the βk;k = 1, 2, 3 turn out as
f_{β_k} = ∑_{j=1}^{2m} c_{jk} f_j ,   f_{s,β_k} = ∑_{j=1}^{2m} |c_{jk}| f_{s,j} ;   k = 1, 2, 3 .   (11.119)
The variances and covariances follow from the second expansion
β_{kl}(x_{1l}, …, y_{ml}) = β_k(x_1, …, y_m)   (11.120)

   + ∑_{j=1}^{m} (∂β_k/∂x_j)(x_{jl} − x_j) + ∑_{j=1}^{m} (∂β_k/∂y_j)(y_{jl} − y_j) + ··· .
We define
vjl = xjl , vj+m,l = yjl , vj = xj , vj+m = yj
µj = µxj , µj+m = µyj ; j = 1, . . . , m
so that
β_{kl} − β_k = ∑_{j=1}^{2m} c_{jk}(v_{jl} − v_j) ,   k = 1, 2, 3 ,   l = 1, …, n .   (11.121)
From this we obtain
s_{β_k β_{k′}} = (1/(n−1)) ∑_{l=1}^{n} [ ∑_{i=1}^{2m} c_{ik}(v_{il} − v_i) ] [ ∑_{j=1}^{2m} c_{jk′}(v_{jl} − v_j) ]

   = ∑_{i,j=1}^{2m} c_{ik} c_{jk′} s_{ij} ;   k, k′ = 1, 2, 3 .   (11.122)
Assigning the coefficients cjk to the (2m × 3) auxiliary matrix
C^T = ( c_{11} c_{21} … c_{2m,1}
        c_{12} c_{22} … c_{2m,2}
        c_{13} c_{23} … c_{2m,3} )   (11.123)
the empirical variance–covariance matrix of the solution vector β takes the form

s_β = ( s_{β_1β_1} s_{β_1β_2} s_{β_1β_3}
        s_{β_2β_1} s_{β_2β_2} s_{β_2β_3}
        s_{β_3β_1} s_{β_3β_2} s_{β_3β_3} ) = C^T s C ;   s_{β_kβ_k} ≡ s²_{β_k} .   (11.124)
Here
s = (s_{ij}) ;   i, j = 1, …, 2m ,   s_{ij} = (1/(n−1)) ∑_{l=1}^{n} (v_{il} − v_i)(v_{jl} − v_j)   (11.125)

denotes the empirical variance–covariance matrix of the input data. Finally, the primary results of the adjustment are given by

β_k ± u_{β_k}   (11.126)

u_{β_k} = (t_P(n−1)/√n) √( ∑_{i,j=1}^{2m} c_{ik} c_{jk} s_{ij} ) + ∑_{j=1}^{2m} |c_{jk}| f_{s,j} ;   k = 1, 2, 3 .
The parameters of the circle follow from

r = β_1 + r* ,   x_M = β_2 + x*_M ,   y_M = β_3 + y*_M .
The uncertainties of the latter are, of course, identical with those of the βk;k = 1, 2, 3.
Equal Systematic Errors
Let fxi = fx; fyi = fy; i = 1, . . . , m. As (11.119) changes into
f_{β_k} = f_x ∑_{j=1}^{m} c_{jk} + f_y ∑_{j=1}^{m} c_{j+m,k} ;   k = 1, 2, 3 ,   (11.127)
according to Appendix F we find

∑_{j=1}^{m} c_{j1} = −β_2/r* ;   ∑_{j=1}^{m} c_{j+m,1} = −β_3/r* ;

∑_{j=1}^{m} c_{j2} = 1 ;   ∑_{j=1}^{m} c_{j+m,2} = 0 ;

∑_{j=1}^{m} c_{j3} = 0 ;   ∑_{j=1}^{m} c_{j+m,3} = 1 .   (11.128)
The propagated systematic errors become
f_{β_1} = −(1/r*)(β_2 f_x + β_3 f_y) ≈ 0 ;   f_{β_2} = f_x ,   f_{β_3} = f_y   (11.129)

so that³

u_{β_1} = (t_P(n−1)/√n) s_{β_1} ,   u_{β_2} = (t_P(n−1)/√n) s_{β_2} + f_{s,x} ,

u_{β_3} = (t_P(n−1)/√n) s_{β_3} + f_{s,y} .   (11.130)
³On finishing the iteration, the absolute values of the estimators β_2 and β_3 will be very small so that f_{β_1} will be close to zero – the radius remains insensitive to shifts of the center of the circle.
Uncertainty Regions
From (11.111) we conclude
β_1 = r − r* ;   µ_{β_1} = E{β_1} = E{r} − r* = µ_r − r* ,

i.e.

β_1 − µ_{β_1} = r − µ_r ,

and, similarly,

β_2 − µ_{β_2} = x_M − µ_{x_M} ;   β_3 − µ_{β_3} = y_M − µ_{y_M}

so that

β − µ_β = (r − µ_r   x_M − µ_{x_M}   y_M − µ_{y_M})^T .   (11.131)
Referring to Hotelling’s ellipsoid (9.64)
t²_P(r, n) = n (β − µ_β)^T (C^T s C)^{−1} (β − µ_β)   (11.132)
we may replace β − µ_β by a vector of the kind

(β̃ − β) = (r̃ − r   x̃_M − x_M   ỹ_M − y_M)^T   (11.133)

so that (11.132) takes the form of a confidence ellipsoid

t²_P(r, n) = n (β̃ − β)^T (C^T s C)^{−1} (β̃ − β)   (11.134)
which we depict in a rectangular r, x_M, y_M coordinate system. Combining (11.134) with the pertaining security polyhedron produces the EPC hull, the procedure turning out to be simpler in case we restrict ourselves to the degeneration presented in (11.129).

There are two uncertainty regions to be constructed: a very small inner circular region, enclosing the admissible centers, and a much larger, outer annular region, marking off the set of allowable circles. The inner region is given by the projection of the EPC hull onto the x_M, y_M-plane.

To find the outer region, we proceed numerically. To this end, we put a bunch of rays through the point x_M, y_M covering the angular range 0, …, 2π uniformly and densely enough.

Let us intersect the EPC hull with a plane perpendicular to the r-axis and consider the orthogonal projection of the intersection onto the x_M, y_M-plane. Obviously, each point of the so-projected line represents an admissible center (centers lying in the interior are of no interest). Each of the centers has the same radius, namely that which is given by the height of the intersecting plane. Repeating the procedure with a sufficiently closely spaced system
of intersecting planes produces the set of all admissible centers and circles. We then consider the intersections of the circles with the rays of the bunch. Sorting the intersections on a given ray according to their distances from the point x_M, y_M, the intersection with the largest distance obviously establishes a point of the outer boundary of the ring-shaped uncertainty region in question, while the intersection with the smallest distance marks a point of the inner boundary of that region.
12 Special Metrology Systems
12.1 Dissemination of Units
Two basic procedures are available for disseminating units: the linking of so-called working standards to primary standards and the intercomparison of standards (of any kind) through what is called key comparisons.
12.1.1 Realization of Working Standards
The primary standards of the SI units are the starting points for the realization of day-to-day employed working standards. By nature, the accuracy such standards should have depends on their field of application. This aspect has led to a hierarchy of standards of decreasing accuracies. Once established, we shall denote the pertaining “ranks” by H_k; k = 1, 2, … and use growing k to express decreasing accuracy.
Suppose the comparator operates by means of an equation of the kind
βk = βk−1 + x ; k = 1, 2, . . . . (12.1)
Here, β_k designates the standard of rank H_k, which is to be calibrated, β_{k−1} the standard of rank H_{k−1}, which accomplishes the link-up, and, finally, x the difference shown in the comparator display. Normally, any link-up implies a loss of accuracy. Let us assume l = 1, …, n repeat measurements
xl = x0 + (xl − µx) + fx ; l = 1, . . . , n , (12.2)
where x_0 designates the true value of the displayed difference x_l, ε_l = x_l − µ_x a random and f_x a systematic error. The n repeat measurements define an arithmetic mean and an empirical variance,
x = (1/n) ∑_{l=1}^{n} x_l ;   s_x² = (1/(n−1)) ∑_{l=1}^{n} (x_l − x)² .   (12.3)
Let the systematic error fx be contained within an interval
−fs,x ≤ fx ≤ fs,x . (12.4)
Fig. 12.1. Primary standard: nominal value N(β) and true value β0
Then, the interval
x − u_x ≤ x_0 ≤ x + u_x ;   u_x = (t_P(n−1)/√n) s_x + f_{s,x}   (12.5)

should localize the true value x_0 of the differences x_l; l = 1, …, n.

Let N(β) denote the nominal value of a given primary standard, e.g.
N(β) = 1 kg (exact) and β_0 its actual, physically true value, which, of course, is unknown. Let the nominal value and the true value differ by an unknown systematic error f_{N(β)}, which we assume to be confined to the interval −u_{N(β)}, …, u_{N(β)}. We then have, Fig. 12.1,
β_0 = N(β) − f_{N(β)} ;   −u_{N(β)} ≤ f_{N(β)} ≤ u_{N(β)} .   (12.6)
Clearly, whenever N(β) is used, its true value β_0 will be effective. In linking up a standard β_1 of true value β_{0,1} to N(β), the physical error equation reads
β1,l = β0 + xl = β0 + x0 + (xl − µx) + fx ; l = 1, . . . , n .
Obviously, β0 and x0 define the true value
β0,1 = β0 + x0 (12.7)
of the standard β_1. Averaging over the n values β_{1,l}; l = 1, …, n leads to

β̄_1 = β_0 + x̄ .   (12.8)

Unfortunately, the so-defined quantity is metrologically undefined. In order to arrive at a useful statement, we have to substitute N(β) for β_0 and include the associated systematic error f_{N(β)} in the uncertainty of the modified mean

β̄_1 = N(β) + x̄   (12.9)
so that
Fig. 12.2. The true values of standards of the same hierarchical rank need not coincide, nor must their uncertainties overlap – the respective uncertainties should rather localize the associated true values

u_{β̄_1} = u_{N(β)} + (t_P(n−1)/√n) s_x + f_{s,x} .   (12.10)
According to
β̄_1 − u_{β̄_1} ≤ β_{0,1} ≤ β̄_1 + u_{β̄_1} ,   (12.11)

the interval β̄_1 ± u_{β̄_1} localizes the true value β_{0,1} of the lower-ranking standard β_1. The true value β_{0,1} of β_1 will, of course, differ from the true value β_0 of the primary standard N(β).
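The link-up chain (12.2)–(12.11) can be traced numerically. The following sketch uses invented comparator readings; the bounds f_{s,x} and u_N as well as the Student factor t_P(n−1) are assumptions chosen for illustration, not values from the text.

```python
import statistics

# Hypothetical comparator readings x_l in mg (values invented); the error
# bound f_sx, the standard's uncertainty u_N and the Student factor t_P
# are assumptions.
readings = [0.1012, 0.0987, 0.1005, 0.0993, 0.1008, 0.0995]
n = len(readings)

x_bar = statistics.fmean(readings)     # arithmetic mean, (12.3)
s_x = statistics.stdev(readings)       # empirical standard deviation, (12.3)

t_P = 2.57                             # assumed t_P(n-1) for n = 6, P ~ 95 %
f_sx = 0.002                           # assumed bound of the systematic error
u_x = t_P / n ** 0.5 * s_x + f_sx      # uncertainty of the difference, (12.5)

u_N = 0.005                            # assumed uncertainty of the standard
u_beta1 = u_N + u_x                    # imported into the link-up, (12.10)
print(x_bar, u_x, u_beta1)
```

Note how the uncertainty of the calibrated standard can only grow with each rank of the hierarchy, since u_N is imported unchanged.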
12.1.2 Key Comparison
So-called key comparisons are intended to ensure that the national measurement institutes rely on equivalent SI standards. Let β^{(1)}, β^{(2)}, …, β^{(m)} denote a selected group of standards (of equal physical quality) with true values β^{(i)}_0; i = 1, …, m. The superscripts distinguish individuals of the same hierarchical rank, Fig. 12.2. Again, the true values of the standards need not coincide. It is rather indispensable that the uncertainties of the standards localize the respective true values. Key comparisons are implemented through round robins in which a suitable transfer standard, which we shall call T, is passed on from one participant to the next, where it is calibrated on the spot, Fig. 12.3. During the portage the physical properties of the transfer standard must not alter, so that each link-up relates to exactly one and the same true value. Subsequently, the mutual consistency of the results has to be tested.

The i-th participant, i ∈ 1, …, m, links the transfer standard T to his laboratory standard β^{(i)}. We assume the uncertainties of the laboratory standards to localize the pertaining true values β^{(i)}_0 so that

β̄^{(i)} − u_{β̄^{(i)}} ≤ β^{(i)}_0 ≤ β̄^{(i)} + u_{β̄^{(i)}} ,   i = 1, …, m .   (12.12)
For convenience, let us rewrite these inequalities as a set of equations
Fig. 12.3. Round robin for a group of m standards β^{(i)}; i = 1, …, m realized by means of a transfer standard T. The calibrations T̄^{(i)} should be mutually consistent

β̄^{(i)} = β^{(i)}_0 + f_{β^{(i)}} ;   −u_{β̄^{(i)}} ≤ f_{β^{(i)}} ≤ u_{β̄^{(i)}} ;   i = 1, …, m .   (12.13)
The physical error equations of the m link-ups read

T^{(i)}_l = β^{(i)}_0 + x^{(i)}_0 + (x^{(i)}_l − µ_{x^{(i)}}) + f_{x^{(i)}} ;   i = 1, …, m ;   l = 1, …, n

T̄^{(i)} = β^{(i)}_0 + x^{(i)}_0 + (x̄^{(i)} − µ_{x^{(i)}}) + f_{x^{(i)}} ;   i = 1, …, m .

The second group of equations is obviously identical to

T̄^{(i)} = β^{(i)}_0 + x̄^{(i)} ;   i = 1, …, m .
For practical reasons, we have to substitute estimators β̄^{(i)} for the inaccessible true values β^{(i)}_0 and to include the imported errors in the uncertainty u_{T̄^{(i)}} of the relevant mean T̄^{(i)}. We then find

T̄^{(i)} = β̄^{(i)} + x̄^{(i)} ,   u_{T̄^{(i)}} = u_{β̄^{(i)}} + u_{x̄^{(i)}} ;   i = 1, …, m .   (12.14)
Let T_0 designate the true value of the transfer standard. As

T_0 = β^{(i)}_0 + x^{(i)}_0 ;   i = 1, …, m ,   (12.15)

we have

|T̄^{(i)} − T_0| ≤ u_{T̄^{(i)}} ;   i = 1, …, m .   (12.16)
Figure 12.4 depicts the result of a consistent round robin. We might wish to quantify differences of the kind T̄^{(i)} − T̄^{(j)}. Here, we should observe

|T̄^{(i)} − T̄^{(j)}| ≤ u_{β̄^{(i)}} + u_{x̄^{(i)}} + u_{β̄^{(j)}} + u_{x̄^{(j)}} ;   i, j = 1, …, m .
Fig. 12.4. Round robin: in case of consistency of the standards β^{(i)}; i = 1, …, m, a horizontal line may be drawn intersecting each of the uncertainty intervals u_{T̄^{(i)}} of the calibrations T̄^{(i)}
Finally, we refer to the m means T̄^{(i)} to establish a weighted grand mean¹ as defined in (9.41),

β̄ = Σ_{i=1}^m w_i T̄^{(i)} ;   w_i = g_i² / Σ_{j=1}^m g_j² ;   g_i = 1/u_{T̄^{(i)}} .   (12.17)
In this context, the mean β̄ is called a key comparison reference value (KCRV). We derive the uncertainties u_{d̄_i} of the m differences

d̄_i = T̄^{(i)} − β̄ ;   i = 1, …, m .   (12.18)
To this end, we have to insert (12.14) into (12.18), so that

d̄_i = β̄^{(i)} + x̄^{(i)} − Σ_{j=1}^m w_j (β̄^{(j)} + x̄^{(j)}) .
Combining the two errors f_{x^{(i)}} and f_{β^{(i)}},

f_i = f_{x^{(i)}} + f_{β^{(i)}} ,   (12.19)

the error representation of (12.18) reads

d̄_i = T_0 + (x̄^{(i)} − µ_{x^{(i)}}) + f_i − Σ_{j=1}^m w_j [ T_0 + (x̄^{(j)} − µ_{x^{(j)}}) + f_j ]

    = (x̄^{(i)} − µ_{x^{(i)}}) − Σ_{j=1}^m w_j (x̄^{(j)} − µ_{x^{(j)}}) + f_i − Σ_{j=1}^m w_j f_j .   (12.20)
¹ As has been discussed in Sect. 9.3, a group of means with unequal true values does not define a coherent grand mean.
Reverting to individual measurements x^{(i)}_l we find

d̄_i = (1/n) Σ_{l=1}^n (x^{(i)}_l − µ_{x^{(i)}}) − Σ_{j=1}^m w_j [ (1/n) Σ_{l=1}^n (x^{(j)}_l − µ_{x^{(j)}}) ] + f_i − Σ_{j=1}^m w_j f_j

    = (1/n) Σ_{l=1}^n [ (x^{(i)}_l − µ_{x^{(i)}}) − Σ_{j=1}^m w_j (x^{(j)}_l − µ_{x^{(j)}}) ] + f_i − Σ_{j=1}^m w_j f_j

so that

d_{i,l} = (x^{(i)}_l − µ_{x^{(i)}}) − Σ_{j=1}^m w_j (x^{(j)}_l − µ_{x^{(j)}}) + f_i − Σ_{j=1}^m w_j f_j ;   l = 1, …, n .   (12.21)

Hence, we are in a position to define the n differences

d_{i,l} − d̄_i = (x^{(i)}_l − x̄^{(i)}) − Σ_{j=1}^m w_j (x^{(j)}_l − x̄^{(j)}) ,   l = 1, …, n .   (12.22)
The empirical variances and covariances of the participants are

s_{x^{(i)} x^{(j)}} ≡ s_{ij} = (1/(n−1)) Σ_{l=1}^n (x^{(i)}_l − x̄^{(i)})(x^{(j)}_l − x̄^{(j)}) ;   i, j = 1, …, m .

The s_{ij} define the empirical variance–covariance matrix

s = ( s_11  s_12  …  s_1m
      s_21  s_22  …  s_2m
      …     …     …  …
      s_m1  s_m2  …  s_mm ) ;   s_ii ≡ s_i² .
For convenience, we introduce the auxiliary vector

w = (w_1 w_2 … w_m)ᵀ

comprising the weights w_i. Then, from (12.22) we obtain

s²_{d̄_i} = (1/(n−1)) Σ_{l=1}^n (d_{i,l} − d̄_i)² = s_i² − 2 Σ_{j=1}^m w_j s_{ij} + wᵀ s w .   (12.23)
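The identity (12.23) can be checked numerically. A minimal sketch with invented readings for m = 3 participants, n = 4 repeats and equal weights compares the directly computed variance of the d_{i,l} with the matrix expression:

```python
# Invented comparator readings x^(i)_l for m = 3 participants, n = 4 repeats.
x = [
    [1.02, 0.98, 1.01, 0.99],   # participant 1
    [2.01, 2.03, 1.97, 1.99],   # participant 2
    [2.98, 3.02, 3.01, 2.99],   # participant 3
]
m, n = len(x), len(x[0])
w = [1.0 / m] * m                       # equal weights for simplicity

xb = [sum(row) / n for row in x]        # means x_bar^(i)
# empirical variances/covariances s_ij
s = [[sum((x[i][l] - xb[i]) * (x[j][l] - xb[j]) for l in range(n)) / (n - 1)
      for j in range(m)] for i in range(m)]

def s2_d(i):
    """Left-hand side of (12.23): variance of the d_il, computed directly."""
    d = [x[i][l] - sum(w[j] * x[j][l] for j in range(m)) for l in range(n)]
    db = sum(d) / n
    return sum((dl - db) ** 2 for dl in d) / (n - 1)

def s2_d_matrix(i):
    """Right-hand side of (12.23): s_i^2 - 2 sum_j w_j s_ij + w^T s w."""
    wsw = sum(w[a] * s[a][b] * w[b] for a in range(m) for b in range(m))
    return s[i][i] - 2 * sum(w[j] * s[i][j] for j in range(m)) + wsw

for i in range(m):
    assert abs(s2_d(i) - s2_d_matrix(i)) < 1e-12
print("(12.23) verified")
```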
The worst case estimation of (12.19) yields

f_{s,i} = f_{s,x^{(i)}} + u_{β̄^{(i)}} ;   i = 1, …, m .
Hence, the uncertainties of the differences (12.18) are given by
Fig. 12.5. The absolute value of the difference d̄_i = T̄^{(i)} − β̄ should not exceed the uncertainty u_{d̄_i}
u_{d̄_i} = (t_P(n−1)/√n) √( s_i² − 2 Σ_{j=1}^m w_j s_{ij} + wᵀ s w )
        + (1 − w_i) f_{s,i} + Σ_{j=1, j≠i}^m w_j f_{s,j} ,   i = 1, …, m .

We are free to add ±w_i f_{s,i} to the systematic terms on the right,

(1 − w_i) f_{s,i} + Σ_{j=1, j≠i}^m w_j f_{s,j} ± w_i f_{s,i} = f_{s,i} − 2 w_i f_{s,i} + Σ_{j=1}^m w_j f_{s,j} ,

so that

u_{d̄_i} = (t_P(n−1)/√n) √( s_i² − 2 Σ_{j=1}^m w_j s_{ij} + wᵀ s w )
        + f_{s,i} − 2 w_i f_{s,i} + Σ_{j=1}^m w_j f_{s,j} .   (12.24)
As illustrated in Fig. 12.5, we should observe

|T̄^{(i)} − β̄| ≤ u_{d̄_i} .   (12.25)
In the case of equal weights, we have to set w_i = 1/m. Whether or not (12.25) turns out to be practically useful appears somewhat questionable. Let us recall that the grand mean β̄ as defined in (12.17) is established through the calibrations T̄^{(i)} themselves. Consequently, we cannot rule out that, just by chance, correct calibrations might appear to be incorrect and incorrect ones might appear correct.

Rather, it appears simpler and less risky to check whether the uncertainties of the T̄^{(i)}; i = 1, …, m mutually overlap. If they do, this indicates compatibility of the β^{(1)}, β^{(2)}, …, β^{(m)}.
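The weighted grand mean (12.17) and the differences (12.18) are easily computed. A sketch with invented round-robin calibrations T̄^{(i)} and uncertainties (all numbers hypothetical):

```python
# Hypothetical round-robin calibrations T^(i) with uncertainties u_T^(i)
# (numbers invented for illustration).
T = [10.003, 9.998, 10.001, 10.005]
u = [0.004, 0.006, 0.003, 0.008]

g = [1.0 / ui for ui in u]                       # g_i = 1/u_T(i)
gsq = sum(gi ** 2 for gi in g)
w = [gi ** 2 / gsq for gi in g]                  # weights, (12.17)
beta = sum(wi * Ti for wi, Ti in zip(w, T))      # KCRV, (12.17)

d = [Ti - beta for Ti in T]                      # differences, (12.18)
print(beta, d)
```

By construction the weights sum to one and the weighted differences Σ w_i d̄_i cancel, which is exactly why the KCRV cannot serve as an independent arbiter of the calibrations it was built from.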
Finally, let us refer to what metrologists call "traceability". In a sense, this is the most basic definition of metrology, stating the imperative: the uncertainty of every standard and every physical quantity should localize the pertaining true value. Provided the alternative error model we refer to applies, we are in a position to claim:

The alternative error model meets the requirements of traceability.

In contrast to this, the ISO Guide presents us with difficulties in explaining the term and in realizing its implications – even lengthy discussions extending over years do not seem to have clarified the situation.
12.2 Mass Decades
Since the First General Conference on Weights and Measures in 1889, the unit of mass has been realized through artefacts, namely so-called prototypes of the kilogram.² However, as thorough long-term investigations have revealed, mass artefacts are not really time-constant quantities; their inconstancies may amount to up to ±5 µg [15]. Another problem relates to the high density of the platinum–iridium alloy: its density of 21 500 kg m⁻³ has to be confronted with the density of 8 000 kg m⁻³ of stainless steel weights. Owing to this difference, the primary link-up

prototype ⇒ 1 kg stainless steel weight

is affected by an air buoyancy of the order of magnitude of 100 mg, the uncertainty of which should lie in an interval of ±5 µg up to ±10 µg.

The scaling of the kilogram into submultiples and multiples such as

…, 10 g, 20 g, 50 g, 100 g, 200 g, 500 g, 1 kg, 2 kg, 5 kg, …   (12.26)

is carried out through weighing. In the following, we shall consider the ideas underlying mass link-ups. To adjust the weights to exact nominal values would be neither necessary nor, regarding the costs, desirable. Rather, differences of the kind

β_k = m_k − N(m_k) ;   k = 1, 2, …   (12.27)

between the physical masses m_k and their nominal values N(m_k) are considered.
Weighing Schemes
For simplicity, we shall confine ourselves to "down-scaling". Consider a set of weights with nominal values

N(m_1) = 500 g ,   N(m_2) = 200 g ,   N(m_3) = 200 g ,
N(m_4) = 100 g ,   N(m_5) = 100 g .   (12.28)

² The prototypes, whose nominal value is 1 kg (exact), are cylinders 39 mm in height and 39 mm in diameter, consisting of an alloy of 90% platinum and 10% iridium. At present, a new definition of the SI unit kg is being investigated.

Fig. 12.6. Link-up standard with nominal value N, true value N_0 and mean value N̄
In order to better control the weighing procedures, we might wish to add a so-called check-weight, for example
N(m6) = 300 g ,
the actual mass of which is known a priori. After the link-up, the two intervals

(m̄_6 ± u_{m̄_6})_{before calibration}   and   (m̄_6 ± u_{m̄_6})_{after calibration}

should mutually overlap.

According to Fig. 12.6, we assume that a standard of nominal value N = 1 kg is provided, N_0 being its physically true value and N̄ its known mean value with uncertainty u_{N̄},

N̄ − u_{N̄} ≤ N_0 ≤ N̄ + u_{N̄} ;   N̄ = N_0 + f_{N̄} ;   −u_{N̄} ≤ f_{N̄} ≤ u_{N̄} .   (12.29)
We shall always compare two groups of weights of equal nominal value, e.g.

m_1 + m_2 + m_3 + m_6   nominal value 1.2 kg
m_4 + m_5 + N_0         nominal value 1.2 kg     (1st comparison).

We refer to l = 1, …, n repeat measurements and represent the comparison in the form of an observational equation

m_1 + m_2 + m_3 + m_6 − (m_4 + m_5 + N_0) ≈ x̄_1 ;   x̄_1 = (1/n) Σ_{l=1}^n x_{1,l} .

In physical terms, the link-up standard enters, of course, with its true value N_0. As this value is unknown, we have to substitute N̄ for N_0 and to increase the uncertainty u_{x̄_1} of the right-hand side by u_{N̄},

m_1 + m_2 + m_3 + m_6 − (m_4 + m_5 + N̄) ≈ x̄_1 ;   u_{right-hand side} = u_{N̄} + u_{x̄_1} .
For convenience, we now turn to deviations as defined in (12.27),

β_1 + β_2 + β_3 − β_4 − β_5 + β_6 ≈ x̄_1 + d̄ .   (12.30)

Here

d̄ = N̄ − N   (12.31)

denotes the difference between N̄ and N; obviously,

d_0 = N_0 − N   (12.32)

denotes the true value of d̄. The associated error equation is

d̄ = d_0 + f_{N̄} ;   −u_{N̄} ≤ f_{N̄} ≤ u_{N̄} .   (12.33)
As we learn from (12.30), the entries of the first row of the design matrix A are +1 or −1, depending on whether the weight is a member of the first or of the second group. Since there are also comparisons which do not include all weights of the set considered, e.g.

m_6         nominal value 300 g
m_3 + m_5   nominal value 300 g     (2nd comparison),

i.e.

−β_3 − β_5 + β_6 ≈ x̄_2 ,

there will also be zeros. The first two rows of the design matrix A read

 1  1  1 −1 −1  1
 0  0 −1  0 −1  1 .   (12.34)

If there are r unknowns β_k; k = 1, …, r, we need m > r observational equations, allowing for rank(A) = r.

The exemplary weighing scheme presented below is made up of m = 16 mass comparisons and r = 6 unknown weights. The first five comparisons include the link-up standard N̄, which in the design matrix is indicated by a (+) sign. In general, N̄ will appear within the first p observational equations, so that

d = ( d̄ … d̄ | 0 … 0 )ᵀ ,   with p entries d̄ followed by m − p zeros .   (12.35)
Furthermore, 14 comparisons include the check-weight. Setting

β = (β_1 β_2 … β_r)ᵀ ,   x̄ = (x̄_1 x̄_2 … x̄_m)ᵀ ,

A =
   1  1  1 −1 −1  1     (+)
   1 −1  1  1  1  1     (+)
   1  1 −1  1  1  1     (+)
   1  1  0  1 −1  1     (+)
   1  0  1 −1  1  1     (+)
   1 −1  1 −1 −1 −1
   1 −1 −1  1  1 −1
   1  1 −1 −1 −1 −1
   1 −1  0  1 −1 −1
   1  0 −1 −1  1 −1
   0 −1  0 −1  0  1
   0 −1  0 −1  0  1
   0  0 −1  0 −1  1
   0  0 −1  0 −1  1
   0  0  0  1 −1  0
   0  0  0  1 −1  0
the inconsistent linear system takes the form

Aβ ≈ x̄ + d .   (12.36)

The least squares solution vector is

β̄ = (Aᵀ A)⁻¹ Aᵀ (x̄ + d) = Bᵀ (x̄ + d) ;   B = A (Aᵀ A)⁻¹ = (b_{ik})   (12.37)

or, written in components,

β̄_k = Σ_{i=1}^m b_{ik} x̄_i + d̄ Σ_{i=1}^p b_{ik} ;   k = 1, …, r .   (12.38)
Inserting the error equations

x̄_i = x_{0,i} + (x̄_i − µ_i) + f_i ,   d̄ = d_0 + f_{N̄}

we find

β̄_k = Σ_{i=1}^m b_{ik} [ x_{0,i} + (x̄_i − µ_i) + f_i ] + (d_0 + f_{N̄}) Σ_{i=1}^p b_{ik} ;   k = 1, …, r .
Obviously,

β_{0,k} = Σ_{i=1}^m b_{ik} x_{0,i} + d_0 Σ_{i=1}^p b_{ik}   (12.39)

are the true values of the estimators β̄_k. The propagated systematic errors turn out to be

f_{s,β̄_k} = Σ_{i=1}^m |b_{ik}| f_{s,i} + u_{N̄} | Σ_{i=1}^p b_{ik} | ;   k = 1, …, r .   (12.40)
Going back to individual measurements, (12.38) leads us to

β_{kl} = Σ_{i=1}^m b_{ik} x_{il} + d̄ Σ_{i=1}^p b_{ik} ;   k = 1, …, r .   (12.41)

By means of the differences

β_{kl} − β̄_k = Σ_{i=1}^m b_{ik} (x_{il} − x̄_i) ;   k = 1, …, r ;   l = 1, …, n

we arrive at the empirical variances and covariances of the components of the solution vector,

s_{β̄_k β̄_k′} = (1/(n−1)) Σ_{l=1}^n (β_{kl} − β̄_k)(β_{k′l} − β̄_{k′})

           = (1/(n−1)) Σ_{l=1}^n [ Σ_{i=1}^m b_{ik} (x_{il} − x̄_i) ] [ Σ_{j=1}^m b_{jk′} (x_{jl} − x̄_j) ]

           = Σ_{i,j}^m b_{ik} b_{jk′} s_{ij} ;   s_{β̄_k β̄_k} ≡ s²_{β̄_k} .   (12.42)

Here, the elements

s_{ij} = (1/(n−1)) Σ_{l=1}^n (x_{il} − x̄_i)(x_{jl} − x̄_j) ;   i, j = 1, …, m   (12.43)

designate the empirical variances and covariances of the input data. After all, we shall assign uncertainties

u_{β̄_k} = (t_P(n−1)/√n) s_{β̄_k} + f_{s,β̄_k}   (12.44)

       = (t_P(n−1)/√n) √( Σ_{i,j}^m b_{ik} b_{jk} s_{ij} ) + Σ_{i=1}^m |b_{ik}| f_{s,i} + u_{N̄} | Σ_{i=1}^p b_{ik} |

to the r adjusted weights β̄_k; k = 1, …, r of the set.
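The least squares machinery (12.36)–(12.38) can be carried out for the exemplary weighing scheme. The following sketch uses exact rational arithmetic; the Gauss–Jordan helper and the use of the fractions module are implementation choices for illustration, not part of the original scheme.

```python
from fractions import Fraction

# The 16 x 6 design matrix A of the exemplary weighing scheme; the first
# p = 5 comparisons include the link-up standard (the "(+)" rows).
A = [
    [ 1,  1,  1, -1, -1,  1],
    [ 1, -1,  1,  1,  1,  1],
    [ 1,  1, -1,  1,  1,  1],
    [ 1,  1,  0,  1, -1,  1],
    [ 1,  0,  1, -1,  1,  1],
    [ 1, -1,  1, -1, -1, -1],
    [ 1, -1, -1,  1,  1, -1],
    [ 1,  1, -1, -1, -1, -1],
    [ 1, -1,  0,  1, -1, -1],
    [ 1,  0, -1, -1,  1, -1],
    [ 0, -1,  0, -1,  0,  1],
    [ 0, -1,  0, -1,  0,  1],
    [ 0,  0, -1,  0, -1,  1],
    [ 0,  0, -1,  0, -1,  1],
    [ 0,  0,  0,  1, -1,  0],
    [ 0,  0,  0,  1, -1,  0],
]
m, r = len(A), len(A[0])

# Normal matrix A^T A in exact rational arithmetic (no round-off)
AtA = [[Fraction(sum(A[i][k] * A[i][kk] for i in range(m)))
        for kk in range(r)] for k in range(r)]

def invert(M):
    """Gauss-Jordan inverse; fails with StopIteration if M is singular."""
    n = len(M)
    aug = [list(M[i]) + [Fraction(int(i == j)) for j in range(n)]
           for i in range(n)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [e / pivot for e in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

AtA_inv = invert(AtA)          # inversion succeeds, hence rank(A) = r = 6
B = [[sum(Fraction(A[i][k]) * AtA_inv[k][j] for k in range(r))
      for j in range(r)] for i in range(m)]   # B = A (A^T A)^{-1}, (12.37)
print("rank(A) =", r)
```

With B in hand, (12.38) and (12.44) reduce to plain sums over the columns b_{ik} once the measured means x̄_i and error bounds are supplied.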
12.3 Pairwise Comparisons
Occasionally, within a set of physical quantities of the same quality (lengths, voltages, etc.), only differences between two individuals are measurable. Consider, for example, four quantities β_k; k = 1, …, 4. Then there are either six or twelve measurable differences, depending on whether not only the differences

(β_i − β_j)   but also the inversions   (β_j − β_i)
appear to be of metrological relevance. Let us refer to the first, simpler situation,
β1 − β2 ≈ x1
β1 − β3 ≈ x2
β1 − β4 ≈ x3
β2 − β3 ≈ x4
β2 − β4 ≈ x5
β3 − β4 ≈ x6 . (12.45)
Obviously, the first and the second relation establish the fourth one, the first and the third the fifth one and, finally, the second and the third the sixth one. Consequently, the design matrix

A =
   1 −1  0  0
   1  0 −1  0
   1  0  0 −1
   0  1 −1  0
   0  1  0 −1
   0  0  1 −1     (12.46)

has rank 3. As has been outlined in Sect. 7.3, a constraint is needed. Let us, e.g., consider

h_1 β_1 + h_2 β_2 + h_3 β_3 + h_4 β_4 = y .   (12.47)

The uncertainties of the β̄_k; k = 1, …, 4 follow from Chap. 9. Of course, the right-hand side of (12.47) may also refer to quantities of the kind ȳ ± u_ȳ.
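The rank deficiency of (12.46) can be confirmed by direct inspection. A minimal sketch verifying the linear dependencies stated above, and the fact that every row annihilates a common offset:

```python
# Design matrix (12.46) of the six pairwise differences of four quantities.
A = [
    [1, -1,  0,  0],
    [1,  0, -1,  0],
    [1,  0,  0, -1],
    [0,  1, -1,  0],
    [0,  1,  0, -1],
    [0,  0,  1, -1],
]

# Rows 4-6 are linear combinations of rows 1-3, so rank(A) < 4:
r1, r2, r3 = A[0], A[1], A[2]
assert A[3] == [b - a for a, b in zip(r1, r2)]   # row4 = row2 - row1
assert A[4] == [b - a for a, b in zip(r1, r3)]   # row5 = row3 - row1
assert A[5] == [b - a for a, b in zip(r2, r3)]   # row6 = row3 - row2

# Every row sums to zero: shifting all beta_k by a constant leaves all
# differences unchanged, which is why a constraint like (12.47) is needed.
assert all(sum(row) == 0 for row in A)
print("rank deficiency confirmed; a constraint such as (12.47) is required")
```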
12.4 Fundamental Constants of Physics
Among others, the following physical constants are at present considered fundamental:

h    Planck constant               R∞   Rydberg constant
e    elementary charge             µB   Bohr magneton
c    speed of light in vacuum      µN   nuclear magneton
µ0   permeability of vacuum        me   electron mass
ε0   permittivity of vacuum        mp   proton mass
α    fine-structure constant       Mp   proton molar mass
γp   proton gyromagnetic ratio     µp   proton magnetic moment
NA   Avogadro constant             F    Faraday constant

Due to measurement errors, fundamental constants taken from measured data prove mutually inconsistent, i.e. different concatenations which we expect to produce equal results disclose numerical discrepancies. To circumvent this mishap, a sufficiently large set of constants, say r, is subjected to a least squares adjustment calling for mutual consistency. Just as the uncertainties of the input data are expected to localize the pertaining true values, we require the adjusted constants to meet the respective demand, i.e. each of the newly obtained intervals

(constant ± uncertainty)_k ,   k = 1, …, r

should localize the pertaining true value. The latter property should apply irrespective of the weighting procedure, as has been discussed in Sects. 8.4 and 9.3.
Examples of concatenations between fundamental constants are

R∞ = m_e c α² / (2h) ,   α = µ0 c e² / (2h) ,   µB = e h / (4π m_e) ,   µN = e h / (4π m_p) ,

M_p = N_A m_p ,   γ_p = (µ_p/µ_B)(m_p/m_e)(N_A e / M_p) ,   ε0 µ0 = 1/c² ,   F = N_A e .
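Two of these concatenations can be spot-checked numerically. The sketch below uses CODATA-2018 values, quoted to limited precision — an assumption of this illustration, since the adjustment described in the text works with measured input data, not with these rounded figures.

```python
# CODATA-2018 values (treated here as given inputs, rounded):
N_A = 6.02214076e23       # Avogadro constant, 1/mol
e   = 1.602176634e-19     # elementary charge, C
h   = 6.62607015e-34      # Planck constant, J s
c   = 299792458.0         # speed of light in vacuum, m/s
mu0 = 1.25663706212e-6    # vacuum permeability, N/A^2

F = N_A * e                          # Faraday constant, F = N_A e
alpha = mu0 * c * e**2 / (2 * h)     # fine-structure constant

print(F, alpha, 1 / alpha)
```

Both results reproduce the tabulated values of F and α to within the precision of the quoted inputs, illustrating what "self-consistency" of a set of constants means in practice.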
Those constants which are not members of the adjustment will be derived from the adjusted ones so that, as the bottom line, a complete system of numerically self-consistent constants is obtained. Two remarks may be in order:

– The smaller the set of primary constants, the larger the risk of an overall maladjustment.
– The adjusted constants are not exact but rather self-consistent, i.e. they no longer give rise to numerical discrepancies.³ We expect the uncertainties as issued by the adjustment to localize true values, Fig. 12.7.

Constants may enter the adjustment either directly as measured values

K_i = measured value (i) ,   i = 1, 2, …   (12.48)

or in the form of given concatenations

φ_i(K_1, K_2, …) = measured value (i) ,   i = 1, 2, …   (12.49)

In the latter case, the functions φ_i, i = 1, 2, … have to be linearized by means of suitable starting values. The presently accepted form of adjustment is presented in [17]. Here, however, the error model recommended in [34] is used. As the latter sets aside the introduction of true values, collective shifts of the numerical levels of the fundamental physical constants⁴ are, at least in the opinion of the author, conceivable.

In the following, we shall outline the principles of an alternative adjustment [28].

³ I.e. the adjustment establishes consistency through redistribution of discrepancies.

Fig. 12.7. Consider a set of three fundamental constants β_1, β_2 and β_3 and scrutinize the interplay between weightings and the capability of the system to localize true values
Linearizing the System of Equations
Given m functions

φ_i(K_1, K_2, …, K_r) ≈ a_i y_i ;   i = 1, …, m   (12.50)

between the r < m constants K_k; k = 1, …, r to be adjusted. Not every constant will enter each functional relationship. Let the a_i be numerically known quantities, the details of which will not concern us here. Furthermore, let the input data

ȳ_i = Σ_{j=1}^{m_i} w^{(i)}_j x̄^{(i)}_j ;   i = 1, …, m   (12.51)

be weighted means, each defined through m_i means⁵ x̄^{(i)}_j; j = 1, …, m_i. The quotations

x̄^{(i)}_j ± u^{(i)}_j   (12.52)

x̄^{(i)}_j = (1/n) Σ_{l=1}^n x^{(i)}_{jl} ,   u^{(i)}_j = (t_P(n−1)/√n) s^{(i)}_j + f^{(i)}_{s,j} ;   j = 1, …, m_i

are presented by the different working groups participating in the adjustment. The pertaining error equations

x^{(i)}_{jl} = x^{(i)}_0 + (x^{(i)}_{jl} − µ^{(i)}_j) + f^{(i)}_j

x̄^{(i)}_j = x^{(i)}_0 + (x̄^{(i)}_j − µ^{(i)}_j) + f^{(i)}_j ;   −f^{(i)}_{s,j} ≤ f^{(i)}_j ≤ f^{(i)}_{s,j} ;   j = 1, …, m_i   (12.53)

⁴ http://www.nist.gov
⁵ See Sect. 9.3, where the precondition for the existence of a metrologically reasonable mean of means has been extensively discussed.
lead us to the empirical variances and covariances of the i-th functional relationship,

s^{(i)} = ( s^{(i)}_{jj′} )   (12.54)

s^{(i)}_{jj′} = (1/(n−1)) Σ_{l=1}^n (x^{(i)}_{jl} − x̄^{(i)}_j)(x^{(i)}_{j′l} − x̄^{(i)}_{j′}) ;   s^{(i)}_{jj} ≡ s^{(i)}_j ² .

The weights

w^{(i)}_j = g^{(i)}_j ² / Σ_{j′=1}^{m_i} g^{(i)}_{j′} ² ;   g^{(i)}_j = 1/u^{(i)}_j ;   j = 1, …, m_i

and the uncertainties

u_{ȳ_i} = (t_P(n−1)/√n) √( Σ_{j,j′}^{m_i} w^{(i)}_j w^{(i)}_{j′} s^{(i)}_{jj′} ) + Σ_j^{m_i} w^{(i)}_j f^{(i)}_{s,j}   (12.55)
have been quoted in Sect. 9.3.

Expanding the φ_i by means of known starting values K_{v,k}; k = 1, …, r yields

φ_i(K_1, …, K_r) = φ_{i,v}(K_{v,1}, …, K_{v,r})
    + (∂φ_i/∂K_{v,1})(K_1 − K_{v,1}) + ⋯ + (∂φ_i/∂K_{v,r})(K_r − K_{v,r}) ≈ a_i ȳ_i .

Truncating non-linear terms and rearranging leads to

(1/φ_{i,v})(∂φ_i/∂K_{v,1})(K_1 − K_{v,1}) + ⋯ + (1/φ_{i,v})(∂φ_i/∂K_{v,r})(K_r − K_{v,r}) ≈ (a_i ȳ_i − φ_{i,v})/φ_{i,v} ,

where

φ_{i,v} ≡ φ_{i,v}(K_{v,1}, …, K_{v,r}) .

A reasonable weighting matrix is

G = diag( g_1, g_2, …, g_m ) ;   g_i = 1 / ( |a_i/φ_{i,v}| u_{ȳ_i} ) ,   (12.56)

leading to

(g_i/φ_{i,v})(∂φ_i/∂K_{v,1})(K_1 − K_{v,1}) + ⋯ + (g_i/φ_{i,v})(∂φ_i/∂K_{v,r})(K_r − K_{v,r}) ≈ g_i (a_i ȳ_i − φ_{i,v})/φ_{i,v} ,

i = 1, …, m .   (12.57)
To abbreviate, we put

K_k − K_{v,k} = K_{v,k} β_k ;   K_{v,k} (g_i/φ_{i,v}) (∂φ_i/∂K_{v,k}) = a_{ik} ;   z_i = g_i (a_i ȳ_i − φ_{i,v})/φ_{i,v}   (12.58)

so that (12.57) turns into

a_{i1} β_1 + ⋯ + a_{ir} β_r ≈ z_i ;   i = 1, …, m .   (12.59)

Finally, defining

A = (a_{ik}) ,   β = (β_1 β_2 … β_r)ᵀ   and   z = (z_1 z_2 … z_m)ᵀ ,

we arrive at

Aβ ≈ z .   (12.60)
Here, the ≈ sign covers both linearization and measurement errors.
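The transition (12.58)–(12.60) can be illustrated with a toy system. The sketch below is an invented example: two hypothetical constants K1, K2 entering three made-up relations φ_1 = K1, φ_2 = K2, φ_3 = K1·K2, with a_i = g_i = 1 for simplicity.

```python
# Toy linearized adjustment in the spirit of (12.50)-(12.60); all numbers
# are invented for illustration.
y = [2.02, 2.97, 6.03]            # "measured" right-hand sides a_i*y_i
Kv = [1.9, 3.1]                   # starting values K_v,k

phi_v = [Kv[0], Kv[1], Kv[0] * Kv[1]]
dphi = [[1.0, 0.0], [0.0, 1.0], [Kv[1], Kv[0]]]   # partial derivatives

# a_ik = K_v,k/phi_i,v * dphi_i/dK_k ;  z_i = (a_i y_i - phi_i,v)/phi_i,v
A = [[Kv[k] / phi_v[i] * dphi[i][k] for k in range(2)] for i in range(3)]
z = [(y[i] - phi_v[i]) / phi_v[i] for i in range(3)]

# Least squares via the 2 x 2 normal equations
AtA = [[sum(A[i][k] * A[i][kk] for i in range(3)) for kk in range(2)]
       for k in range(2)]
Atz = [sum(A[i][k] * z[i] for i in range(3)) for k in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
beta = [(AtA[1][1] * Atz[0] - AtA[0][1] * Atz[1]) / det,
        (AtA[0][0] * Atz[1] - AtA[1][0] * Atz[0]) / det]

K = [Kv[k] * (1 + beta[k]) for k in range(2)]   # back-transformation (12.80)
print(K)
```

Despite deliberately poor starting values, one linearization step already moves the constants close to the values suggested by the (invented) measurements.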
Structure of the Solution Vector
From (12.60) we obtain

β̄ = Bᵀ z ;   B = A (Aᵀ A)⁻¹ ,   (12.61)

or, written in components,

β̄_k = Σ_{i=1}^m b_{ik} z_i ;   k = 1, …, r .

Inserting the z_i as given in (12.58) yields

β̄_k = Σ_{i=1}^m b_{ik} g_i (a_i ȳ_i − φ_{i,v})/φ_{i,v} = Σ_{i=1}^m b_{ik} a_i g_i ȳ_i/φ_{i,v} − Σ_{i=1}^m b_{ik} g_i .

We then introduce the ȳ_i according to (12.51),

β̄_k = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j x̄^{(i)}_j − Σ_{i=1}^m b_{ik} g_i ,   (12.62)

and, finally, the x̄^{(i)}_j as given in (12.53),

β̄_k = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j [ x^{(i)}_0 + (x̄^{(i)}_j − µ^{(i)}_j) + f^{(i)}_j ] − Σ_{i=1}^m b_{ik} g_i

    = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) x^{(i)}_0 − Σ_{i=1}^m b_{ik} g_i
      + Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j [ (x̄^{(i)}_j − µ^{(i)}_j) + f^{(i)}_j ] ,   (12.63)

where Σ_{j=1}^{m_i} w^{(i)}_j = 1 has been used.
Obviously,

β_{0,k} = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) x^{(i)}_0 − Σ_{i=1}^m b_{ik} g_i = Σ_{i=1}^m b_{ik} g_i (a_i x^{(i)}_0 − φ_{i,v})/φ_{i,v} ,   k = 1, …, r   (12.64)

designate the true values of the estimators β̄_k within the limits of vanishing linearization errors, so that

β̄_k = β_{0,k} + Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j [ (x̄^{(i)}_j − µ^{(i)}_j) + f^{(i)}_j ] .   (12.65)

From this we obtain the propagated systematic errors

f_{β̄_k} = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j f^{(i)}_j ;   k = 1, …, r .   (12.66)
To simplify the handling, we alter the nomenclature and turn the double sum (12.62) into an ordinary sum,

β̄_k = b_{1k} (a_1 g_1/φ_{1,v}) [ w^{(1)}_1 x̄^{(1)}_1 + w^{(1)}_2 x̄^{(1)}_2 + … + w^{(1)}_{m_1} x̄^{(1)}_{m_1} ]
    + b_{2k} (a_2 g_2/φ_{2,v}) [ w^{(2)}_1 x̄^{(2)}_1 + w^{(2)}_2 x̄^{(2)}_2 + … + w^{(2)}_{m_2} x̄^{(2)}_{m_2} ]
    + …
    + b_{mk} (a_m g_m/φ_{m,v}) [ w^{(m)}_1 x̄^{(m)}_1 + w^{(m)}_2 x̄^{(m)}_2 + … + w^{(m)}_{m_m} x̄^{(m)}_{m_m} ]
    − Σ_{i=1}^m b_{ik} g_i

so that

β̄_k = Σ_{i=1}^M b̃_{ik} v̄_i − Σ_{i=1}^m b_{ik} g_i   (12.67)

putting

M = Σ_{i=1}^m m_i   (12.68)

and

i = 1 :   v̄_1 = x̄^{(1)}_1 ,   v̄_2 = x̄^{(1)}_2 ,   … ,   v̄_{m_1} = x̄^{(1)}_{m_1}
i = 2 :   v̄_{m_1+1} = x̄^{(2)}_1 ,   v̄_{m_1+2} = x̄^{(2)}_2 ,   … ,   v̄_{m_1+m_2} = x̄^{(2)}_{m_2}
…
i = m :   v̄_{q+1} = x̄^{(m)}_1 ,   v̄_{q+2} = x̄^{(m)}_2 ,   … ,   v̄_M = x̄^{(m)}_{m_m}

as well as

i = 1 :   b̃_{1k} = b_{1k} (a_1 g_1/φ_{1,v}) w^{(1)}_1 ,   b̃_{2k} = b_{1k} (a_1 g_1/φ_{1,v}) w^{(1)}_2 ,   … ,   b̃_{m_1,k} = b_{1k} (a_1 g_1/φ_{1,v}) w^{(1)}_{m_1}
i = 2 :   b̃_{m_1+1,k} = b_{2k} (a_2 g_2/φ_{2,v}) w^{(2)}_1 ,   b̃_{m_1+2,k} = b_{2k} (a_2 g_2/φ_{2,v}) w^{(2)}_2 ,   … ,   b̃_{m_1+m_2,k} = b_{2k} (a_2 g_2/φ_{2,v}) w^{(2)}_{m_2}
…
i = m :   b̃_{q+1,k} = b_{mk} (a_m g_m/φ_{m,v}) w^{(m)}_1 ,   b̃_{q+2,k} = b_{mk} (a_m g_m/φ_{m,v}) w^{(m)}_2 ,   … ,   b̃_{M,k} = b_{mk} (a_m g_m/φ_{m,v}) w^{(m)}_{m_m} ,

where

q = Σ_{j=1}^{m−1} m_j .

Inserting (12.52) into (12.62) yields

β̄_k = Σ_{i=1}^m (b_{ik} a_i g_i/φ_{i,v}) Σ_{j=1}^{m_i} w^{(i)}_j (1/n) Σ_{l=1}^n x^{(i)}_{jl} − Σ_{i=1}^m b_{ik} g_i

    = (1/n) Σ_{l=1}^n [ Σ_{i=1}^m Σ_{j=1}^{m_i} (b_{ik} a_i g_i/φ_{i,v}) w^{(i)}_j x^{(i)}_{jl} ] − Σ_{i=1}^m b_{ik} g_i ,   (12.69)

so that

β̄_k = (1/n) Σ_{l=1}^n β_{kl} ;   k = 1, …, r ,   (12.70)
understanding the

β_{kl} = Σ_{i=1}^m Σ_{j=1}^{m_i} (b_{ik} a_i g_i/φ_{i,v}) w^{(i)}_j x^{(i)}_{jl} − Σ_{i=1}^m b_{ik} g_i ;   k = 1, …, r ;   l = 1, …, n   (12.71)

as estimators of an adjustment based on the respective l-th repeat measurements. The n differences between (12.71) and (12.62),

β_{kl} − β̄_k = Σ_{i=1}^m Σ_{j=1}^{m_i} (b_{ik} a_i g_i/φ_{i,v}) w^{(i)}_j (x^{(i)}_{jl} − x̄^{(i)}_j) ;   k = 1, …, r ;   l = 1, …, n   (12.72)

give rise to the elements of the empirical variance–covariance matrix of the solution vector. To have a concise notation, we introduce

i = 1 :   v_{1l} = x^{(1)}_{1l} ,   v_{2l} = x^{(1)}_{2l} ,   … ,   v_{m_1,l} = x^{(1)}_{m_1,l}
i = 2 :   v_{m_1+1,l} = x^{(2)}_{1l} ,   v_{m_1+2,l} = x^{(2)}_{2l} ,   … ,   v_{m_1+m_2,l} = x^{(2)}_{m_2,l}
…
i = m :   v_{q+1,l} = x^{(m)}_{1l} ,   v_{q+2,l} = x^{(m)}_{2l} ,   … ,   v_{M,l} = x^{(m)}_{m_m,l}

so that

β_{kl} − β̄_k = Σ_{i=1}^M b̃_{ik} (v_{il} − v̄_i) ;   k = 1, …, r ;   l = 1, …, n .   (12.73)

Further, we define an auxiliary matrix assembling the b̃_{ik},

B̃ = ( b̃_{ik} ) .   (12.74)

Then, referring to the empirical variance–covariance matrix of all input data,

s = (s_{ij}) ;   s_{ij} = (1/(n−1)) Σ_{l=1}^n (v_{il} − v̄_i)(v_{jl} − v̄_j) ;   i, j = 1, …, M ,   (12.75)

the empirical variance–covariance matrix of the solution vector β̄ turns out to be

s_{β̄} = B̃ᵀ s B̃ .   (12.76)

Its elements are

s_{β̄_k β̄_{k′}} = (1/(n−1)) Σ_{l=1}^n (β_{kl} − β̄_k)(β_{k′l} − β̄_{k′}) ;   s_{β̄_k β̄_k} ≡ s²_{β̄_k} ;   k, k′ = 1, …, r .   (12.77)

To concisely present the systematic errors, we set

i = 1 :   f_1 = f^{(1)}_1 ,   f_2 = f^{(1)}_2 ,   … ,   f_{m_1} = f^{(1)}_{m_1}
i = 2 :   f_{m_1+1} = f^{(2)}_1 ,   f_{m_1+2} = f^{(2)}_2 ,   … ,   f_{m_1+m_2} = f^{(2)}_{m_2}
…
i = m :   f_{q+1} = f^{(m)}_1 ,   f_{q+2} = f^{(m)}_2 ,   … ,   f_M = f^{(m)}_{m_m}

so that (12.66) becomes

f_{β̄_k} = Σ_{i=1}^M b̃_{ik} f_i ;   k = 1, …, r .   (12.78)

Finally, the result of the adjustment is

β̄_k ± u_{β̄_k} ;   u_{β̄_k} = (t_P(n−1)/√n) s_{β̄_k} + Σ_{i=1}^M |b̃_{ik}| f_{s,i} ;   k = 1, …, r .   (12.79)
Uncertainties of the Constants
According to (12.58), the constants and their uncertainties turn out to be

K̄_k = K_{v,k} (β̄_k + 1) ,   u_{K̄_k} = K_{v,k} u_{β̄_k} ;   k = 1, …, r .   (12.80)

Cyclically repeating the adjustment with improved starting values should reduce the linearization errors. However, on account of the errors of the input data, the estimators β̄_k; k = 1, …, r tend to oscillate after a certain number of cycles, telling us to end the iteration.
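The cyclic relinearization can be sketched for a single hypothetical constant. With one error-free equation the cycle converges instead of oscillating — the oscillation described above only appears with scattered, over-determined input data — so the stopping criterion below is a stand-in for the "end the iteration" rule.

```python
# Sketch of cyclic relinearization for one invented constant K obeying
# phi(K) = K^3 = y; all numbers are hypothetical.
y = 8.12                      # "measured" value
K = 1.5                       # deliberately poor starting value

for cycle in range(20):
    phi_v = K ** 3
    a = K / phi_v * 3 * K ** 2        # a_11 per (12.58): K_v/phi_v * dphi/dK
    z = (y - phi_v) / phi_v           # z_1 per (12.58), with a_1 = g_1 = 1
    beta = z / a                      # one-equation least squares solution
    K = K * (1 + beta)                # improved starting value, cf. (12.80)
    if abs(beta) < 1e-12:             # stop when the update stalls
        break

print(K, cycle)
```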
12.5 Trigonometric Polynomials
Let us look for a smoothing trigonometric polynomial to be fitted to a given empirical relationship between any two measured quantities. At first, we shall assume the model function to be unknown. After this, we shall consider the model function to be known – yet, to simplify, we wish to substitute a trigonometric polynomial, presuming the difference between the two functions to be negligible as compared with the error bounds of the empirical data.

For simplicity, we confine ourselves to the case of error-free abscissas and erroneous ordinates,

(x_{0,1}, ȳ_1) , (x_{0,2}, ȳ_2) , … , (x_{0,m}, ȳ_m) ,   (12.81)

where we assume

ȳ_i ± u_{ȳ_i} ;   i = 1, …, m   (12.82)

ȳ_i = (1/n) Σ_{l=1}^n y_{il} = y_{0,i} + (ȳ_i − µ_i) + f_{y_i} ,   u_{ȳ_i} = (t_P(n−1)/√n) s_{y_i} + f_{s,y_i} .
12.5.1 Unknown Model Function
We require the fitted function to intersect each of the error bars of the given pairs of data. To implement this demand, we are looking for a well-behaved, bendable trigonometric polynomial. Let us keep in mind that if the polynomial's degree is too low, it will not be in a position to adapt itself to the details suggested by the sequence of data. If it is too high, the polynomial might follow the data's pseudo-structures too closely. Fortunately, the number of terms of a trigonometric polynomial is not overly critical. Hence, we may fix its degree by trial and error so that the fitted polynomial passes through each of the error bars and, at the same time, appropriately smoothes the empirical structure of the given data sequence.
We suppose the data pairs to be sufficiently numerous and closely spaced, so that the adjustment of an empirical function appears physically reasonable. We then ask for a smoothing trigonometric polynomial

y(x) = β_1 + Σ_{k=2}^r β_k t_k(x) ;   r = 3, 5, 7, … < m

t_k(x) = cos( (k/2) ω_0 x ) ,        k even ;
t_k(x) = sin( ((k−1)/2) ω_0 x ) ,    k odd ;   k ≥ 2 ,

which we intend to adjust according to least squares. To find the coefficients β_k; k = 1, …, r over the interval

∆X = x_{0,m} − x_{0,1} ,   (12.83)
we define
t_{ik} = cos( (k/2) ω_0 x_{0,i} ) ,       k even ;
t_{ik} = sin( ((k−1)/2) ω_0 x_{0,i} ) ,   k odd ;   k ≥ 2 ,   i = 1, …, m .   (12.84)
As is well known, trigonometric polynomials (and Fourier series) are based on auxiliary frequencies which are multiples of a basic frequency

ω_0 = 2π/∆X ,

their purpose being to keep the least squares adjustment linear.⁶ After all, the inconsistent system is given by

β_1 + β_2 t_{i2} + ⋯ + β_r t_{ir} ≈ ȳ_i ;   i = 1, …, m ;   r = 3, 5, 7, … < m .   (12.85)

The design matrix

A =
  1  t_12  t_13  ⋯  t_1r
  1  t_22  t_23  ⋯  t_2r
  ⋯   ⋯    ⋯   ⋯   ⋯
  1  t_m2  t_m3  ⋯  t_mr

and the column vectors

ȳ = (ȳ_1 ȳ_2 ⋯ ȳ_m)ᵀ ,   β = (β_1 β_2 ⋯ β_r)ᵀ

transfer (12.85) into

Aβ ≈ ȳ .   (12.86)

Assuming rank(A) = r, we have

β̄ = Bᵀ ȳ ;   B = A (Aᵀ A)⁻¹ = (b_{ik}) .   (12.87)

The components

β̄_k = Σ_{i=1}^m b_{ik} ȳ_i ;   k = 1, …, r   (12.88)

of the solution vector β̄ establish the adjusted polynomial

ȳ(x) = β̄_1 + Σ_{k=2}^r β̄_k t_k(x) ;   r = 3, 5, 7, … < m .   (12.89)
Since the model function is unknown, all we can do is argue as follows: the error bars ȳ_i ± u_{ȳ_i}; i = 1, …, m localize the true values y_{0,i} of the means ȳ_i. And, as the trigonometric polynomial shall pass through each of the error bars, we have

|ȳ(x_{0,i}) − y_{0,i}| ≤ 2 u_{ȳ_i} ;   i = 1, …, m .   (12.90)

⁶ This suggests we also adjust the frequencies, but we would then be faced with a highly non-linear problem to which, in general, there is no solution. Remarkably enough, G. Prony, a contemporary of J. Fourier, developed a true harmonic analysis.
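The adjustment (12.84)–(12.88) can be sketched in pure Python. In the example below the ordinates are invented to lie exactly in the span of the basis (coefficients 1.0, 0.5, 0.3 are assumptions), so the least squares solution must reproduce them; with real, scattered data the solution would smooth instead.

```python
import math

def t_k(k, w0, x):
    """Basis functions t_k per (12.84): cos for even k, sin for odd k >= 2."""
    if k % 2 == 0:
        return math.cos(k // 2 * w0 * x)
    return math.sin((k - 1) // 2 * w0 * x)

def solve(M, rhs):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Error-free ordinates lying exactly in the span of the basis; x over [0, 1].
m, r = 21, 5
xs = [i / (m - 1) for i in range(m)]
w0 = 2 * math.pi / (xs[-1] - xs[0])          # basic frequency omega_0
ys = [1.0 + 0.5 * t_k(2, w0, x) + 0.3 * t_k(5, w0, x) for x in xs]

A = [[1.0] + [t_k(k, w0, x) for k in range(2, r + 1)] for x in xs]
AtA = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in range(r)]
       for a in range(r)]
Aty = [sum(A[i][a] * ys[i] for i in range(m)) for a in range(r)]
beta = solve(AtA, Aty)                        # least squares coefficients
print(beta)
```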
12.5.2 Known Model Function
Provided the difference between the known model function and the approximating trigonometric polynomial is insignificant, we may formally define a true polynomial over the interval (12.83) via the true values (x_{0,i}, y_{0,i}); i = 1, …, m of the data pairs (12.81),

y_0(x) = β_{0,1} + Σ_{k=2}^r β_{0,k} t_k(x) ;   r = 3, 5, 7, … < m .   (12.91)

Obviously, (12.91) expresses equality in a formal sense. Out of necessity, we have to keep the adjusted polynomial (12.89). Assigning the ȳ_i; i = 1, …, m given in (12.82) to a vector

ȳ = y_0 + (ȳ − µ_y) + f_y ,

(12.87) yields

β̄ = Bᵀ [ y_0 + (ȳ − µ_y) + f_y ] = β_0 + Bᵀ (ȳ − µ_y) + Bᵀ f_y .   (12.92)

Here, the "true solution vector"

β_0 = (β_{0,1} β_{0,2} ⋯ β_{0,r})ᵀ

provides a suitable reference to assess the fluctuations of (12.89). The vector

f_{β̄} = Bᵀ f_y ;   f_{β̄_k} = Σ_{i=1}^m b_{ik} f_{y_i} ;   k = 1, …, r   (12.93)

bears the influence of the systematic errors with respect to β_0. Let us assume that one and the same systematic error is superimposed on each of the ordinates,

f_{y_i} = f_y ,   −f_{s,y} ≤ f_{y_i} ≤ f_{s,y} ;   i = 1, …, m .

Then the f_{β̄_k}; k = 1, …, r have to satisfy

f_{β̄_k} = Σ_{i=1}^m b_{ik} f_{y_i} = f_y Σ_{i=1}^m b_{ik} ,

i.e.

f_{β̄_1} = f_y   and   f_{β̄_2} = … = f_{β̄_r} = 0   (12.94)

as

Σ_{i=1}^m b_{i1} = 1   and   Σ_{i=1}^m b_{ik} = 0 ;   k = 2, …, r .
The polynomial coefficients β_{kl} of the l-th set of repeat measurements

y_{1l}, y_{2l}, ⋯, y_{ml} ;   l = 1, …, n

may be taken from

β̄_k = Σ_{i=1}^m b_{ik} [ (1/n) Σ_{l=1}^n y_{il} ] = (1/n) Σ_{l=1}^n Σ_{i=1}^m b_{ik} y_{il} = (1/n) Σ_{l=1}^n β_{kl} ;   k = 1, …, r ,   (12.95)

i.e.

β_{kl} = Σ_{i=1}^m b_{ik} y_{il} ;   k = 1, …, r ;   l = 1, …, n .   (12.96)

The differences

β_{kl} − β̄_k = Σ_{i=1}^m b_{ik} (y_{il} − ȳ_i) ;   k = 1, …, r ;   l = 1, …, n   (12.97)

give rise to the elements s_{β̄_k β̄_{k′}}; k, k′ = 1, …, r of the empirical variance–covariance matrix s_{β̄} of the solution vector β̄,

s_{β̄_k β̄_{k′}} = (1/(n−1)) Σ_{l=1}^n (β_{kl} − β̄_k)(β_{k′l} − β̄_{k′})

            = (1/(n−1)) Σ_{l=1}^n [ Σ_{i=1}^m b_{ik} (y_{il} − ȳ_i) ] [ Σ_{j=1}^m b_{jk′} (y_{jl} − ȳ_j) ]

            = Σ_{i,j=1}^m b_{ik} b_{jk′} s_{ij} ;   s_{β̄_k β̄_k} ≡ s²_{β̄_k} .   (12.98)

Here, the

s_{ij} = (1/(n−1)) Σ_{l=1}^n (y_{il} − ȳ_i)(y_{jl} − ȳ_j)

are the empirical variances and covariances of the input data. Letting s = (s_{ij}) designate the associated empirical variance–covariance matrix, (12.98) changes into

s_{β̄} = ( s_{β̄_k β̄_{k′}} ) = Bᵀ s B .   (12.99)
Finally, we find
β̄_k ± u_{β̄_k} ;   k = 1, …, r   (12.100)

u_{β̄_1} = (t_P(n−1)/√n) s_{β̄_1} + f_{s,y} ;   u_{β̄_k} = (t_P(n−1)/√n) s_{β̄_k} ;   k = 2, …, r .

The intervals β̄_k ± u_{β̄_k}; k = 1, …, r localize the auxiliary quantities β_{0,k}.
Uncertainty Band
The l-th set of input data produces the least squares polynomial

y_l(x) = \beta_{1l} + \beta_{2l} t_2(x) + \dots + \beta_{rl} t_r(x) ; \quad l = 1, \dots, n . \qquad (12.101)

According to (12.89), we have

\bar y(x) = \frac{1}{n} \sum_{l=1}^{n} y_l(x) . \qquad (12.102)

The empirical variance at the point x is given by

s_{\bar y}^2(x) = \frac{1}{n-1} \sum_{l=1}^{n} \left( y_l(x) - \bar y(x) \right)^2 . \qquad (12.103)

Within the differences

y_l(x) - \bar y(x) = \left( \beta_{1l} - \bar\beta_1 \right) + \left( \beta_{2l} - \bar\beta_2 \right) t_2(x) + \dots + \left( \beta_{rl} - \bar\beta_r \right) t_r(x)

= \sum_{i=1}^{m} \left[ b_{i1} + b_{i2} t_2(x) + \dots + b_{ir} t_r(x) \right] \left( y_{il} - \bar y_i \right) ; \quad l = 1, \dots, n

we assign auxiliary constants

c_i(x) = b_{i1} + b_{i2} t_2(x) + \dots + b_{ir} t_r(x) ; \quad i = 1, \dots, m \qquad (12.104)

to the square brackets. Defining the auxiliary vector c = ( c_1 \; c_2 \cdots c_m )^{\rm T}, the empirical variance s_{\bar y}^2(x) turns into

s_{\bar y}^2(x) = c^{\rm T} s c . \qquad (12.105)

Finally, the uncertainty band, which is a measure of the fluctuations of the least squares polynomial with respect to the formally introduced true polynomial (12.91), is given by

\bar y(x) \pm u_{\bar y}(x) ; \qquad u_{\bar y}(x) = \frac{t_P(n-1)}{\sqrt{n}}\, s_{\bar y}(x) + f_{s,\bar y} . \qquad (12.106)
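A minimal numerical illustration of (12.103)–(12.105), assuming NumPy; the basis functions and the data are invented. It checks that c^T s c at a point x_0 reproduces the empirical variance of the n least squares polynomials evaluated there.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 15, 10
x = np.linspace(0.0, 1.0, m)
A = np.column_stack([np.ones(m), np.cos(np.pi * x), np.sin(np.pi * x)])
B = A @ np.linalg.inv(A.T @ A)

y = 1.0 + 0.5 * np.cos(np.pi * x)[:, None] + 0.1 * rng.normal(size=(m, n))
s = np.cov(y)                                  # empirical variance-covariance matrix

def band_std(x0):
    """Empirical std of the fitted polynomial at x0, from (12.105)."""
    t = np.array([1.0, np.cos(np.pi * x0), np.sin(np.pi * x0)])
    c = B @ t                                  # c_i(x0) = sum_k b_ik t_k(x0)
    return float(np.sqrt(c @ s @ c))

# cross-check against the definition (12.103)
beta_l = B.T @ y                               # per-cycle solution vectors
x0 = 0.3
t0 = np.array([1.0, np.cos(np.pi * x0), np.sin(np.pi * x0)])
yl = t0 @ beta_l                               # y_l(x0), l = 1..n
s_direct = yl.std(ddof=1)
assert np.isclose(band_std(x0), s_direct)
```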
Fig. 12.8. Approximation of a given inharmonic polynomial based on five frequencies through a harmonic trigonometric least squares polynomial requiring about 16 frequencies. Data: correct abscissas, erroneous ordinates given in the form of arithmetic means
Example
Adjustment of a harmonic trigonometric polynomial. We simplify our preconditions and assume the existence of a true, known inharmonic trigonometric polynomial of the form

y_0(x) = a_0 + \sum_{k=1}^{N} \left( a_k \cos\omega_k x + b_k \sin\omega_k x \right)

based on N = 5 frequencies. We superimpose normally distributed random errors and a fixed systematic error, satisfying f_{\bar y_i} = f_y ; -f_{s,\bar y} \le f_{\bar y_i} \le f_{s,\bar y} ; i = 1, \dots, m, on the ordinates of the given polynomial.
Figure 12.8 depicts the fitted harmonic trigonometric least squares polynomial and its uncertainty band.
While the given inharmonic polynomial y_0(x) gets along with just N = 5 frequencies \omega_1, \dots, \omega_5, the harmonic trigonometric least squares polynomial calls for an r of the order of 33, i.e. about 16 harmonic frequencies.
Part IV
Appendices
A On the Rank of Some Matrices
A matrix A is said to have rank r if there is at least one non-zero sub-determinant of order (r × r) while all sub-determinants of higher order are zero. The notation is rank(A) = r.
The so-called design matrix A, referred to in the method of least squares, Sect. 7.2, has m rows and r columns, m > r. Given that A has rank r, its r columns are linearly independent.
It appears advantageous to regard the columns of a given matrix as vectors. Obviously, A has r m-dimensional column vectors. We denote the latter by a_k ; k = 1, \dots, r. If the a_k are independent, they span an r-dimensional subspace R(A) = V_m^r(\mathbb{R}) of the underlying m-dimensional space V_m(\mathbb{R}); \mathbb{R} denotes the set of real numbers.
1. ATA, Sect. 7.2
Let rank(A) = r and let the a_k ; k = 1, \dots, r be a basis of an r-dimensional subspace R(A) = V_m^r(\mathbb{R}) of the m-dimensional space V_m(\mathbb{R}) considered. As the a_k are independent, it is not possible to find a vector

\beta = ( \beta_1 \; \beta_2 \cdots \beta_r )^{\rm T} \ne 0

so that

A\beta = \beta_1 a_1 + \beta_2 a_2 + \dots + \beta_r a_r = 0 .

The vector

c = \beta_1 a_1 + \beta_2 a_2 + \dots + \beta_r a_r

lies in the column space of the matrix A. Consequently, the vector A^{\rm T} c must be different from the zero vector, as c cannot be orthogonal to all vectors a_k ; k = 1, \dots, r. But if, for any vector \beta \ne 0,

A^{\rm T} A \beta \ne 0 ,

the columns of A^{\rm T} A are independent, i.e. rank(A^{\rm T} A) = r. Moreover, as (A\beta)^{\rm T} A\beta equals the squared length of the vector A\beta, which is non-zero if \beta is non-zero, we have

\beta^{\rm T} ( A^{\rm T} A ) \beta > 0

so that the symmetric matrix A^{\rm T} A is also positive definite.
2. B = A(ATA)−1, Sect. 7.2
As (ATA)−1 is non-singular, the rank of B is that of A.
3. P = A(ATA)−1AT, Sect. 7.2
The rank of an idempotent matrix (P^2 = P) is equal to its trace [53]. Given that the product of two matrices, say R and S, is defined either way, RS and SR, we have trace(RS) = trace(SR), i.e.

rank(P) = trace(P) = trace\!\left[ A \left( A^{\rm T} A \right)^{-1} A^{\rm T} \right] = trace\!\left[ A^{\rm T} A \left( A^{\rm T} A \right)^{-1} \right] = r .
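These properties are cheap to confirm numerically. A sketch assuming NumPy, with an invented random full-rank A:

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 9, 4
A = rng.normal(size=(m, r))                  # full column rank almost surely
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P @ P, P)                 # idempotent
assert np.allclose(P, P.T)                   # symmetric
assert np.isclose(np.trace(P), r)            # trace = rank = r
eig = np.sort(np.linalg.eigvalsh(P))
assert np.allclose(eig[-r:], 1.0)            # r eigenvalues equal 1
assert np.allclose(eig[:m - r], 0.0)         # m - r eigenvalues equal 0
```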
4. C = A(ATA)−1HT, Sect. 7.3
Each of the columns of C is a linear combination of the columns of B = A ( A^{\rm T} A )^{-1}. As rank(H) = q, the q columns of C are independent.
5. BTsB, Sect. 9.5
Since we consider positive definite (non-singular) empirical variance–covariance matrices s only, we are allowed to resort to the decomposition s = c^{\rm T} c; c denoting a non-singular (m × m) matrix [52]. As B^{\rm T} s B = ( c B )^{\rm T} ( c B ), where rank(c B) = r, we have rank(B^{\rm T} s B) = r. Moreover, the symmetric matrix B^{\rm T} s B is positive definite.
B Variance–Covariance Matrices
We consider m random variables X_i, \mu_i = E\{X_i\} ; i = 1, \dots, m and m real numbers \xi_i ; i = 1, \dots, m. The expectation

E\left\{ \left[ \sum_{i=1}^{m} \xi_i \left( X_i - \mu_i \right) \right]^2 \right\} \ge 0 \qquad (B.1)

yields the quadratic form

\xi^{\rm T} \sigma\, \xi \ge 0 , \qquad (B.2)

where

\xi = ( \xi_1 \; \xi_2 \dots \xi_m )^{\rm T} \qquad (B.3)

and

\sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2m} \\ \dots & \dots & \dots & \dots \\ \sigma_{m1} & \sigma_{m2} & \dots & \sigma_{mm} \end{pmatrix} ,

\sigma_{ij} = E\left\{ \left( X_i - \mu_i \right)\left( X_j - \mu_j \right) \right\} ; \quad i, j = 1, \dots, m . \qquad (B.4)
From (B.2) we see: variance–covariance matrices are at least positive semi-definite.
As the variance–covariance matrix is real and symmetric, there is an orthogonal transformation diagonalizing it. This in turn allows for
|\sigma| = \lambda_1 \cdot \lambda_2 \cdot \dots \cdot \lambda_m , \qquad (B.5)

the \lambda_i ; i = 1, \dots, m denoting the eigenvalues of \sigma. If rank(\sigma) = m, necessarily all the eigenvalues \lambda_i ; i = 1, \dots, m are different from zero. On the other hand, given \sigma is positive definite,

\xi^{\rm T} \sigma\, \xi > 0 , \qquad (B.6)

the theory of quadratic forms tells us

\lambda_i > 0 ; \quad i = 1, \dots, m \qquad (B.7)

so that

|\sigma| > 0 . \qquad (B.8)

This in the end implies: given that the variance–covariance matrix has full rank, it is positive definite.

If (B.6) does not apply, the variance–covariance matrix \sigma is positive semi-definite, rank(\sigma) < m and some of the eigenvalues are zero; those different from zero are still positive. We then have

|\sigma| = 0 . \qquad (B.9)
Let us consider m = 2 and substitute \sigma_{xy} for \sigma_{12}; furthermore \sigma_{xx} \equiv \sigma_x^2 and \sigma_{yy} \equiv \sigma_y^2 for \sigma_{11} and \sigma_{22} respectively. Then, from

\sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix} , \quad given \; |\sigma| > 0 ,

we deduce

-\sigma_x \sigma_y < \sigma_{xy} < \sigma_x \sigma_y .
Let us now switch from expectations and theoretical variance–covariance matrices to sums and empirical variance–covariance matrices. To this end, we consider m arithmetic means

\bar x_i = \frac{1}{n} \sum_{l=1}^{n} x_{il} ; \quad i = 1, \dots, m , \qquad (B.10)

each covering n repeat measurements x_{il} ; i = 1, \dots, m ; l = 1, \dots, n. The non-negative sum

\frac{1}{n-1} \sum_{l=1}^{n} \left[ \sum_{i=1}^{m} \xi_i \left( x_{il} - \bar x_i \right) \right]^2 \ge 0 \qquad (B.11)

yields

\xi^{\rm T} s\, \xi \ge 0 , \qquad (B.12)
where

s = \begin{pmatrix} s_{11} & s_{12} & \dots & s_{1m} \\ s_{21} & s_{22} & \dots & s_{2m} \\ \dots & \dots & \dots & \dots \\ s_{m1} & s_{m2} & \dots & s_{mm} \end{pmatrix} ,

s_{ij} = \frac{1}{n-1} \sum_{l=1}^{n} \left( x_{il} - \bar x_i \right)\left( x_{jl} - \bar x_j \right) ; \quad i, j = 1, \dots, m \qquad (B.13)

denotes the empirical variance–covariance matrix of the m data sets x_{il} ; i = 1, \dots, m ; l = 1, \dots, n.
Given rank(s) = m, the matrix s is positive definite,

\xi^{\rm T} s\, \xi > 0 , \quad |s| > 0 , \qquad (B.14)

but for rank(s) < m, the matrix s is positive semi-definite,

\xi^{\rm T} s\, \xi \ge 0 , \quad |s| = 0 . \qquad (B.15)

Again, we set m = 2 and substitute s_{xy} for s_{12}, s_{xx} \equiv s_x^2 and s_{yy} \equiv s_y^2 for s_{11} and s_{22} respectively. Then, from

s = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{yx} & s_{yy} \end{pmatrix} , \quad provided \; |s| > 0 ,

we obtain

-s_x s_y < s_{xy} < s_x s_y . \qquad (B.16)
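A short check of (B.12)–(B.16) on simulated data (NumPy assumed; the data are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 25
x = rng.normal(size=(m, n))                  # x_il: m series of n repeat measurements
xbar = x.mean(axis=1, keepdims=True)
s = (x - xbar) @ (x - xbar).T / (n - 1)      # empirical matrix, eq. (B.13)

assert np.allclose(s, np.cov(x))             # agrees with NumPy's estimator
xi = rng.normal(size=m)
assert xi @ s @ xi >= 0.0                    # (B.12): at least positive semi-definite
sx, sy, sxy = np.sqrt(s[0, 0]), np.sqrt(s[1, 1]), s[0, 1]
assert -sx * sy < sxy < sx * sy              # (B.16); |s| > 0 almost surely here
```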
C Linear Functions of Normal Variables
We confine ourselves to sums of two (possibly dependent) normally distributed random variables, X and Y, and refer to the nomenclature introduced in Sect. 3.2, i.e. we consider

Z = b_x X + b_y Y \qquad (C.1)

and denote by \mu_x = E\{X\}, \mu_y = E\{Y\} the expectations and by \sigma_{xx} = E\{(X-\mu_x)^2\}, \sigma_{yy} = E\{(Y-\mu_y)^2\}, \sigma_{xy} = E\{(X-\mu_x)(Y-\mu_y)\} the theoretical moments of second order. As usual, we associate the variables X and Y with the quantities to be measured, x and y. Defining

\zeta = ( x \; y )^{\rm T} , \quad \mu_\zeta = ( \mu_x \; \mu_y )^{\rm T} , \quad b = ( b_x \; b_y )^{\rm T} , \quad \sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix} ,

we arrive at

z = b^{\rm T} \zeta , \quad \mu_z = E\{Z\} = b^{\rm T} \mu_\zeta , \quad \sigma_z^2 = E\left\{ \left( Z - \mu_z \right)^2 \right\} = b^{\rm T} \sigma\, b .
We prove that Z is N(\mu_z, \sigma_z^2)-distributed. To this end we refer to the technique of characteristic functions. Given a random variable X, its characteristic function is defined by [43]

\chi(v) = E\left\{ e^{\mathrm{i} v X} \right\} = \int_{-\infty}^{\infty} p_X(x)\, e^{\mathrm{i} v x} \, \mathrm{d}x . \qquad (C.2)

If X is N(\mu_x, \sigma_x^2)-distributed, we have

\chi(v) = \exp\left( \mathrm{i} \mu_x v - \frac{\sigma_x^2 v^2}{2} \right) . \qquad (C.3)

Obviously, the Fourier transform of \chi(v) reproduces the distribution density p_X(x).
We consider the characteristic function of the random variable Z. Following (4.5) we may put

E\left\{ e^{\mathrm{i} v Z} \right\} = \int_{-\infty}^{\infty} e^{\mathrm{i} v z}\, p_Z(z)\, \mathrm{d}z = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[ \mathrm{i} v \left( b_x x + b_y y \right) \right] p_{XY}(x, y)\, \mathrm{d}x\, \mathrm{d}y , \qquad (C.4)

the density p_{XY}(x, y) being given in (3.17),

p_{XY}(x, y) = \frac{1}{2\pi |\sigma|^{1/2}} \exp\left[ -\frac{1}{2} \left( \zeta - \mu_\zeta \right)^{\rm T} \sigma^{-1} \left( \zeta - \mu_\zeta \right) \right] .
Adding \pm \mathrm{i} v\, b^{\rm T} \mu_\zeta to the exponent of (C.4), we form the identity

\mathrm{i} v\, b^{\rm T} \zeta - \frac{1}{2} \left( \zeta - \mu_\zeta \right)^{\rm T} \sigma^{-1} \left( \zeta - \mu_\zeta \right) \pm \mathrm{i} v\, b^{\rm T} \mu_\zeta

= -\frac{1}{2} \left( \zeta - \mu^* \right)^{\rm T} \sigma^{-1} \left( \zeta - \mu^* \right) + \mathrm{i} v\, b^{\rm T} \mu_\zeta - \frac{v^2}{2}\, b^{\rm T} \sigma\, b , \qquad (C.5)

in which

\mu^* = \mu_\zeta + \mathrm{i} v\, \sigma b .
Finally, (C.4) turns into

E\left\{ e^{\mathrm{i} v Z} \right\} = \exp\left( \mathrm{i}\, b^{\rm T} \mu_\zeta\, v - \frac{ b^{\rm T} \sigma\, b\, v^2 }{2} \right) = \exp\left( \mathrm{i} \mu_z v - \frac{\sigma_z^2 v^2}{2} \right) . \qquad (C.6)

A comparison of (C.3) and (C.6) reveals that Z is normally distributed.
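The statement \mu_z = b^T \mu_\zeta, \sigma_z^2 = b^T \sigma b can be illustrated by simulation; a sketch assuming NumPy, with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])               # positive definite, correlated X and Y
b = np.array([0.7, 0.3])

zeta = rng.multivariate_normal(mu, sigma, size=200_000)
z = zeta @ b                                 # realizations of Z = b_x X + b_y Y

mu_z = b @ mu                                # theoretical mean, b^T mu
var_z = b @ sigma @ b                        # theoretical variance, b^T sigma b
assert abs(z.mean() - mu_z) < 0.02
assert abs(z.var(ddof=1) - var_z) < 0.02
```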
D Orthogonal Projections
Let V_m(\mathbb{R}) denote an m-dimensional vector space. Furthermore, let the r < m linearly independent vectors

a_1 = \begin{pmatrix} a_{11} \\ a_{21} \\ \dots \\ a_{m1} \end{pmatrix} , \quad a_2 = \begin{pmatrix} a_{12} \\ a_{22} \\ \dots \\ a_{m2} \end{pmatrix} , \quad \dots , \quad a_r = \begin{pmatrix} a_{1r} \\ a_{2r} \\ \dots \\ a_{mr} \end{pmatrix}

define an r-dimensional subspace V_m^r(\mathbb{R}) \subset V_m(\mathbb{R}). We assemble the r vectors within a matrix

A = ( a_1 \; a_2 \cdots a_r )

so that V_m^r(\mathbb{R}) coincides with the column space R(A) of the matrix A. The (m-r)-dimensional null space N(A^{\rm T}) = V_m^{m-r}(\mathbb{R}) of the matrix A^{\rm T} is orthogonal to R(A). The set of linearly independent vectors r_1, \dots, r_{m-r}, spanning N(A^{\rm T}), is defined through the linear system

A^{\rm T} r = 0 . \qquad (D.1)

Each vector lying within R(A) is orthogonal to each vector lying within N(A^{\rm T}) and vice versa: V_m(\mathbb{R}) is the direct sum of the subspaces R(A) and N(A^{\rm T}),

V_m(\mathbb{R}) = N(A^{\rm T}) \oplus R(A) .
Each vector y being an element of R(A) may be uniquely decomposed into a linear combination of the vectors a_k ; k = 1, \dots, r,

A\beta = y . \qquad (D.2)

The r components \beta_k ; k = 1, \dots, r of the vector

\beta = ( \beta_1 \; \beta_2 \dots \beta_r )^{\rm T}

may be seen as coefficients or "stretch factors" attributing to the vectors a_k ; k = 1, \dots, r suitable lengths so that the linear combination
Fig. D.1. The r column vectors a_1, a_2, \dots, a_r span an r-dimensional subspace R(A) = V_m^r(\mathbb{R}) of the m-dimensional space V_m(\mathbb{R}). The vector of the residuals, r \in N(A^{\rm T}), N(A^{\rm T}) \perp R(A), is orthogonal to the vectors a_1, a_2, \dots, a_r
\beta_1 a_1 + \beta_2 a_2 + \dots + \beta_r a_r

defines, as desired, the vector y.

Clearly, any vector x \in V_m(\mathbb{R}) may only be represented through a linear combination of the vectors a_k ; k = 1, \dots, r if it happens that x lies in R(A) \subset V_m(\mathbb{R}). This, however, will in general not be the case. In view of Fig. D.1 we consider the orthogonal projection of x onto R(A).¹
Let P be an operator accomplishing the orthogonal projection of x onto R(A),

y = Px = A\beta . \qquad (D.3)

We shall show that the column vectors a_k ; k = 1, \dots, r of the matrix A determine the structure of the projection operator P. Let r \in N(A^{\rm T}) denote the component of x perpendicular to R(A) = V_m^r(\mathbb{R}). We then have

r = x - y ; \quad A^{\rm T} r = 0 . \qquad (D.4)

Any linear combination of the m-r linearly independent vectors spanning the null space N(A^{\rm T}) = V_m^{m-r}(\mathbb{R}) of the matrix A^{\rm T} is orthogonal to the vectors a_k ; k = 1, \dots, r spanning the column space R(A) = V_m^r(\mathbb{R}). However, given P, the orthogonal projection y = Px of x onto R(A) is fixed. At the same time, out of all vectors r satisfying A^{\rm T} r = 0, the vector of the residuals r = x - Px is selected.
Suppose we had an orthonormal basis
¹Should x already lie in R(A), the projection would not entail any change at all.
\alpha_1 = \begin{pmatrix} \alpha_{11} \\ \alpha_{21} \\ \dots \\ \alpha_{m1} \end{pmatrix} , \quad \dots , \quad \alpha_m = \begin{pmatrix} \alpha_{1m} \\ \alpha_{2m} \\ \dots \\ \alpha_{mm} \end{pmatrix}

of the space V_m(\mathbb{R}). Let the first r vectors define the column space R(A) and let the last m-r span the null space N(A^{\rm T}). Thus, for any vector x \in V_m(\mathbb{R}) we have

x = \underbrace{ \gamma_1 \alpha_1 + \dots + \gamma_r \alpha_r }_{y} + \underbrace{ \gamma_{r+1} \alpha_{r+1} + \dots + \gamma_m \alpha_m }_{r} ; \qquad \alpha_k^{\rm T} x = \gamma_k ; \quad k = 1, \dots, m , \qquad (D.5)

or

y = \sum_{k=1}^{r} \gamma_k \alpha_k , \qquad r = \sum_{k=r+1}^{m} \gamma_k \alpha_k .
By means of the first r vectors \alpha_k ; k = 1, \dots, r we define a matrix

T = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \dots & \alpha_{1r} \\ \alpha_{21} & \alpha_{22} & \dots & \alpha_{2r} \\ \dots & \dots & \dots & \dots \\ \alpha_{m1} & \alpha_{m2} & \dots & \alpha_{mr} \end{pmatrix} ; \qquad T^{\rm T} T = I . \qquad (D.6)

Here, I designates an (r × r) identity matrix. In contrast to this, the product T T^{\rm T} accomplishes the desired orthogonal projection, since from

( \gamma_1 \; \gamma_2 \cdots \gamma_r )^{\rm T} = T^{\rm T} x

we find

y = T ( \gamma_1 \; \gamma_2 \cdots \gamma_r )^{\rm T} = T T^{\rm T} x = P x , \qquad P = T T^{\rm T} . \qquad (D.7)
The projection operator P has now been established. But this has been done with reference to an orthonormal basis, which actually we do not possess. In order to express P by means of the vectors a_k ; k = 1, \dots, r of the column space of A, we define a non-singular (r × r) auxiliary matrix H so that

A = T H . \qquad (D.8)

Following (D.6) we have H = T^{\rm T} A and, furthermore, T = A H^{-1}, i.e.

P = T T^{\rm T} = A H^{-1} \left( H^{-1} \right)^{\rm T} A^{\rm T} = A \left( H^{\rm T} H \right)^{-1} A^{\rm T} ,

and as

H^{\rm T} H = A^{\rm T} T T^{\rm T} A = A^{\rm T} P A = A^{\rm T} A ,

PA = A, we finally arrive at

P = A \left( A^{\rm T} A \right)^{-1} A^{\rm T} . \qquad (D.9)

This operator, expressed in terms of A, accomplishes the desired orthogonal projection of x onto R(A).
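Since projection operators are unique, (D.9) must coincide with the basis construction (D.7); a QR factorization supplies an orthonormal basis T of R(A). A sketch assuming NumPy, with an invented A:

```python
import numpy as np

rng = np.random.default_rng(6)
m, r = 8, 3
A = rng.normal(size=(m, r))

# T: orthonormal basis of the column space R(A), obtained here via QR
T, _ = np.linalg.qr(A)                       # T^T T = I_r
P_basis = T @ T.T                            # (D.7)
P_direct = A @ np.linalg.inv(A.T @ A) @ A.T  # (D.9)

assert np.allclose(T.T @ T, np.eye(r))
assert np.allclose(P_basis, P_direct)        # projection operators are unique
```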
Should there be two different projection operators P_1 x = y, P_2 x = y, we necessarily have

\left( P_1 - P_2 \right) x = 0 ,

which is not, however, possible for arbitrary vectors x. From this we conclude that projection operators are unique.
We easily convince ourselves once again that the projection of any vector lying already within R(A) simply reproduces just this vector. Indeed, from (D.7) we obtain

P y = T T^{\rm T} T ( \gamma_1 \; \gamma_2 \cdots \gamma_r )^{\rm T} = T ( \gamma_1 \; \gamma_2 \cdots \gamma_r )^{\rm T} = y .
Projection operators possess the property P^2 = P, i.e. they are idempotent. Furthermore, they are symmetric. Vice versa, any symmetric matrix satisfying P^2 = P is a projection operator.
As rank(A) = r, we have rank(P) = r. Let v be an eigenvector of P and \lambda \ne 0 the pertaining eigenvalue, Pv = \lambda v. Then from P(Pv) = \lambda^2 v = Pv = \lambda v, i.e. \lambda^2 = \lambda, we see \lambda = 1. The eigenvalues of projection operators are either 0 or 1.
Obviously, the trace of P is given by the sum of its eigenvalues; at the same time, the rank of an idempotent matrix equals its trace. From this we conclude that r eigenvalues are 1 and m - r eigenvalues are 0 [52–54].
E Least Squares Adjustments
We refer to the nomenclature of Appendix D.
Unconstrained Adjustment
The linear system

A\beta \approx x \qquad (E.1)

is unsolvable as, due to the measurement errors, x \in V_m(\mathbb{R}) while A\beta \in V_m^r(\mathbb{R}), where least squares presupposes r < m. Let us decompose V_m(\mathbb{R}) into orthogonal subspaces V_m^r(\mathbb{R}) and V_m^{m-r}(\mathbb{R}) so that

V_m(\mathbb{R}) = V_m^r(\mathbb{R}) \oplus V_m^{m-r}(\mathbb{R}) .

The method of least squares projects the vector x by means of the projection operator

P = A \left( A^{\rm T} A \right)^{-1} A^{\rm T} \qquad (E.2)

orthogonally onto the subspace V_m^r(\mathbb{R}), so that Px \in V_m^r(\mathbb{R}). The orthogonal projection defines a so-called vector of residuals \bar r satisfying A^{\rm T} \bar r = 0, \bar r \in V_m^{m-r}(\mathbb{R}) and x - \bar r = Px \in V_m^r(\mathbb{R}).¹ Ignoring \bar r and substituting the vector Px for x,

A\beta = Px , \qquad (E.3)

the inconsistent system (E.1) becomes "solvable". After all, the quintessence of the method of least squares is rooted in this substitution, which is, frankly speaking, a trick. Multiplying (E.3) on the left by ( A^{\rm T} A )^{-1} A^{\rm T} yields the least squares estimator

\bar\beta = \left( A^{\rm T} A \right)^{-1} A^{\rm T} x \qquad (E.4)

of the unknown true solution vector.

¹There are, of course, infinitely many vectors r \in V_m^{m-r}(\mathbb{R}) satisfying A^{\rm T} r = 0. The orthogonal projection through P selects the vector of residuals \bar r.
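A sketch of the unconstrained adjustment (E.4), assuming NumPy with invented random data; the assertion confirms that the residual vector lies in N(A^T):

```python
import numpy as np

rng = np.random.default_rng(7)
m, r = 10, 3
A = rng.normal(size=(m, r))
x = rng.normal(size=m)                       # erroneous observations

beta = np.linalg.inv(A.T @ A) @ A.T @ x      # (E.4)
res = x - A @ beta                           # vector of residuals

assert np.allclose(A.T @ res, 0.0)           # residuals orthogonal to R(A)
assert np.allclose(beta, np.linalg.lstsq(A, x, rcond=None)[0])
```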
Constrained Adjustment
1. rank (A) = r
The least squares estimator \bar\beta shall exactly satisfy q < r constraints

H \bar\beta = y , \qquad (E.5)

where rank(H) = q is assumed. Although we adhere to the idea of orthogonal projection, we of course have to alter the projection operator, i.e. though we require the new P to accomplish Px \in V_m^r(\mathbb{R}), this new P will be different from the projection operator of the unconstrained adjustment.

First, we introduce an auxiliary vector \beta^* leading to homogeneous constraints – later on this vector will cancel.

Let \beta^* be any solution of the linear constraints (E.5),

H \beta^* = y . \qquad (E.6)

Then, from

A \left( \beta - \beta^* \right) \approx x - A\beta^* , \qquad H \left( \beta - \beta^* \right) = 0

we obtain, setting

\gamma = \beta - \beta^* , \qquad z = x - A\beta^* , \qquad (E.7)

the new systems

A\gamma \approx z , \qquad H\gamma = 0 . \qquad (E.8)

Let us assume that the new projection operator P is known. Hence, P projects z orthogonally onto the column space of A,

A\bar\gamma = Pz , \qquad P \ne A \left( A^{\rm T} A \right)^{-1} A^{\rm T} \;\text{here} , \qquad (E.9)

so that

\bar\gamma = \left( A^{\rm T} A \right)^{-1} A^{\rm T} P z . \qquad (E.10)

Obviously, the condition

H\bar\gamma = H \left( A^{\rm T} A \right)^{-1} A^{\rm T} P z = C^{\rm T} P z = 0 \qquad (E.11)

is fulfilled, given the vector Pz lies in the null space of the matrix

C^{\rm T} = H \left( A^{\rm T} A \right)^{-1} A^{\rm T} .

Finally, Pz has to lie in the column space of the matrix A as well as in the null space of the matrix C^{\rm T}, i.e.
Fig. E.1. V_m^r(\mathbb{R}) symbolic, R(C) \subset R(A), R(C) \perp N(C^{\rm T}), intersection R(A) \cap N(C^{\rm T})
Pz \in R(A) \cap N(C^{\rm T}) .

The q linearly independent column vectors of the matrix C span a q-dimensional subspace R(C) = V_m^q(\mathbb{R}) of the space R(A) = V_m^r(\mathbb{R}), where R(C) \subset R(A). Furthermore, as

N(C^{\rm T}) = V_m^{m-q}(\mathbb{R}) , \quad R(C) \perp N(C^{\rm T}) \quad\text{and}\quad R(C) \oplus N(C^{\rm T}) = V_m(\mathbb{R}) ,

the intersection R(A) \cap N(C^{\rm T}) is well defined, Fig. E.1. To project into just this intersection, we simply put

P = A \left( A^{\rm T} A \right)^{-1} A^{\rm T} - C \left( C^{\rm T} C \right)^{-1} C^{\rm T}

so that the component projected onto R(C), R(C) \subset R(A), cancels. Returning to \bar\beta yields

\bar\beta = \left( A^{\rm T} A \right)^{-1} A^{\rm T} x - \left( A^{\rm T} A \right)^{-1} H^{\rm T} \left[ H \left( A^{\rm T} A \right)^{-1} H^{\rm T} \right]^{-1} \left[ H \left( A^{\rm T} A \right)^{-1} A^{\rm T} x - y \right] . \qquad (E.12)

Finally, we see H\bar\beta = y.
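Formula (E.12) in code (NumPy assumed, random invented data); the assertion confirms that the constraint is met exactly:

```python
import numpy as np

rng = np.random.default_rng(8)
m, r, q = 12, 4, 2
A = rng.normal(size=(m, r))
H = rng.normal(size=(q, r))                   # q < r constraints, rank(H) = q
x = rng.normal(size=m)
y = rng.normal(size=q)

N = np.linalg.inv(A.T @ A)
beta_u = N @ A.T @ x                          # unconstrained estimator (E.4)
# (E.12): correct the unconstrained solution so that H beta = y holds exactly
beta = beta_u - N @ H.T @ np.linalg.inv(H @ N @ H.T) @ (H @ beta_u - y)

assert np.allclose(H @ beta, y)
```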
2. rank (A) = r′ < r
We assume the first r' column vectors of the matrix A to be linearly independent. Then, we partition the matrix A into an (m × r') matrix A_1 and an (m × (r - r')) matrix A_2,

A = ( A_1 \,|\, A_2 ) .

Formally, the inconsistencies of the system (E.1) "vanish" when the erroneous vector x is projected onto the column space of A_1,
A\beta = Px , \qquad P = A_1 \left( A_1^{\rm T} A_1 \right)^{-1} A_1^{\rm T} . \qquad (E.13)

However, as rank(A) = r' < r, there are q = r - r' indeterminate components of the vector \beta. Consequently, we may add q constraints. As the column vectors a_k ; k = r'+1, \dots, r are linear combinations of the column vectors a_k ; k = 1, \dots, r', and the projection of any vector lying in the column space of A_1 simply gets reproduced, we have PA = A or (PA)^{\rm T} = A^{\rm T} P = A^{\rm T}. Hence, multiplying (E.13) on the left by A^{\rm T} yields

A^{\rm T} A \beta = A^{\rm T} x . \qquad (E.14)

Adding q = r - r' constraints, we may solve the two systems simultaneously. To this end, we multiply (E.5) on the left by H^{\rm T},

H^{\rm T} H \beta = H^{\rm T} y , \qquad (E.15)

so that

\left( A^{\rm T} A + H^{\rm T} H \right) \beta = A^{\rm T} x + H^{\rm T} y . \qquad (E.16)

Obviously, the auxiliary matrix

C = \begin{pmatrix} A \\ H \end{pmatrix}

possesses r' linearly independent rows in A and q = r - r' linearly independent rows in H so that rank(C) = r. But then, the product matrix

C^{\rm T} C = A^{\rm T} A + H^{\rm T} H

has the same rank. Finally, (E.16) yields

\bar\beta = \left( A^{\rm T} A + H^{\rm T} H \right)^{-1} \left( A^{\rm T} x + H^{\rm T} y \right) . \qquad (E.17)

The two relationships [8],

H \left( A^{\rm T} A + H^{\rm T} H \right)^{-1} A^{\rm T} = 0 \quad\text{and}\quad H \left( A^{\rm T} A + H^{\rm T} H \right)^{-1} H^{\rm T} = I ,

where I denotes a (q × q) identity matrix, disclose that \bar\beta satisfies H\bar\beta = y.
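A sketch of (E.17) for a rank-deficient design matrix (NumPy assumed; A, H and the data are invented so that rank(C) = r):

```python
import numpy as np

rng = np.random.default_rng(9)
m = 10
A1 = rng.normal(size=(m, 2))
A = np.column_stack([A1, A1[:, 0] + A1[:, 1]])   # rank(A) = r' = 2 < r = 3
H = np.array([[0.0, 0.0, 1.0]])                  # q = r - r' = 1 added constraint
x = rng.normal(size=m)
y = np.array([0.5])

# (E.17): A^T A + H^T H is non-singular although A^T A alone is not
beta = np.linalg.inv(A.T @ A + H.T @ H) @ (A.T @ x + H.T @ y)

assert np.linalg.matrix_rank(A) == 2
assert np.allclose(H @ beta, y)                  # constraint satisfied exactly
assert np.allclose(A.T @ A @ beta, A.T @ x)      # normal equations (E.14) hold
```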
F Expansion of Solution Vectors
Section 11.1: Planes
The design matrix

A = \begin{pmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \dots & \dots & \dots \\ 1 & x_m & y_m \end{pmatrix}

leads us to the product

A^{\rm T} A = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{pmatrix}

in which

H_{11} = m , \quad H_{22} = \sum_{j=1}^{m} x_j^2 , \quad H_{33} = \sum_{j=1}^{m} y_j^2 ,

H_{12} = H_{21} = \sum_{j=1}^{m} x_j , \quad H_{13} = H_{31} = \sum_{j=1}^{m} y_j , \quad H_{23} = H_{32} = \sum_{j=1}^{m} x_j y_j .

Let D denote the determinant of A^{\rm T} A,

D = H_{11}\left( H_{22}H_{33} - H_{23}^2 \right) + H_{12}\left( H_{31}H_{23} - H_{33}H_{12} \right) + H_{13}\left( H_{21}H_{32} - H_{13}H_{22} \right) .

Obviously,

\left( A^{\rm T} A \right)^{-1} = \frac{1}{D} \begin{pmatrix} H_{22}H_{33} - H_{23}^2 & H_{23}H_{13} - H_{12}H_{33} & H_{12}H_{23} - H_{22}H_{13} \\ H_{23}H_{13} - H_{12}H_{33} & H_{11}H_{33} - H_{13}^2 & H_{12}H_{13} - H_{11}H_{23} \\ H_{12}H_{23} - H_{22}H_{13} & H_{12}H_{13} - H_{11}H_{23} & H_{11}H_{22} - H_{12}^2 \end{pmatrix}

is the inverse of A^{\rm T} A. Putting B = A \left( A^{\rm T} A \right)^{-1}, the components of \beta = B^{\rm T} z are given by

\beta_1 = \sum_{j=1}^{m} b_{j1} z_j , \quad \beta_2 = \sum_{j=1}^{m} b_{j2} z_j , \quad \beta_3 = \sum_{j=1}^{m} b_{j3} z_j

or

\beta_1 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{22}H_{33} - H_{23}^2 \right] + \left[ H_{23}H_{13} - H_{12}H_{33} \right] x_j + \left[ H_{12}H_{23} - H_{22}H_{13} \right] y_j \right\} z_j

\beta_2 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{23}H_{13} - H_{12}H_{33} \right] + \left[ H_{11}H_{33} - H_{13}^2 \right] x_j + \left[ H_{12}H_{13} - H_{11}H_{23} \right] y_j \right\} z_j

\beta_3 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{12}H_{23} - H_{22}H_{13} \right] + \left[ H_{12}H_{13} - H_{11}H_{23} \right] x_j + \left[ H_{11}H_{22} - H_{12}^2 \right] y_j \right\} z_j .
Before writing down the partial derivatives, we differentiate the determinant,

\frac{\partial D}{\partial x_i} = 2\left[ H_{11}H_{33} - H_{13}^2 \right] x_i + 2\left[ H_{12}H_{13} - H_{11}H_{23} \right] y_i + 2\left[ H_{13}H_{23} - H_{12}H_{33} \right]

\frac{\partial D}{\partial y_i} = 2\left[ H_{12}H_{13} - H_{11}H_{23} \right] x_i + 2\left[ H_{11}H_{22} - H_{12}^2 \right] y_i + 2\left[ H_{12}H_{23} - H_{13}H_{22} \right]

\frac{\partial D}{\partial z_i} = 0 .
Now, each of the components βk; k = 1, 2, 3 has to be partially differentiatedwith respect to x, y and z,
c_{i1} = \frac{\partial \beta_1}{\partial x_i} = -\frac{\beta_1}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ 2\left( H_{33}x_i - H_{23}y_i \right)\sum_{j=1}^{m} z_j + \left( H_{13}H_{23} - H_{12}H_{33} \right) z_i + \left( H_{13}y_i - H_{33} \right)\sum_{j=1}^{m} x_j z_j + \left( H_{23} + H_{12}y_i - 2H_{13}x_i \right)\sum_{j=1}^{m} y_j z_j \Big\}

c_{i+m,1} = \frac{\partial \beta_1}{\partial y_i} = -\frac{\beta_1}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ 2\left( H_{22}y_i - H_{23}x_i \right)\sum_{j=1}^{m} z_j + \left( H_{12}H_{23} - H_{13}H_{22} \right) z_i + \left( H_{23} + H_{13}x_i - 2H_{12}y_i \right)\sum_{j=1}^{m} x_j z_j + \left( H_{12}x_i - H_{22} \right)\sum_{j=1}^{m} y_j z_j \Big\}

c_{i+2m,1} = \frac{\partial \beta_1}{\partial z_i} = b_{i1}

c_{i2} = \frac{\partial \beta_2}{\partial x_i} = -\frac{\beta_2}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \left( H_{13}y_i - H_{33} \right)\sum_{j=1}^{m} z_j + \left( H_{11}H_{33} - H_{13}^2 \right) z_i + \left( H_{13} - H_{11}y_i \right)\sum_{j=1}^{m} y_j z_j \Big\}

c_{i+m,2} = \frac{\partial \beta_2}{\partial y_i} = -\frac{\beta_2}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ \left( H_{23} + H_{13}x_i - 2H_{12}y_i \right)\sum_{j=1}^{m} z_j + \left( H_{12}H_{13} - H_{11}H_{23} \right) z_i + 2\left( H_{11}y_i - H_{13} \right)\sum_{j=1}^{m} x_j z_j + \left( H_{12} - H_{11}x_i \right)\sum_{j=1}^{m} y_j z_j \Big\}

c_{i+2m,2} = \frac{\partial \beta_2}{\partial z_i} = b_{i2}

c_{i3} = \frac{\partial \beta_3}{\partial x_i} = -\frac{\beta_3}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \left( H_{23} + H_{12}y_i - 2H_{13}x_i \right)\sum_{j=1}^{m} z_j + \left( H_{12}H_{13} - H_{11}H_{23} \right) z_i + \left( H_{13} - H_{11}y_i \right)\sum_{j=1}^{m} x_j z_j + 2\left( H_{11}x_i - H_{12} \right)\sum_{j=1}^{m} y_j z_j \Big\}

c_{i+m,3} = \frac{\partial \beta_3}{\partial y_i} = -\frac{\beta_3}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ \left( H_{12}x_i - H_{22} \right)\sum_{j=1}^{m} z_j + \left( H_{11}H_{22} - H_{12}^2 \right) z_i + \left( H_{12} - H_{11}x_i \right)\sum_{j=1}^{m} x_j z_j \Big\}

c_{i+2m,3} = \frac{\partial \beta_3}{\partial z_i} = b_{i3} .
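The explicit inverse and the coefficients \beta_k above can be cross-checked against a generic least squares solver; a sketch assuming NumPy, with invented plane data:

```python
import numpy as np

rng = np.random.default_rng(10)
m = 20
x, y = rng.normal(size=m), rng.normal(size=m)
z = 1.0 + 2.0 * x - 0.5 * y + 0.01 * rng.normal(size=m)

A = np.column_stack([np.ones(m), x, y])
H = A.T @ A                                  # the H_kk' of the text
D = np.linalg.det(H)

# inverse of A^T A written out via cofactors, as in the text
adj = np.array([
    [H[1,1]*H[2,2] - H[1,2]**2,      H[1,2]*H[0,2] - H[0,1]*H[2,2],  H[0,1]*H[1,2] - H[1,1]*H[0,2]],
    [H[1,2]*H[0,2] - H[0,1]*H[2,2],  H[0,0]*H[2,2] - H[0,2]**2,      H[0,1]*H[0,2] - H[0,0]*H[1,2]],
    [H[0,1]*H[1,2] - H[1,1]*H[0,2],  H[0,1]*H[0,2] - H[0,0]*H[1,2],  H[0,0]*H[1,1] - H[0,1]**2],
])
Hinv = adj / D
assert np.allclose(Hinv, np.linalg.inv(H))

B = A @ Hinv
beta = B.T @ z                               # beta_k = sum_j b_jk z_j
assert np.allclose(beta, np.linalg.lstsq(A, z, rcond=None)[0])
```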
Section 11.1: Parabolas
From the design matrix

A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \dots & \dots & \dots \\ 1 & x_m & x_m^2 \end{pmatrix}

we find the product matrix

A^{\rm T} A = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{pmatrix}

whose elements are given by

H_{11} = m , \quad H_{22} = \sum_{j=1}^{m} x_j^2 , \quad H_{33} = \sum_{j=1}^{m} x_j^4 ,

H_{12} = H_{21} = \sum_{j=1}^{m} x_j , \quad H_{13} = H_{31} = \sum_{j=1}^{m} x_j^2 , \quad H_{23} = H_{32} = \sum_{j=1}^{m} x_j^3 .

Designating the determinant of A^{\rm T} A by

D = H_{11}\left( H_{22}H_{33} - H_{23}^2 \right) + H_{12}\left( H_{31}H_{23} - H_{33}H_{12} \right) + H_{13}\left( H_{21}H_{32} - H_{13}H_{22} \right) ,

we find

\left( A^{\rm T} A \right)^{-1} = \frac{1}{D} \begin{pmatrix} H_{22}H_{33} - H_{23}^2 & H_{23}H_{13} - H_{12}H_{33} & H_{12}H_{23} - H_{22}H_{13} \\ H_{23}H_{13} - H_{12}H_{33} & H_{11}H_{33} - H_{13}^2 & H_{12}H_{13} - H_{11}H_{23} \\ H_{12}H_{23} - H_{22}H_{13} & H_{12}H_{13} - H_{11}H_{23} & H_{11}H_{22} - H_{12}^2 \end{pmatrix} .
Setting B = A \left( A^{\rm T} A \right)^{-1}, the components of \beta = B^{\rm T} y are

\beta_1 = \sum_{j=1}^{m} b_{j1} y_j , \quad \beta_2 = \sum_{j=1}^{m} b_{j2} y_j , \quad \beta_3 = \sum_{j=1}^{m} b_{j3} y_j

or

\beta_1 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{22}H_{33} - H_{23}^2 \right] + \left[ H_{23}H_{13} - H_{12}H_{33} \right] x_j + \left[ H_{12}H_{23} - H_{22}H_{13} \right] x_j^2 \right\} y_j

\beta_2 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{23}H_{13} - H_{12}H_{33} \right] + \left[ H_{11}H_{33} - H_{13}^2 \right] x_j + \left[ H_{12}H_{13} - H_{11}H_{23} \right] x_j^2 \right\} y_j

\beta_3 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{12}H_{23} - H_{22}H_{13} \right] + \left[ H_{12}H_{13} - H_{11}H_{23} \right] x_j + \left[ H_{11}H_{22} - H_{12}^2 \right] x_j^2 \right\} y_j .
The partial derivatives of the determinant follow from

\frac{\partial D}{\partial x_i} = 2\left[ H_{13}H_{23} - H_{12}H_{33} \right] + 2\left[ H_{11}H_{33} + 2\left( H_{12}H_{23} - H_{13}H_{22} \right) - H_{13}^2 \right] x_i + 6\left[ H_{12}H_{13} - H_{11}H_{23} \right] x_i^2 + 4\left( H_{11}H_{22} - H_{12}^2 \right) x_i^3 ,

\frac{\partial D}{\partial y_i} = 0 ,
and for those of the components of the solution vector we have
c_{i1} = \frac{\partial \beta_1}{\partial x_i} = -\frac{\beta_1}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ 2\left( H_{33}x_i - 3H_{23}x_i^2 + 2H_{22}x_i^3 \right)\sum_{j=1}^{m} y_j + \left( H_{23}H_{13} - H_{12}H_{33} \right) y_i + \left( -H_{33} + 2H_{23}x_i + 3H_{13}x_i^2 - 4H_{12}x_i^3 \right)\sum_{j=1}^{m} x_j y_j + 2\left( H_{12}H_{23} - H_{22}H_{13} \right) x_i y_i + \left[ H_{23} - 2\left( H_{13} + H_{22} \right)x_i + 3H_{12}x_i^2 \right]\sum_{j=1}^{m} x_j^2 y_j \Big\}

c_{i+m,1} = \frac{\partial \beta_1}{\partial y_i} = b_{i1}

c_{i2} = \frac{\partial \beta_2}{\partial x_i} = -\frac{\beta_2}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \left( -H_{33} + 2H_{23}x_i + 3H_{13}x_i^2 - 4H_{12}x_i^3 \right)\sum_{j=1}^{m} y_j + 4\left( -H_{13}x_i + H_{11}x_i^3 \right)\sum_{j=1}^{m} x_j y_j + \left( H_{11}H_{33} - H_{13}^2 \right) y_i + 2\left( H_{12}H_{13} - H_{11}H_{23} \right) x_i y_i + \left[ H_{13} + 2H_{12}x_i - 3H_{11}x_i^2 \right]\sum_{j=1}^{m} x_j^2 y_j \Big\}

c_{i+m,2} = \frac{\partial \beta_2}{\partial y_i} = b_{i2}

c_{i3} = \frac{\partial \beta_3}{\partial x_i} = -\frac{\beta_3}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \left[ H_{23} - 2\left( H_{13} + H_{22} \right)x_i + 3H_{12}x_i^2 \right]\sum_{j=1}^{m} y_j + \left( H_{13} + 2H_{12}x_i - 3H_{11}x_i^2 \right)\sum_{j=1}^{m} x_j y_j + \left( H_{12}H_{13} - H_{11}H_{23} \right) y_i + 2\left( H_{11}H_{22} - H_{12}^2 \right) x_i y_i + 2\left( -H_{12} + H_{11}x_i \right)\sum_{j=1}^{m} x_j^2 y_j \Big\}

c_{i+m,3} = \frac{\partial \beta_3}{\partial y_i} = b_{i3} .
Section 11.4: Circles
From

A = \begin{pmatrix} 1 & u_1 & v_1 \\ 1 & u_2 & v_2 \\ \dots & \dots & \dots \\ 1 & u_m & v_m \end{pmatrix}

we find

A^{\rm T} A = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{pmatrix}

whose elements are given by

H_{11} = m , \quad H_{22} = \sum_{j=1}^{m} u_j^2 , \quad H_{33} = \sum_{j=1}^{m} v_j^2 ,

H_{12} = H_{21} = \sum_{j=1}^{m} u_j , \quad H_{13} = H_{31} = \sum_{j=1}^{m} v_j , \quad H_{23} = H_{32} = \sum_{j=1}^{m} u_j v_j .

The determinant of A^{\rm T} A is

D = H_{11}\left( H_{22}H_{33} - H_{23}^2 \right) + H_{12}\left( H_{31}H_{23} - H_{33}H_{12} \right) + H_{13}\left( H_{21}H_{32} - H_{13}H_{22} \right) ,

so that

\left( A^{\rm T} A \right)^{-1} = \frac{1}{D} \begin{pmatrix} H_{22}H_{33} - H_{23}^2 & H_{23}H_{13} - H_{12}H_{33} & H_{12}H_{23} - H_{22}H_{13} \\ H_{23}H_{13} - H_{12}H_{33} & H_{11}H_{33} - H_{13}^2 & H_{12}H_{13} - H_{11}H_{23} \\ H_{12}H_{23} - H_{22}H_{13} & H_{12}H_{13} - H_{11}H_{23} & H_{11}H_{22} - H_{12}^2 \end{pmatrix} .

Putting B = A \left( A^{\rm T} A \right)^{-1}, the components of \beta = B^{\rm T} w are

\beta_1 = \sum_{j=1}^{m} b_{j1} w_j , \quad \beta_2 = \sum_{j=1}^{m} b_{j2} w_j , \quad \beta_3 = \sum_{j=1}^{m} b_{j3} w_j

or

\beta_1 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{22}H_{33} - H_{23}^2 \right] + \left[ H_{23}H_{13} - H_{12}H_{33} \right] u_j + \left[ H_{12}H_{23} - H_{22}H_{13} \right] v_j \right\} w_j

\beta_2 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{23}H_{13} - H_{12}H_{33} \right] + \left[ H_{11}H_{33} - H_{13}^2 \right] u_j + \left[ H_{12}H_{13} - H_{11}H_{23} \right] v_j \right\} w_j

\beta_3 = \frac{1}{D} \sum_{j=1}^{m} \left\{ \left[ H_{12}H_{23} - H_{22}H_{13} \right] + \left[ H_{12}H_{13} - H_{11}H_{23} \right] u_j + \left[ H_{11}H_{22} - H_{12}^2 \right] v_j \right\} w_j .
For the partial derivatives of the determinant we find

\frac{\partial D}{\partial x_i} = \frac{2}{r^*}\left[ \left( H_{11}H_{33} - H_{13}^2 \right) u_i + \left( H_{13}H_{12} - H_{11}H_{23} \right) v_i + \left( H_{13}H_{23} - H_{33}H_{12} \right) \right]

\frac{\partial D}{\partial y_i} = \frac{2}{r^*}\left[ \left( H_{12}H_{13} - H_{11}H_{23} \right) u_i + \left( H_{11}H_{22} - H_{12}^2 \right) v_i + \left( H_{12}H_{23} - H_{13}H_{22} \right) \right]
and for those of the components of the solution vector
c_{i1} = \frac{\partial \beta_1}{\partial x_i} = -\frac{\beta_1}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \frac{2}{r^*}\left( H_{33}u_i - H_{23}v_i \right)\sum_{j=1}^{m} w_j + \left( H_{22}H_{33} - H_{23}^2 \right) u_i + \frac{1}{r^*}\left( H_{13}v_i - H_{33} \right)\sum_{j=1}^{m} u_j w_j + \left( H_{23}H_{13} - H_{12}H_{33} \right)\left( \frac{w_i}{r^*} + u_i^2 \right) + \frac{1}{r^*}\left( H_{23} + H_{12}v_i - 2H_{13}u_i \right)\sum_{j=1}^{m} v_j w_j + \left( H_{12}H_{23} - H_{22}H_{13} \right) u_i v_i \Big\}

c_{i+m,1} = \frac{\partial \beta_1}{\partial y_i} = -\frac{\beta_1}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ \frac{2}{r^*}\left( H_{22}v_i - H_{23}u_i \right)\sum_{j=1}^{m} w_j + \left( H_{22}H_{33} - H_{23}^2 \right) v_i + \frac{1}{r^*}\left( H_{13}u_i + H_{23} - 2H_{12}v_i \right)\sum_{j=1}^{m} u_j w_j + \left( H_{23}H_{13} - H_{12}H_{33} \right) u_i v_i + \frac{1}{r^*}\left( H_{12}u_i - H_{22} \right)\sum_{j=1}^{m} v_j w_j + \left( H_{12}H_{23} - H_{22}H_{13} \right)\left( \frac{w_i}{r^*} + v_i^2 \right) \Big\}

c_{i2} = \frac{\partial \beta_2}{\partial x_i} = -\frac{\beta_2}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \frac{1}{r^*}\left( H_{13}v_i - H_{33} \right)\sum_{j=1}^{m} w_j + \left( H_{23}H_{13} - H_{12}H_{33} \right) u_i + \left( H_{11}H_{33} - H_{13}^2 \right)\left( \frac{w_i}{r^*} + u_i^2 \right) + \frac{1}{r^*}\left( H_{13} + H_{11}v_i \right)\sum_{j=1}^{m} v_j w_j + \left( H_{12}H_{13} - H_{11}H_{23} \right) u_i v_i \Big\}

c_{i+m,2} = \frac{\partial \beta_2}{\partial y_i} = -\frac{\beta_2}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ \frac{1}{r^*}\left( H_{23} + H_{13}u_i - 2H_{12}v_i \right)\sum_{j=1}^{m} w_j + \left( H_{23}H_{13} - H_{12}H_{33} \right) v_i + \frac{2}{r^*}\left( H_{11}v_i - H_{13} \right)\sum_{j=1}^{m} u_j w_j + \left( H_{11}H_{33} - H_{13}^2 \right) u_i v_i + \frac{1}{r^*}\left( H_{12} - H_{11}u_i \right)\sum_{j=1}^{m} v_j w_j + \left( H_{12}H_{13} - H_{11}H_{23} \right)\left( \frac{w_i}{r^*} + v_i^2 \right) \Big\}

c_{i3} = \frac{\partial \beta_3}{\partial x_i} = -\frac{\beta_3}{D}\frac{\partial D}{\partial x_i} + \frac{1}{D}\Big\{ \frac{1}{r^*}\left( H_{23} + H_{12}v_i - 2H_{13}u_i \right)\sum_{j=1}^{m} w_j + \left( H_{12}H_{23} - H_{22}H_{13} \right) u_i + \frac{1}{r^*}\left( H_{13} - H_{11}v_i \right)\sum_{j=1}^{m} u_j w_j + \left( H_{12}H_{13} - H_{11}H_{23} \right)\left( \frac{w_i}{r^*} + u_i^2 \right) + \frac{2}{r^*}\left( H_{11}u_i - H_{12} \right)\sum_{j=1}^{m} v_j w_j + \left( H_{11}H_{22} - H_{12}^2 \right) u_i v_i \Big\}

c_{i+m,3} = \frac{\partial \beta_3}{\partial y_i} = -\frac{\beta_3}{D}\frac{\partial D}{\partial y_i} + \frac{1}{D}\Big\{ \frac{1}{r^*}\left( H_{12}u_i - H_{22} \right)\sum_{j=1}^{m} w_j + \left( H_{12}H_{23} - H_{22}H_{13} \right) v_i + \frac{1}{r^*}\left( H_{12} - H_{11}u_i \right)\sum_{j=1}^{m} u_j w_j + \left( H_{12}H_{13} - H_{11}H_{23} \right) u_i v_i + \left( H_{11}H_{22} - H_{12}^2 \right)\left( \frac{w_i}{r^*} + v_i^2 \right) \Big\} .
G Student’s Density
Suppose the random variable X to be N(\mu, \sigma^2)-distributed. Then, its distribution function is given by

P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \mathrm{d}x . \qquad (G.1)

We consider a sample

x_1, x_2, \dots, x_n \qquad (G.2)

and the estimator

s^2 = \frac{1}{n-1} \sum_{l=1}^{n} \left( x_l - \bar x \right)^2 ; \qquad \nu = n - 1 . \qquad (G.3)

For a fixed t we define

x = \mu + t s , \qquad (G.4)

so that

P(X \le \mu + t s) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\mu + t s} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \mathrm{d}x . \qquad (G.5)

Substituting \eta = (x - \mu)/\sigma for x leads to

P(X \le \mu + t s) = P(t, s, \nu) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t s/\sigma} e^{-\eta^2/2} \, \mathrm{d}\eta . \qquad (G.6)
For this statement to be statistically representative, we average by means of the density

p_{S^2}(s^2) = \frac{\nu^{\nu/2}}{2^{\nu/2}\,\Gamma(\nu/2)\,\sigma^\nu} \exp\left( -\frac{\nu s^2}{2\sigma^2} \right) s^{\nu-2} , \qquad (G.7)

which we deduce from (3.36) putting \chi^2 = \frac{n-1}{\sigma^2} s^2 and p_{S^2}(s^2) = p_{\chi^2}(\chi^2)\,\frac{\mathrm{d}\chi^2}{\mathrm{d}s^2}, so that

P(t, \nu) = E\{ P(t, s, \nu) \} = \int_{0}^{\infty} \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t s/\sigma} e^{-\eta^2/2} \, \mathrm{d}\eta \right\} p_{S^2}(s^2) \, \mathrm{d}s^2 . \qquad (G.8)

Averaging by means of the density p_S(s), as given e.g. in [50], leads to the same result, as p_S(s)\,\mathrm{d}s = p_{S^2}(s^2)\,\mathrm{d}s^2.
Obviously, (G.8) is a distribution function and its only variable is t. Hence, differentiating with respect to t,

\frac{\mathrm{d}P(t,\nu)}{\mathrm{d}t} = \int_{0}^{\infty} \left\{ \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{t^2 s^2}{2\sigma^2} \right) s \right\} p_{S^2}(s^2) \, \mathrm{d}s^2 , \qquad (G.9)

and integrating out the right-hand side, we find Student's (or Gosset's) density

p_T(t, \nu) = \frac{ \Gamma\!\left( \frac{\nu+1}{2} \right) }{ \sqrt{\pi\nu}\;\Gamma\!\left( \frac{\nu}{2} \right) } \; \frac{1}{ \left( 1 + \frac{t^2}{\nu} \right)^{\frac{\nu+1}{2}} } , \qquad (G.10)

the random variable of which is implied in (G.4),

T(\nu) = \frac{X - \mu}{S} . \qquad (G.11)
According to (G.7), the number of degrees of freedom is \nu = n - 1. Instead of (G.2) we now consider the estimator

\bar x = \frac{1}{n} \sum_{l=1}^{n} x_l \qquad (G.12)

and the limit of integration

\bar x = \mu + t s/\sqrt{n} \qquad (G.13)

so that

P\left( \bar X \le \bar x \right) = \frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}} \int_{-\infty}^{\bar x} \exp\left( -\frac{(\bar x - \mu)^2}{2\sigma^2/n} \right) \mathrm{d}\bar x \qquad (G.14)

leads us to

P\left( \bar X \le \mu + t s/\sqrt{n} \right) = \frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}} \int_{-\infty}^{\mu + t s/\sqrt{n}} \exp\left( -\frac{(\bar x - \mu)^2}{2\sigma^2/n} \right) \mathrm{d}\bar x . \qquad (G.15)

On substituting \eta = (\bar x - \mu)/(\sigma/\sqrt{n}) we again arrive at (G.6). Hence, the random variable

T(\nu) = \frac{\bar X - \mu}{S/\sqrt{n}} \qquad (G.16)

is t-distributed with \nu = n - 1 degrees of freedom.
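As a sanity check, (G.10) should integrate to one over t; a sketch using only the Python standard library (\nu and the integration range are arbitrary choices):

```python
import math

nu = 5                                        # degrees of freedom, arbitrary choice
c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))

def p_T(t):
    """Student's density, eq. (G.10)."""
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2)

# trapezoidal integration over a wide interval: the density integrates to 1
N, a = 200_000, 50.0
h = 2 * a / N
total = sum(p_T(-a + k * h) for k in range(N + 1)) - 0.5 * (p_T(-a) + p_T(a))
total *= h
assert abs(total - 1.0) < 1e-3
```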
H Quantiles of Hotelling’s Density
t_{0.95} ; \quad P = 95\%

 n\m    1     2     3     4     5     6     7     8     9    10
  2    13
  3    4.3   28
  4    3.2   7.6   44
  5    2.8   5.1  10.7   60
  6    2.6   4.2   6.8  13.9   76
  7    2.5   3.7   5.4   8.5  17.0   92
  8    2.4   3.5   4.8   6.7  10.3  20.1  108
  9    2.3   3.3   4.4   5.8   7.9  12.0  23.3  124
 10    2.3   3.2   4.1   5.2   6.7   9.1  13.7  26.4  140
 11    2.2   3.1   3.9   4.9   6.0   7.7  10.3  15.4  29.5  156
 12    2.2   3.0   3.8   4.6   5.6   6.9   8.7  11.5  17.1  32.7
 13    2.2   3.0   3.7   4.4   5.3   6.3   7.7   9.6  12.7  18.8
 14    2.2   2.9   3.6   4.3   5.0   5.9   7.0   8.5  10.6  13.9
 15    2.1   2.9   3.5   4.1   4.8   5.6   6.6   7.7   9.3  11.2
 16    2.1   2.8   3.4   4.0   4.7   5.4   6.2   7.2   8.4  10.1
 17    2.1   2.8   3.4   4.0   4.6   5.2   5.9   6.8   7.8   9.1
 18    2.1   2.8   3.3   3.9   4.5   5.1   5.7   6.5   7.4   8.4
 19    2.1   2.8   3.3   3.8   4.4   4.9   5.5   6.2   7.0   7.9
 20    2.1   2.7   3.3   3.8   4.3   4.8   5.4   6.0   6.7   7.5
 21    2.1   2.7   3.3   3.7   4.2   4.7   5.3   5.8   6.5   7.2
 22    2.1   2.7   3.2   3.7   4.2   4.7   5.2   5.7   6.3   6.9
 23    2.1   2.7   3.2   3.7   4.1   4.6   5.1   5.6   6.1   6.7
 24    2.1   2.7   3.2   3.6   4.1   4.5   5.0   5.5   6.0   6.5
 25    2.1   2.7   3.2   3.6   4.0   4.5   4.9   5.4   5.9   6.4
 26    2.1   2.7   3.1   3.6   4.0   4.4   4.8   5.3   5.7   6.2
 27    2.1   2.7   3.1   3.6   4.0   4.4   4.8   5.2   5.7   6.1
 28    2.1   2.7   3.1   3.5   3.9   4.3   4.7   5.1   5.6   6.0
 29    2.1   2.6   3.1   3.5   3.9   4.3   4.7   5.1   5.5   5.9
 30    2.0   2.6   3.1   3.5   3.9   4.3   4.6   5.0   5.4   5.8
 31    2.0   2.6   3.1   3.5   3.9   4.2   4.6   5.0   5.4   5.8
 32          2.6   3.1   3.5   3.8   4.2   4.6   4.9   5.3   5.7
 33                3.1   3.5   3.8   4.2   4.5   4.9   5.3   5.6
 34                      3.4   3.8   4.2   4.5   4.9   5.2   5.6
 35                            3.8   4.1   4.5   4.8   5.2   5.5
 36                                  4.1   4.5   4.8   5.1   5.4
 37                                        4.4   4.7   5.1   5.4
 38                                              4.7   5.1   5.4
 39                                                    5.0   5.3
 40                                                          5.3
References
Journals
1. Gauss, C.F., Abhandlungen zur Methode der kleinsten Quadrate, Physica, Würzburg, 1964
2. Student (W.S. Gosset), The Probable Error of a Mean, Biometrika 6 (1908) 1–25
3. Fisher, R.A., Applications of Student's Distribution, Metron 5 (1925) 90 ff.
4. Behrens, W.U., Ein Beitrag zur Fehlerrechnung bei wenigen Beobachtungen, Landwirtschaftliches Jahrbuch 68 (1929) 807–937
5. Hotelling, H., The Generalization of Student's Ratio, Annals of Mathematical Statistics 2 (1931) 360–378
6. Fisher, R.A., The Comparison of Samples with Possibly Unequal Variances, Annals of Eugenics 9 (1939) 174 ff.
7. Welch, B.L., The Generalization of Student's Problem when Several Different Population Variances are Involved, Biometrika 34 (1947) 28–35
8. Plackett, R.L., Some Theorems in Least Squares, Biometrika 37 (1950) 149–157
9. Eisenhart, C., The Reliability of Measured Values – Part I: Fundamental Concepts, Photogrammetric Engineering 18 (1952) 543–561
10. Wagner, S., Zur Behandlung systematischer Fehler bei der Angabe von Meßunsicherheiten, PTB-Mitt. 79 (1969) 343–347
11. Chand, D.R. and S.S. Kapur, An Algorithm for Convex Polytopes, J. Ass. Computing Machinery 17 (1970) 78–86
12. Barnes, J.A., Characterization of Frequency Stability, IEEE IM 20 (1971) 105–120
13. Bender, P.L., B.N. Taylor, E.R. Cohen, J.S. Thomsen, P. Franken and C. Eisenhart, Should Least Squares Adjustment of the Fundamental Constants be Abolished? In: Precision Measurement and Calibration, NBS Special Publication 343, United States Department of Commerce, Washington, D.C., 1971
14. Cordes, H. and M. Grabe, Messung der Neigungsstreuung mikroskopischer Rauheiten in inkohärenten Lichtfeldern, Metrologia 9 (1973) 69–74
15. Quinn, T.J., The Kilogram: The Present State of our Knowledge, IEEE IM 40 (1991) 81–85
16. Birge, R.T., Probable Values of the General Physical Constants, Rev. Mod. Phys. 1 (1929) 1–73
17. Cohen, E.R. and B.N. Taylor, The 1986 Adjustment of the Fundamental Physical Constants, Codata Bulletin No. 63 (1986)
Works by the Author
18. Masseanschluß und Fehlerausbreitung, PTB-Mitt. 87 (1977) 223–227
19. Über die Fortpflanzung zufälliger und systematischer Fehler, Seminar über die Angabe der Meßunsicherheit, 20. und 21. Februar 1978, PTB Braunschweig (On the Propagation of Random and Systematic Errors, Seminar on the Statement of the Measurement Uncertainty)
20. Note on the Application of the Method of Least Squares, Metrologia 14 (1978) 143–146
21. On the Assignment of Uncertainties Within the Method of Least Squares, Poster Paper, Second International Conference on Precision Measurement and Fundamental Constants, Washington, D.C., 8–12 June 1981
22. Principles of "Metrological Statistics", Metrologia 23 (1986/87) 213–219
23. A New Formalism for the Combination and Propagation of Random and Systematic Errors, Measurement Science Conference, Los Angeles, 8–9 February 1990
24. Über die Interpretation empirischer Kovarianzen bei der Abschätzung von Meßunsicherheiten, PTB-Mitt. 100 (1990) 181–186
25. Uncertainties, Confidence Ellipsoids and Security Polytopes in LSA, Phys. Letters A165 (1992) 124–132, Erratum A205 (1995) 425
26. Justierung von Waagen und Gewichtsstücken auf den konventionellen Wägewert, Wäge-, Mess- und Abfülltechnik, Ausgabe 6, November 1977
27. On the Estimation of One- and Multidimensional Uncertainties, National Conference of Standards Laboratories, 25–29 July, Albuquerque, USA, Proceedings 569–576
28. An Alternative Algorithm for Adjusting the Fundamental Physical Constants, Phys. Letters A213 (1996) 125–137
29. see [14]
30. Gedanken zur Revision der Gauß'schen Fehlerrechnung, tm Technisches Messen 67 (2000) 283–288
31. Estimation of Measurement Uncertainties – an Alternative to the ISO Guide, Metrologia 38 (2001) 97–106
32. Neue Formalismen zum Schätzen von Meßunsicherheiten – Ein Beitrag zum Verknüpfen und Fortpflanzen von Meßfehlern, tm Technisches Messen 69 (2002) 142–150
33. On Measurement Uncertainties Derived from "Metrological Statistics", Proceedings of the Fourth International Symposium on Algorithms for Approximation, held at the University of Huddersfield, July 2001, edited by J. Levesley, I. Anderson and J.C. Mason, 154–161
National and International Recommendations
34. BIPM, Bureau International des Poids et Mesures, Draft Recommendation on the Statement of Uncertainties, Metrologia 17 (1981) 69–74
35. Deutsches Institut für Normung e.V., Grundbegriffe der Meßtechnik (Fundamental Concepts of Metrology), DIN 1319-1 (1995), DIN 1319-2 (1980), DIN 1319-3 (1996), DIN 1319-4 (1999), Beuth Verlag GmbH, Berlin 30
36. ISO, International Organization for Standardization, Guide to the Expression of Uncertainty in Measurement, 1 Rue de Varembé, Case Postale 56, CH-1211 Geneva 20, Switzerland
37. Deutsches Institut für Normung e.V., Zahlenangaben (Presentation of Numerical Data), DIN 1333 (1992), Beuth Verlag GmbH, Berlin 30
Monographs
38. Cramér, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, 1954
39. Kendall, M.G. and A. Stuart, The Advanced Theory of Statistics, Vols. I–III, Charles Griffin, London, 1960
40. Graybill, F.A., An Introduction to Linear Statistical Models, McGraw-Hill, New York, 1961
41. Weise, K. and W. Wöger, Meßunsicherheit und Meßdatenauswertung (Measurement Uncertainty and Measurement Data Evaluation), Wiley-VCH, Weinheim, 1999
42. Papoulis, A., Probability, Random Variables and Stochastic Processes, McGraw-Hill Kogakusha Ltd., Tokyo, 1965
43. Beckmann, P., Elements of Applied Probability Theory, Harcourt, Brace & World Inc., New York, 1968
44. Ku, H.H. (Ed.), Precision Measurement and Calibration, NBS Special Publication 300, Vol. 1, United States Department of Commerce, Washington, D.C., 1969
45. Eadie, W.T. et al., Statistical Methods in Experimental Physics, North Holland, Amsterdam, 1971
46. Clifford, A.A., Multivariate Error Analysis, Applied Science Publishers Ltd., London, 1973
47. Duda, R.O. and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973
48. Chao, L.L., Statistics: Methods and Analyses, McGraw-Hill Kogakusha Ltd., Tokyo, 1974
49. Seber, G.A.F., Linear Regression Analysis, John Wiley & Sons, New York, 1977
50. Fisz, M., Wahrscheinlichkeitsrechnung und mathematische Statistik (Probability Theory and Mathematical Statistics), VEB Deutscher Verlag der Wissenschaften, Berlin, 1978
51. Draper, N.R. and H. Smith, Applied Regression Analysis, John Wiley & Sons, New York, 1981
52. Strang, G., Linear Algebra and its Applications, Harcourt Brace Jovanovich College Publishers, New York, 1988
53. Ayres, F., Matrices, Schaum's Outline Series, McGraw-Hill, New York, 1974
54. Rao, C.R., Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, 1973
Index
accuracy 4
analysis of variance 16, 58
arithmetic mean
  biased 21
  uncertainty of 63, 77
arithmetic means
  compatibility of 81

bias
  and least squares 102
  and minimized sum of squared residuals 104
  introduction of 13
  of arithmetic mean 22
  of least squares solution vector 102

central limit theorem 13
circle
  adjustment of 195
  uncertainty region of 200
comparability 52
confidence ellipsoid 42, 46
  in least squares 132
confidence interval 27, 63, 75
  in least squares 117
confidence level 63, 75
constants
  as active parts of measuring processes 80
  as arguments of relationships 80
conversion factor 65
  uncertainty of 69, 83
correlation coefficient
  empirical 29
  theoretical 32
covariance
  empirical, neglect of 31
  empirical 29, 35, 38
  empirical and Fisher's density 43
  theoretical 30, 35

density of air 12
deviation, systematic 14
difference between two arithmetic means 39
digits, insignificant 8
distribution density
  Chi-square 33
  empirical 19
  F 35
  Fisher's 41
  Hotelling's 41, 44, 261
  linear function of normal variables 239
  multidimensional, normal 31
  normal 25
  postulated 16
  postulated and biases 16
  standardized normal 27
  Student's 36, 257
  theoretical 19
  two-dimensional, normal 30
drift 11

ensemble, statistical 52, 102
environmental conditions 16
EP boundary 141
EPC hull 141
error bars 11
error calculus
  Gaussian 3, 16
  revision of 17
error combination
  arithmetical 23
  geometrical 17
error equation
  basic 14, 71
error propagation 23, 71
  over several stages 87
estimator 4, 26
expansion of solution vector 249
  circles 254
  parabolas 252
  planes 249
  straight lines 165
expectation 26, 51
  of arithmetic mean 54, 74
  of empirical covariance 57
  of empirical variance 54
  of estimators 52
  of non-linear functions 51
  of the arithmetic mean 63

fairness, mutual 5
Fisher–Behrens 40
fundamental constants of physics
  adjustment of 215
  localization of true values 216
  uncertainties of 224

Gauss–Markoff theorem 106
  breakdown of 107
  non-exhaustive properties 107
Gordian knot 127

Hotelling's ellipse 44, 47
Hotelling's ellipsoid 46
hypothesis testing 27

influence quantities 11
input data, consistency of 109
ISO Guide
  consequences of 127

key comparison 52, 205

least squares
  abolition of 127
  method of 127
least squares adjustment 93, 245
  and weighting 108
  constrained 100, 246
  unconstrained 97, 245
localization of true values
  ISO Guide 84, 120
  alternative error model 84
  in least squares 108

mass decades 210
matrix, rank of 233
mean of means, weighted 123
measurement uncertainty 4
measuring conditions, well-defined 17, 75
measuring process 11, 14
  stationary 4

national metrology institutes 5
normal density
  convergence of 27
  tails of 27, 63

outlier 27

pairwise comparison 215
parabola
  adjustment of 181
  uncertainty band of 183, 186, 190
parent distribution 20
piecewise smooth approximation 192
plane
  adjustment of 175
  uncertainty bowl of 179
platinum-iridium 12
points of attack 108
potatoes, convex 141
  slices of 141
primary standards 203
projection
  orthogonal 96, 241
prototype of the kilogram 210

quantiles 35
  Hotelling's density 47, 261

random errors 12
  origin of 12
random process 18
random variables
  introduction of 18
  realization of 18
  sum of 28, 32, 34, 38
repeat measurements 3, 18
reproducibility 4, 52, 66
residuals
  vector of 94, 96
residuals, sum of squared
  degrees of freedom of 127
  minimized 17, 95, 127
resolving power 18
round robin 81, 205
rounding
  of estimators 9
  of uncertainties 9

security polygons 133
security polyhedra 136
security polytopes 141
series expansion 71
  truncation of 72
SI units 64
simulation 84
state of statistical control 11
straight line
  adjustment of 151, 165
  for linearized function 172
  for transformed function 173
  uncertainty band of 158, 162, 169
string of parabolas, uncertainty band of 194
Student's density
  convergence of 37

time series 11
traceability 17, 209, 210
trigonometric polynomial
  adjustment of 224
  uncertainty band of 229
true value 3

uncertainty 4
  function of least squares estimators 128
  function of measurands 77
  of a mean of means 127
  of least squares estimators 108, 119
  of repeated measurements 63
  ppm 7
  relative 7
uncertainty spaces 130
unit
  defined 3
  realized 3
unknown systematic errors 3, 11
  confining interval 13
  estimation of 13
  of measurands 70
  of measured values 14
  propagation of 88, 117
  randomization of 13
  sources of 13
  status of 13
  varying of 15

variance–covariance matrix 30, 31, 235
  in least squares 113, 115

weighing scheme 210
weightings, consequences of 119
weights
  arbitrary factor in 109
  varying by trial and error 109
working standard
  realization of 203
  uncertainty of 205
worst-case estimation 13