
A Monte Carlo Method for the Validation of Discrimination Ellipse Data

C. Alder
University of Bradford, Bradford, West Yorkshire BD7 1DP

The difficulties of fitting reliable chromaticity discrimination ellipses are discussed in relation to the distribution of data points. A simulation technique as an alternative to repeat experiments is described and suggestions are made about the interpretation of simulated results. Examples are given of the application of the technique to some well known data.

INTRODUCTION

Many large collections of colour discrimination data published in recent times have consisted of clouds of colour differences about several standards scattered in colour space [1-4]. Such data can be analysed with two aims in view: either to test the general relative merits of various colour-difference (CD) equations, or to derive new CD equations. The latter can be done by fitting ellipsoids to the data about each standard as an intermediate step. Some workers trying to derive new CD equations have started with the aim of producing data specifically to give ellipsoids [5]. For the purposes of testing equations, it is useful to get an idea of the completeness of sampling about each standard. One way of doing this is to see if the points about each standard adequately define an ellipsoid. Fitting ellipsoids can serve other purposes; it is a simple averaging procedure. Across the colour gamut, trends from several independent experiments can be compared and, by examining the fit of the data to the ellipsoids, inconsistencies can be detected. In an industrial setting, tolerance can be defined for a standard in terms of ellipsoid parameters.

ELLIPSE-FITTING PROBLEMS

Having shown that it is a "good thing" to fit ellipsoids to colour discrimination data, what are the problems encountered? The answer to that question is "many", but this paper will only be concerned with one aspect, that of distribution of data about the standard. For simplicity's sake, from this point in the discussion, the argument will be restricted to chromaticity ellipses.

Consider Figure 1a, depicting the hypothetical results obtained using four samples one visual unit away from the standard. It is obvious that both of the depicted ellipses fit the data perfectly. It is not possible to tell which ellipse, if any, really represents visual behaviour about that standard. Figure 1b depicts the opposite case where the ellipse is clearly defined and, if the experiment was carried out correctly, represents real colour discrimination. Between these extremes there is a continuous gradation from ellipses that are solely experimental artifacts to those that represent the truth. An example from published data that appears to fall somewhere between the two extremes is shown in Figure 2. This is set A from the well-known Davidson and Friede (DF) data [6]. The aim is to find the ellipse joining all points of 50%A. Clearly, as there are only two points in the top left and bottom right quadrants, any ellipse fitted to this data would not have well-defined dimensions in these quadrants. Several methods of psychophysical scaling can be applied to percentage acceptance data (%A), but these methods cannot cope when %A is 0 [7]. This would limit even further the points available for ellipse fitting.

Figure 1 - Hypothetical ellipses
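The ambiguity in Figure 1a can be checked numerically. The sketch below assumes, purely for illustration, that the four samples lie at (±1, 0) and (0, ±1) in arbitrary chromaticity units, and that an ellipse centred on the standard is written g11·Δx² + 2·g12·Δx·Δy + g22·Δy² = 1. Neither the coordinates nor this parametrisation is taken from the paper; the function names are hypothetical.

```python
# Hypothetical sample positions: two opposed pairs, one visual unit from the standard.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def visual_distance_sq(g11, g12, g22, dx, dy):
    """Squared visual distance for a centred ellipse (assumed quadratic form)."""
    return g11 * dx * dx + 2.0 * g12 * dx * dy + g22 * dy * dy


def fits_all(g11, g12, g22, pts, tol=1e-12):
    """True if every point lies on the unit contour of the ellipse."""
    return all(
        abs(visual_distance_sq(g11, g12, g22, dx, dy) - 1.0) < tol
        for dx, dy in pts
    )


# Any cross term with |g12| < 1 keeps g11*g22 - g12**2 > 0, i.e. a genuine ellipse.
candidates = [(1.0, g12, 1.0) for g12 in (-0.8, 0.0, 0.8)]
all_fit = [fits_all(g11, g12, g22, points) for g11, g12, g22 in candidates]
```

Under these assumptions every candidate passes through all four points, because the cross term vanishes wherever Δx or Δy is zero. The data therefore say nothing about the orientation or elongation of the ellipse.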

The distribution of points shown in Figure 2 is clearly not a good one for the purpose of fitting a discrimination ellipse. Furthermore, it will only effectively test a CD equation in one direction in colour space. The problem then is to determine, in the event of a distribution of points not being as shown in Figure 1b, what it can look like, yet still be a "good" distribution, and whether a numeric definition of "goodness" can be found. Neither of these questions has a firm answer, but working attempts at answers can be given. For a set number of repeat experiments, a "good" spread of sample values will give a small variation in fitted ellipse shape. This spread is not simply a product of the number of

514 JSDC Volume 97 December 1981

Figure 2 - DF Set A (axes: Δx, Δy)

directions sampled, but of their angular distribution in relation to the genuine ellipse orientation. However, when performing experiments designed to elicit ellipses, for whatever purpose, the best use of facilities must be made. Time and money are not normally available for many repeat experiments for each colour standard. When analysing published data the possibility of repeat experiments does not even arise. The Monte Carlo method described here was developed at Bradford University to overcome these difficulties.

SIMULATION TECHNIQUE

The first requirement is that an optimization technique exists for the fitting of ellipses to discrimination data. Several exist, but which is used, and how, is irrelevant in the present context. The assumption is then made that the experimental data, Δx, Δy, ΔY and ΔV, for each sample-standard pair represent the means that would be found from an infinite number of repeat experiments. Estimates are then made of the likely standard deviations of such measurements from what is known about the accuracy of instruments and the behaviour of human observers. Using these estimates of standard deviation, simulated data are generated randomly. Table 1 shows the experimentally obtained Δx values for DF set A and a simulated set of Δx values, assuming a standard deviation of 10% of the Δx values, as an example. The Δy, ΔY and ΔV values are treated similarly. An ellipse is first fitted to the experimental data and another is optimized to fit the simulated experimental data. The process of simulation and optimization is repeated several more times. All the resulting ellipses can then be compared for deviations

TABLE 1
Δx Comparison

Experimental    Simulated
  -0.0038        -0.0046
   0.0016         0.0016
   0.0001         0.0001
   0.0031         0.0033
  -0.0033        -0.0028
  -0.0013        -0.0011
  -0.0034        -0.0028
  -0.0023        -0.0024

Figure 3 - Ellipse comparison (e - experimental, s - simulated)

from the experimental ellipse. The thesis is that if the original distribution of sample values was sound then there should be little discrepancy between simulated and experimental ellipses.
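The simulate-and-refit loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bradford implementation: the ellipse is taken to be centred on the standard and fitted by linear least squares to g11·Δx² + 2·g12·Δx·Δy + g22·Δy² = ΔV² (the paper leaves the optimization technique open), the luminance term ΔY is omitted, the sample points are synthetic, and the 10% relative error and the ΔV error of 0.2 are illustrative defaults. All function names are hypothetical.

```python
import numpy as np


def fit_centred_ellipse(dx, dy, dv):
    """Least-squares (g11, g12, g22) with g11*dx^2 + 2*g12*dx*dy + g22*dy^2 ~ dv^2."""
    design = np.column_stack([dx**2, 2.0 * dx * dy, dy**2])
    coeffs, *_ = np.linalg.lstsq(design, dv**2, rcond=None)
    return coeffs


def simulate(dx, dy, dv, rng, rel_sd=0.10, dv_sd=0.2):
    """One simulated experiment: measured values are treated as means of normal errors."""
    sim_dx = rng.normal(dx, rel_sd * np.abs(dx))
    sim_dy = rng.normal(dy, rel_sd * np.abs(dy))
    sim_dv = rng.normal(dv, dv_sd)
    return sim_dx, sim_dy, sim_dv


# Synthetic "experimental" data: 12 samples lying exactly on a hypothetical ellipse.
rng = np.random.default_rng(0)
true_g = np.array([1.0e5, 2.0e4, 4.0e4])
theta = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
c, s = np.cos(theta), np.sin(theta)
radius = 1.0 / np.sqrt(true_g[0] * c**2 + 2.0 * true_g[1] * c * s + true_g[2] * s**2)
dx, dy, dv = radius * c, radius * s, np.ones_like(theta)

experimental = fit_centred_ellipse(dx, dy, dv)
simulated = [fit_centred_ellipse(*simulate(dx, dy, dv, rng)) for _ in range(10)]
```

Each entry of `simulated` is one refitted ellipse to compare against `experimental`. With a poorly distributed set of samples the simulated coefficients would be expected to scatter much more widely than with the evenly spaced directions used here.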

This raises the next problem requiring an answer: how much is a "little discrepancy"? While it is possible to superimpose drawings of the experimental and simulated ellipses and then to express a subjective opinion, this is not ideal. The ideal would be some single number expression of variability, interpretable objectively. A compromise might be to consider the variability of the ellipse equation parameters. However, it is difficult to conceptualize the meaning of such variability. It is easier to picture variability in ellipse orientation and size of semi-major and semi-minor axes. Unfortunately, even this does not get to the crux; the real point of concern is what variations in predicted colour-difference (ΔE) would result from applying the different ellipses?

Considering Figure 3, the discrepancy caused by applying either ellipse, in a given chromaticity direction, can be expressed as the ratio of the radii of the experimental to simulated ellipses in that direction. By varying the chromaticity direction, the maximum discrepancy for the pair of ellipses can be found. If this procedure is repeated for all the simulated ellipses then the largest value for the maximum

Figure 4 - DF Set A


discrepancy can be found. It is this latter statistic that has been tried as a measure of ellipse reliability at Bradford. The interpretation is that, for a chosen set of experimental errors and number of simulations, if this figure is above a certain level then the experimental ellipse is likely to be an artifact of the data. If it is below the level then it could well be that the experimental ellipse is a reliable representation of visual behaviour about the standard. Further, this means that the data will provide a good test of a CD equation or could be used to construct a new CD equation.
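The radius-ratio statistic can be sketched as below. The ellipse parametrisation (g11, g12, g22) for a centred ellipse, the sampling of 360 directions, and the convention of reporting the larger of the ratio and its reciprocal minus one as a fractional discrepancy are all assumptions made for illustration; the paper specifies only that the ratio of radii is examined over all directions. Function names are hypothetical.

```python
import numpy as np


def radius(g, theta):
    """Radius of the centred ellipse g = (g11, g12, g22) along direction theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return 1.0 / np.sqrt(g[0] * c**2 + 2.0 * g[1] * c * s + g[2] * s**2)


def max_discrepancy(g_exp, g_sim, n_directions=360):
    """Largest fractional disagreement between the two radii over sampled directions."""
    # A centred ellipse repeats after 180 degrees, so half a turn is enough.
    theta = np.linspace(0.0, np.pi, n_directions, endpoint=False)
    ratio = radius(g_exp, theta) / radius(g_sim, theta)
    return float(np.max(np.maximum(ratio, 1.0 / ratio)) - 1.0)


def reliability_statistic(g_exp, simulated):
    """Largest of the per-simulation maximum discrepancies."""
    return max(max_discrepancy(g_exp, g) for g in simulated)


# Hypothetical check: a unit circle against itself and against an ellipse
# whose semi-axis along x is twice as long.
g_exp = (1.0, 0.0, 1.0)
simulated = [(1.0, 0.0, 1.0), (0.25, 0.0, 1.0)]
stat = reliability_statistic(g_exp, simulated)
```

In this toy case the statistic is driven entirely by the second ellipse, whose radius along x differs from the circle's by a factor of two.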

The actual value of experimental error used at Bradford has been a standard deviation of 10% for Δx, Δy and ΔY. This probably underestimates the errors on small differences measured with the tristimulus colorimeters used in some of the published studies. A normal-probability model was used to scale %A results [8]. Within the range 30-70%A Coates et al. [9] have shown that for 40 independent assessments per colour-difference pair (as in the DF case), the error in the scaled ΔV is approximately ±0.2. Other visual scales have had errors applied as appropriate. The accuracy of visual results, as analysed by Provost [10], led to the decision that for 10 simulations, nine should have a maximum ΔE discrepancy with the experimental ellipse of not more than 40%. These values are merely quoted for illustrative purposes.
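The nine-out-of-ten criterion quoted above amounts to a simple count. The sketch below assumes the maximum discrepancies are already available as fractions (0.40 for 40%); the function name and the two lists of values are hypothetical and are not results from the paper.

```python
def ellipse_is_stable(max_discrepancies, limit=0.40, required=9):
    """True if at least `required` simulations have a maximum discrepancy within `limit`."""
    within_limit = sum(1 for d in max_discrepancies if d <= limit)
    return within_limit >= required


# Hypothetical maximum discrepancies from two runs of 10 simulations each.
run_a = [0.12, 0.18, 0.25, 0.09, 0.31, 0.22, 0.15, 0.38, 0.27, 0.55]
run_b = [0.12, 0.18, 0.45, 0.09, 0.31, 0.22, 0.15, 0.38, 0.27, 0.55]
stable_a = ellipse_is_stable(run_a)  # nine of ten within 40%
stable_b = ellipse_is_stable(run_b)  # only eight of ten within 40%
```

Allowing one simulation in ten to exceed the limit makes the verdict less sensitive to a single unlucky draw than the largest-value statistic alone.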

APPLICATION OF SIMULATION

To demonstrate the results of applying this technique, a few examples are given. Firstly some ellipses fitted to the data shown in Figure 2 are plotted in Figure 4. The expected uncertainty in ellipse dimensions is clearly evident, although it must be remembered that the scaling model which ignored 0%A was applied to the acceptance values. Figure 5 shows results obtained from Jaeckel data set 15. This exhibits the same problem as the DF set previously illustrated. Both of these sets contain only a few samples. To show that sheer weight of numbers is not all that is required, Figure 6 illustrates the ellipses fitted to MMB set FC (which has 30

Figure 5 - Hatra Set 15

samples) and five simulations. This set of data appears to have good coverage over 180° around the standard and theoretically this should be sufficient. However, it is in the direction that the discrimination changes most rapidly, i.e. the long axis of the ellipse, that the data points are sparsest. Some particular combination of errors has exposed this uncertainty attached to the size of the major axis. Lastly, an example is given in Figure 7 of a good distribution of sample points. With careful choice, only 12 points have been sufficient to give a stable ellipse, the maximum ΔE difference being 18%. The data set is part of a continuing series of experiments carried out at Bradford and is due to Chaing [11].

Figure 6 - MMB Set FC

Figure 7 - KPC Set 18

Application of the simulation technique described in this paper to many sets of data has shown that relying on a chance distribution of a few points, or even of large numbers of points, does not necessarily produce reliable ellipses. Stable ellipses were obtained for fewer than half the sets studied. Optimized ellipse size and shape are too often an artifact of the data and do not represent true visual behaviour at that point in colour space. The Bradford experience is that, with proper choice of the distribution of samples about a colour standard, approximately a dozen samples are needed to define a stable chromaticity ellipse. This requires short preliminary experiments, the necessary facilities, and patience to produce desired visual differences in the appropriate directions. A corollary is that experimental data not fulfilling the criteria of stability are unlikely to test the ability of the CD equation to predict the correct relative sizes of differences in all directions in colour space about a standard. Many of the published collections of data do not give very stable ellipses according to the criteria given here. Consequently, they are of limited use in deriving CD equations using ellipse prediction techniques, nor do they test CD


equations anything like as completely as it might at first appear. However, the application of the simulation technique in future work could lead to a greater understanding of the importance of data distribution and an improvement in experimental method which would make data more generally useful.

Finally, it must be realized that the problems referred to in this paper are not restricted to academic research. The same difficulties will occur for anyone in industry trying to fit tolerance ellipses. When consideration is extended to a third dimension, if a tolerance ellipsoid is required, the difficulties will increase considerably.

REFERENCES

1. Morley, Munn and Billmeyer, J.S.D.C., 91 (1975), 229.
2. Jaeckel, Applied Optics, 12 (1973), 1299.
3. AATCC Metropolitan Section, Text. Chem. Colorist, 3 (1971), 248.
4. McDonald, J.S.D.C., 96 (1980), 372.
5. Kuehni, J. Col. App., 1 (1972), 4.
6. Davidson and Friede, J. Opt. Soc. Amer., 43 (1953), 581.
7. Torgerson, "Theory and Methods of Scaling" (New York: Wiley, 1967).
8. Coates et al., J.S.D.C., 88 (1972), 186.
9. Coates, Provost and Rigg, ibid., 88 (1972), 363.
10. Provost, Ph.D. Thesis, University of Bradford, 1972.
11. Chaing, Ph.D. Thesis, University of Bradford, 1975.

DISCUSSION

Mr S M Jaeckel (Trent Polytechnic, Nottingham): A general comment and some specific ones: with too few points, e.g. four in a square in some chromaticity plane, making the square the pass/fail tolerance limit is too narrow and is wasteful, i.e. excluding potential visual passes, if one does not follow (as Hatra did) the important principle that the initial instrumental fails must be regarded as provisional: if they are consistently passed visually, the instrumental control limits must be expanded appropriately and re-defined, otherwise quality controllers and dyers will not accept the system.

Specifically, let me summarize the background to the selection of the Hatra dyeings to which "Hatra Set 15" belongs: dyeings and pattern selection for presentation to industrial judges to achieve an equable distribution of acceptabilities in quite specific directions was very time-consuming, and was not designed to fill ellipse contours. Set 15 was of one colour, varying in one dye concentration direction. Therefore, in xy or LAB space, the samples would and should approximate to a straight line and hence were not suitable for an ellipse, but perhaps for a limit regarding the one variation. I had suggested to Friele that ellipse-fitting to the Hatra data might not be successful, because deliberately specific dye concentration and proportion effects only had been explored. Set 15 has, of course, companion sets, with other dye-concentration variations. If ellipse-fitting is to be attempted, the sensible thing must be to consider these sets together, not separately, when "spokes" in different directions are available.

Dr Alder: In general I agree with Mr Jaeckel. I was not trying to denigrate the Hatra work in any way, but was merely using their Set 15 to illustrate a point. For my purposes, I did not consider Set 15 to be sufficiently near to its neighbours to group them together. Other Hatra sets were so treated and produced stable ellipses. However, I did not want to discard potentially useful data and attempted to fit an ellipse to the Set 15 data with the results elaborated on in the paper.

The Development of Pass/Fail Colour-Difference Formulae for Single Number Shade Passing

Roderick McDonald
J & P Coats Limited, Anchor Mills, Paisley

A series of modifications to the ANLAB colour-difference formula is described which has progressively improved the agreement between the formula and the pass/fail decisions of visual observers. The latest modification, the JPC 79 formula, has been suggested as a viable alternative to visual colour matching.

The search for a reliable instrumental pass/fail formula has been an ongoing problem since the development of industrial colour control in the 1960's. Many colour-difference formulae have been developed over the years since 1930, but, notably, the work of Jaeckel and McLaren during the late 60's enabled most of the less satisfactory formulae to be eliminated and standards of performance of available formulae to be evaluated against visual matching data. Based on that work, the Society's Colour Measurement Committee was able, in 1970, to recommend the ANLAB colour-difference formula for use in the textile industry as giving the best available correlation with visual pass/fail judgements over a wide variety of data [1].

It was fortunate for many companies in the UK that the ANLAB formula turned out on balance to be the best formula, since its use in conjunction with the early colorimeter, the Colormaster, had meant that by 1970 many companies had been using this formula for their colour-difference work for up to 10 years or more.

However, although the SDC investigations had shown that the ANLAB formula was one of the best available, it did not necessarily follow that it correlated well enough with the visual observer to allow straightforward application of the numerical colour differences to pass/fail decisions. While the formula was satisfactory for quantifying colour differences around a standard in a particular area of colour space, it was generally found that different numerical tolerance values were required for different parts of colour space - the colour space was not perfectly uniform.

At J & P Coats the non-uniformity of ANLAB colour space was vividly illustrated in 1963 when, using computer recipe prediction methods, a cubic lattice of thread samples was dyed to cover the whole volume of the ANLAB colour
