
Statistica Neerlandica (1993) Vol. 47, nr. 1, pp. 3-8

Aspects of statistical computing, past, present and future

J. A. Nelder
Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2BZ, United Kingdom

The various stages in the history of statistical computing are illustrated by personal experiences. The stages include batch processing, interactive working, and the consultative mode. Statistical aspects associated with one or more of these stages include the development of general algorithms, procedures for model checking, and data-driven non-parametric modelling.

Key Words & Phrases: algorithms, batch processing, conversational mode, data-driven methods, interactive working, WIMP technology.


1 Introduction

The editor has encouraged me to be reminiscent in this article, so I have chosen to write about some themes that have emerged during the 40 years or so that I have been involved with statistical computing. The historical part deals with machines that I did battle with; readers should substitute their own equivalents. Again, most of the discussion of packages relates to those in which I have been personally involved, especially Genstat (PAYNE et al., 1987) and GLIMPSE (WOLSTENHOLME et al., 1988). I began with hand-operated desk calculators like the Brunsviga (on which much early statistical table-making was done), went on to electric machines like the Monroe (which overlapped the first electronic computers), and then progressed through four generations of electronic computers.

The first-generation computer was the Elliott 401, installed at Rothamsted in the early fifties; it was one of the very first commercial machines. I have described its characteristics elsewhere (NELDER, 1984); suffice it to say here that it now seems unbelievably primitive. Floating-point arithmetic had hardly been invented, and even integer division was missing. There was, of course, no programming language except the binary instruction set of the machine. What is remarkable is the use that Frank Yates and his team were able to make of it. For example, they wrote programs to do multi-way table arithmetic, something still not available in many statistical packages. They also tackled the problem of optimizing programs (by hand, of course) with success. I admit that I was happy to let others do the coding for the 401!

Next came the Ferranti Orion; this was in many ways a remarkable machine for its time, and it had a comprehensive instruction set with three addresses. This meant that something like


A(I) = B(I) + C(I)

coded to just two instructions, one to set up A = B + C and one to offset the three addresses by I. The encode/decode instructions for I/O were built in, and useful instructions existed that returned things like the number of 1-bits in a word or the position of the left-most 1-bit. At this time the first high-level language, Mercury Autocode, appeared, and I began to program in earnest. By modern standards the language was primitive, but it still represented a big step forward. While it lacked structured-programming primitives it did have internal procedures, something I often want from modern languages, but which they don't provide. The input medium was 5-track paper tape, later augmented by a card reader of infuriating sensitivity; paper tape could be spliced and you could put in individual holes yourself. This kept programming close to the machine.

The third machine at Rothamsted was the System 4. It was a copy of an RCA machine, which was itself a copy of the IBM 360. It had all the 360's faults (like incorrect double-precision hardware) and disadvantages (no more sideways-add instructions, for example), plus others induced by having a different and untried operating system. As intensive users we, the statisticians, became one of the main operating-system debuggers, a situation I do not recommend to anyone who wants to get some work done. A particular source of frustration was the link editor, whose use of disc transfers was so intensive that if I was in the machine room I could tell when it was running by the noise that it produced. With the System 4 came Fortran, and with that the development of Genstat (PAYNE et al., 1987).

The fourth generation started with Vax, and with it came interactive computing. The building of large programs became easier. At the same time workstations appeared, and with them the spread of a common operating system, namely Unix, which made the transfer of programs between machines much simpler. I have lived to see the advent of supercomputers, networks of machines, and laptop computers powerful enough to run the latest version of Genstat with ease. I now turn to some aspects of statistical computing and how they have developed over this period, during which time the speed and size of computers have both increased by at least seven orders of magnitude.

2 The influence of batch processing

The most characteristic feature of the early years was batch processing. You handed your paper tape or batch of cards over the counter to the acolytes who tended the machine. Some time later, usually measured in hours rather than minutes, the output came back, often showing that you had made some error in the control cards or in the program, so that no useful work had been done. The process encouraged, or rather demanded, a method of working that I have described as the read-calculate-print-stop mode. In statistical terms this meant that one read in the data, usually in the form of a data matrix, then calculated, e.g., some derived variates and a regression line, printed out a summary of the fit and stopped. This reflected in large measure the way that examples were presented in textbooks; a model was selected a priori, it was fitted to the data, and a summary, e.g. parameter estimates and their standard errors, was given. What was missing was any notion of model checking, i.e. of seeing if the a-priori model gave an internally consistent fit to the data. Without such checks the summary could be highly misleading, depending as it did on the assumption that the model selected was at least approximately correct.

3 Interactive working and model checking

Though batch-processing programs at Rothamsted did include elementary kinds of model checking, such as a display of residuals, model checking in the modern sense had to await the coming of interactive working. For then the analyst could look at the output from a first attempt, apply checks and then, if necessary, repeat the analysis with a new model or a subset of the data, all without restarting the program. Model checking in the modern sense had begun. Here we see the importance of the development of interpreted languages for statistical packages such as Genstat. Because they do not require compilation, they fit naturally into the interactive mode of working. I believe that statisticians were among the first user groups to appreciate the advantages and power of interactive working. Because model checking introduces a loop into what had previously been a linear sequence of operations, it profoundly changes how we do statistical analysis. For an account of model checking in its modern form as applied to the class of generalized linear models (GLMs) see MCCULLAGH and NELDER (1989).
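To make the loop concrete, here is a minimal sketch, in Python rather than in Genstat, of the fit-check-refit cycle: a model is fitted, a crude residual check is applied, and the analysis is repeated under a revised model if the check fails. The data, the check and the threshold are illustrative choices of mine, not anything prescribed above.

```python
# A minimal sketch (not Genstat) of the loop that model checking introduces:
# fit, inspect residuals, and refit under a revised model if the check fails.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = np.exp(0.3 * x + rng.normal(scale=0.1, size=x.size))  # illustrative data

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns coefficients and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

coef, resid = fit_line(x, y)                      # first attempt: linear model on the raw scale
if np.corrcoef(x, np.abs(resid))[0, 1] > 0.3:     # crude check: residual spread grows with x
    coef, resid = fit_line(x, np.log(y))          # revised model: work on the log scale
    print("refitted on log scale, coefficients:", coef)
else:
    print("linear model accepted, coefficients:", coef)
```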

4 The influence of algorithms

The description of a mathematical procedure by an algorithm goes back at least to Euclid, but its full exploitation has required the computer. The construction of an algorithm is an excellent test of whether you really understand a piece of mathematics or statistics. If there are any gaps in your understanding they will be ruthlessly exposed. As a way of teaching yourself, explaining something to a computer is even better than explaining it to another person. Sometimes an algorithmic description is the best way of communicating an idea. I remember seeing a mathematical description of the proportional-scaling algorithm which involved six levels of superfixes and suffixes. I found it impossible to understand. By contrast the algorithmic version reveals immediately what is being done.
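For readers who have not met it, the following is a small sketch, in Python, of the proportional-scaling (iterative proportional fitting) algorithm; the two-way table, target margins and tolerance are my own illustrative choices. The algorithmic form shows at a glance what is being done: rescale the rows to their target totals, then the columns, and repeat.

```python
# Iterative proportional fitting for a two-way table: alternately rescale rows and
# columns until the fitted table reproduces the target margins.
import numpy as np

def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    t = table.astype(float).copy()
    for _ in range(max_iter):
        t *= (row_targets / t.sum(axis=1))[:, None]   # scale each row to its target total
        t *= (col_targets / t.sum(axis=0))[None, :]   # scale each column to its target total
        if np.allclose(t.sum(axis=1), row_targets, atol=tol):
            break                                     # row margins also satisfied: converged
    return t

start = np.array([[1.0, 2.0], [3.0, 4.0]])
fitted = ipf(start, row_targets=np.array([5.0, 5.0]), col_targets=np.array([4.0, 6.0]))
print(fitted, fitted.sum(axis=1), fitted.sum(axis=0))
```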

When I introduced structure formulae into the description of classical experimental designs in NELDER (1965a, 1965b) I was influenced by the possibility of finding symbolic algorithms that would allow these formulae to be used to drive the analysis process. The specification of the structure of the classical designs by two structure formulae, one for blocks and one for treatments, has stood the test of time, and its implementation in Genstat by PAYNE and WILKINSON (1977) has provided the user with an algorithm of great generality.


5 WIMP technology and the conversational mode

This technology, originating from Xerox and exploited by Apple and others, involves the use of menus, icons, windows, and the mouse for pointing. The effect is to introduce a conversational mode of working, whereby the user responds to a series of questions, as the result of which the appropriate actions are generated. It also allows the input and output to be separated, and information to be spread over the screen in convenient ways. The user is freed from the necessity of learning the control language, though the expert may still find it more convenient to say directly what (s)he wants. For the beginner, however, there is no doubt that the conversational mode represents a great advance on the older interactive mode. For the programmer the use of icons to define basic steps in an analysis makes for a good framework for large-scale programming, and the recently arrived object-oriented programming thrives in this environment. The QUESTION directive recently introduced into Genstat implements conversational working by hiding the Genstat syntax from the user and generating a Genstat program behind the scenes; it is interesting that it is itself written in the Genstat language.
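As a toy illustration only, and not the Genstat QUESTION directive itself, the following Python fragment shows the general idea of a conversational front end: the user answers questions, and a program in the underlying command language is generated behind the scenes. The command names emitted here are hypothetical stand-ins.

```python
# A toy conversational front end: ask questions, then generate command-language text
# that the user never has to write or see.
def conversational_fit():
    response = input("Which variable is the response? ")
    terms = input("Which explanatory terms (comma separated)? ")
    formula = " + ".join(t.strip() for t in terms.split(","))
    program = f"MODEL {response}\nFIT {formula}"   # hypothetical command text, not real syntax
    print("Generated program:\n" + program)
    return program

if __name__ == "__main__":
    conversational_fit()
```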

6 Letting the data speak

The huge increase in computer power has made it possible to compute procedures with many levels of iteration, and this has resulted in the appearance of methods that are heavily data-driven and depend on various kinds of non-parametric algorithms. An obvious example is the use of splines or other kinds of smoothing applied to (y, x) data, these replacing such parametric alternatives as polynomial regression. The important characteristic of these methods is that they use local information in deciding the shape of the response at a given point. In this sense they may be said to let the data speak for themselves, rather than presuppose that the relation is, say, a quadratic polynomial.
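A minimal sketch of the contrast, with data and bandwidth chosen purely for illustration: a kernel smoother forms a locally weighted average around each point, whereas a quadratic polynomial imposes one global parametric form.

```python
# Local smoothing versus a global quadratic fit on the same (y, x) data.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

def kernel_smooth(x, y, grid, bandwidth=0.5):
    """Nadaraya-Watson estimate: a weighted local average with a Gaussian kernel."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

grid = np.linspace(0, 10, 101)
local_fit = kernel_smooth(x, y, grid)                 # data-driven, uses local information
global_fit = np.polyval(np.polyfit(x, y, 2), grid)    # parametric, one global quadratic
print(local_fit[:3], global_fit[:3])
```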

This idea has been greatly extended by HASTIE and TIBSHIRANI (1991) to allow, first, the extension to several smoothers, assumed additive, and secondly the extension to GLMs rather than just regression. The assumption of additivity is a strong one, but can be partly relaxed by allowing functionally dependent explanatory variables, e.g. by having x1 * x2 as well as x1 and x2. The fitting of these models is straightforward, and combines a Gauss-Seidel type of iteration on each smooth, given the others, with the GLM iteration for each smooth.
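The fitting procedure can be sketched as follows for the simplest (Gaussian, identity-link) case; the smoother, bandwidth and data are my own illustrative choices. Each smooth is re-estimated from the partial residuals given the current estimates of the others, Gauss-Seidel fashion.

```python
# A minimal backfitting sketch for an additive model y = alpha + f1(x1) + f2(x2) + error.
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + x2 ** 2 + rng.normal(scale=0.2, size=n)

def smooth(x, r, bandwidth=0.3):
    """Simple Gaussian-kernel smoother of partial residuals r against x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

alpha = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):                         # backfitting (Gauss-Seidel) loop
    f1 = smooth(x1, y - alpha - f2)         # update f1 given the current f2
    f1 -= f1.mean()                         # centre to keep the intercept identifiable
    f2 = smooth(x2, y - alpha - f1)         # update f2 given the current f1
    f2 -= f2.mean()
print("residual sd:", np.std(y - alpha - f1 - f2))
```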

Another computer-intensive method uses ideas of resampling to obtain the sampling distribution of statistics which are either awkward to obtain in a parametric framework, or for which a non-parametric framework is sought, in which the sample c.d.f. replaces the theoretical one. Resampling may be with replacement (when it has become known as the bootstrap; EFRON and TIBSHIRANI, 1986), without replacement, or may use permutations.
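A minimal bootstrap sketch, with the data, the statistic and the number of resamples chosen only for illustration: resample with replacement, recompute the statistic, and read a confidence interval off the resulting empirical distribution.

```python
# Percentile bootstrap for the median: resample with replacement and use the empirical
# distribution of the recomputed statistic in place of a parametric sampling theory.
import numpy as np

rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=80)   # a statistic whose parametric theory is awkward

B = 2000
boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True)) for _ in range(B)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(data):.3f}, 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```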

Perhaps the most computer-intensive method of all is MARS, due to FRIEDMAN (1991). This extends the representation of generalized additive models to include terms which are smooth functions of any arbitrary set of the explanatory variables, with the algorithm itself choosing the sets to be used.


I am not sure that we yet have enough experience of these methods to see clearly their place in statistical inference. One obvious procedure is to use them to suggest a parametric alternative to the non-parametric original. By doing this we guard against imposing models which are themselves contradicted by the data. An interesting point is whether the conclusions reached, say, by the bootstrap applied to determining a confidence interval could be closely matched by the use of parametric models in which each model fitting is followed by model-checking procedures. In some cases, at least, I have found that the conventional cycle does as well as the non-parametric method, and it could be argued that the cycle of model fitting and model checking is in itself valuable in giving insight into the data and how closely they are matched by parametric models.

7 Coming shortly: the consultative mode

One way of giving help with syntax is to hide it under a conversational front end. Another is by giving explicit help with the construction of statements in the language. The latter has the advantage of teaching the user the language while (s)he is working. There is, however, more to statistics than giving advice on how to do things; much more important is advice on what to do, i.e. the giving of semantic rather than just syntactic help. This leads us into the field of expert systems, also known, though not very elegantly, as knowledge-based front ends.

Statistical expert systems are more difficult to construct than, say, diagnostic systems in medicine. One reason is that they deal with inference rather than diagnosis, and inference is inherently the more complex process. A second reason is that the wide range of application of statistical models means that the rules are much more abstract than those in a diagnostic system. At the same time it is necessary to find ways of putting hooks into the general rules that will facilitate the construction of systems oriented to specific applications. It is difficult to decide how general a system to aim for; the wider the scope attempted the more difficult it is to construct the rules.

During the construction of GLIMPSE (WOLSTENHOLME et al., 1988) we began to develop a hierarchy of semantic help. Primitive steps in an analysis were called tasks; a typical task might be the production of a residual plot after fitting a model. Level-0 help would be with the construction of the task, i.e. syntactic in form. Level-1 help would give assistance in interpreting a task, e.g. in assessing linearity in a plot and giving suggestions about possible reasons for the deviation if the trend was non-linear. At level 2 were the activities in an analysis: we defined these as data input, data definition, data validation, data exploration, model selection, model checking and model prediction. The names are self-explanatory, except perhaps for the last, which refers to the process of defining quantities of interest which summarize the experiment, together with measures of uncertainty. We wrote level-2 help for data exploration and model selection, but failed to get beyond level-1 help with model checking, the problem being the many explanations possible for discrepancies in fit. It is possible to see the form of further development in the hierarchy of help. At level 3 we would have help with an entire analysis, i.e. with moving between the activities, while at level 4 the help would be extended to the design phase of the experimental cycle as well as the analysis.

The construction of high-level strategies will, I predict, form a major challenge to statisticians. We need to look towards computing environments that will facilitate the construction, testing, assessment and distribution of strategies. The production of such an environment is one of the aims of the FOCUS project (HAGUE and REID, 1992), with which I have been involved. With such an environment in place statistics will move into a new and exciting phase. I hope I shall be here to see it happening.

References

EFRON, B. and R. TIBSHIRANI (1986), Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Statistical Science 1, 54-77.

FRIEDMAN, J. H. (1991), Multivariate adaptive regression splines, Annals of Statistics 19, 1-141.

HAGUE, S. and I. REID (1992), Third annual report of the FOCUS project, FOCUS Consortium, NAG Ltd., Oxford.

HASTIE, T. and R. TIBSHIRANI (1991), Generalized additive models, Chapman and Hall, London.

MCCULLAGH, P. and J. A. NELDER (1989), Generalized linear models (2nd ed.), Chapman and Hall, London.

NELDER, J. A. (1965a), The analysis of randomized experiments with orthogonal block structure. I. Block structure and the null analysis of variance, Proceedings of the Royal Society A 283, 147-162.

NELDER, J. A. (1965b), The analysis of randomized experiments with orthogonal block structure. II. Treatment structure and the general analysis of variance, Proceedings of the Royal Society A 283, 163-178.

NELDER, J. A. (1984), Present position and potential developments: some personal views. Statistical computing. 150th Anniversary issue, Journal of the Royal Statistical Society A 147, 151-160.

PAYNE, R. W. et al. (1987), Genstat 5 reference manual, Clarendon Press, Oxford.

PAYNE, R. W. and G. N. WILKINSON (1977), A general algorithm for analysis of variance, Applied Statistics 26, 251-260.

WOLSTENHOLME, D. E., C. M. O'BRIEN and J. A. NELDER (1988), GLIMPSE: a knowledge-based front end for statistical analysis, Knowledge-Based Systems 1, 173-178.