statistical science 13, no 1, 1 29 consulting: real ...kbroman/publications/statsci.pdfstatistical...

29
Statistical Science 1998, Vol. 13, No. 1, 1 ] 29 Consulting: Real Problems, Real Interactions, Real Outcomes ( ) Richard Tweedie Resources Appendix by Sue Taylor Abstract. The Pullman meeting of IMS] WNAR had, as one of its themes, ‘‘Statistical consulting.’’ In this overview of the case studies presented there, an attempt is made to draw together some of the lessons of these papers, showing the diverse role of the statistician in collecting, analyz- ing and presenting the information contained in the data. Key words and phrases: Statistical consulting, client interactions, con- sulting bibliography, consulting resources 1. THE PULLMAN SESSIONS The papers in the Pullman panel discussion were presented by invitation at the IMS] WNAR meeting held at Washington State University in June 1996. I organized this session. I have, over the years, organized and participated in a number of such sessions at conferences both in the United States and in Australia, and it is clearly a perennial topic for statistical meetings, only perhaps rivalled by sessions on the gap between ‘‘academic’’ and ‘‘prac- tical’’ statisticians, or sessions on how to teach undergraduate service courses in our notoriously uncharismatic subject. This time I was asked to organize the session in order to atone for, or perhaps amplify, some com- ments I had made on the role of consultants some Ž . years previously. In Tweedie 1992 I wrote that the comments on consulting of the Committee on New Ž Researchers CNR; New Researchers Committee of . the IMS, 1991 were ‘‘glib’’: in particular, I dis- agreed rather strongly with the Committee state- ment that ‘‘ ... unless you need the data analysis w x experience your role as a consultant is to dispense Richard Tweedie is Chair, Department of Statistics, Colorado State University, Fort Collins, Colorado Ž . 80523-1877 e-mail: tweedie@stat.colostate.edu , and Director of the Center for Applied Statistical Expertise there; he was a consultant in CSIRO and Ž . the private sector in Australia 1974] 1987 . Sue Taylor was Consultant, Flinders University of South Ž . Australia 1990] 1995 ; she is now a Ph.D. student, University of Colorado Health Sciences Center, Den- ver, Colorado, and an active consultant in epidemi- ology and medical statistics. advice. The client should be responsible for the actual analysis.’’ This seemed to me to be very optimistic, or perhaps very pessimistic depending on one’s viewpoint. In almost any interaction I have ever been involved with, a statistician, especially a new researcher, is not usually seen by clients as a guru on a mountaintop. Statisticians will not go far if they adopt that role, at least not in a real consult- ing situation where it is critical to understand a variety of client ] consultant interactions that will influence the requirements for effective consulta- tion: see the Appendix for numerous views on the real complexity of the consulting role. Moreover, the CNR article advises that ‘‘ ... if you have put in substantial effort, in terms of time or developing new techniques, you should ask to be a co-author’’ on the paper that is assumed to be the end-product of consulting. I felt that this also indi- cated a serious misapprehension about what most consulting was about, even for those whose role was to be ‘‘new researchers’’ rather than full-time con- Ž sulting statisticians. Of course, one might be justi- fiably cynical about the reasons for these state- ments: given the prevailing criteria for promotion in academia in particular, the CNR may have been . more realistic than one might wish in their advice. Even so, in the belief that consulting is valuable to academic and perhaps more particularly nonaca- demic statisticians, with Deb Nolan and LuAnn Ž . Johnson the conference organizers at Pullman a double-barrelled session was organized: the first half would describe some real consulting problems that might illustrate the range of activities that consultants face, of interest in their own right as well as illustrative of the many facets of our profes- sion beyond mere advising and coauthorship; and the second part would be a panel discussion, giving 1

Upload: others

Post on 22-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

Statistical Science1998, Vol. 13, No. 1, 1]29

Consulting: Real Problems, Real Interactions,Real Outcomes

( )Richard Tweedie Resources Appendix by Sue Taylor

Abstract. The Pullman meeting of IMS]WNAR had, as one of its themes,‘‘Statistical consulting.’’ In this overview of the case studies presentedthere, an attempt is made to draw together some of the lessons of thesepapers, showing the diverse role of the statistician in collecting, analyz-ing and presenting the information contained in the data.

Key words and phrases: Statistical consulting, client interactions, con-sulting bibliography, consulting resources

1. THE PULLMAN SESSIONS

The papers in the Pullman panel discussion werepresented by invitation at the IMS]WNAR meetingheld at Washington State University in June 1996.

I organized this session. I have, over the years,organized and participated in a number of suchsessions at conferences both in the United Statesand in Australia, and it is clearly a perennial topicfor statistical meetings, only perhaps rivalled bysessions on the gap between ‘‘academic’’ and ‘‘prac-tical’’ statisticians, or sessions on how to teachundergraduate service courses in our notoriouslyuncharismatic subject.

This time I was asked to organize the session inorder to atone for, or perhaps amplify, some com-ments I had made on the role of consultants some

Ž .years previously. In Tweedie 1992 I wrote that thecomments on consulting of the Committee on New

ŽResearchers CNR; New Researchers Committee of.the IMS, 1991 were ‘‘glib’’: in particular, I dis-

agreed rather strongly with the Committee state-ment that ‘‘ . . . unless you need the data analysis

w xexperience your role as a consultant is to dispense

Richard Tweedie is Chair, Department of Statistics,Colorado State University, Fort Collins, Colorado

Ž .80523-1877 e-mail: [email protected] ,and Director of the Center for Applied StatisticalExpertise there; he was a consultant in CSIRO and

Ž .the private sector in Australia 1974]1987 . SueTaylor was Consultant, Flinders University of South

Ž .Australia 1990]1995 ; she is now a Ph.D. student,University of Colorado Health Sciences Center, Den-ver, Colorado, and an active consultant in epidemi-ology and medical statistics.

advice. The client should be responsible for theactual analysis.’’ This seemed to me to be veryoptimistic, or perhaps very pessimistic dependingon one’s viewpoint. In almost any interaction I haveever been involved with, a statistician, especially anew researcher, is not usually seen by clients as aguru on a mountaintop. Statisticians will not go farif they adopt that role, at least not in a real consult-ing situation where it is critical to understand avariety of client]consultant interactions that willinfluence the requirements for effective consulta-tion: see the Appendix for numerous views on thereal complexity of the consulting role.

Moreover, the CNR article advises that ‘‘ . . . ifyou have put in substantial effort, in terms of timeor developing new techniques, you should ask to bea co-author’’ on the paper that is assumed to be theend-product of consulting. I felt that this also indi-cated a serious misapprehension about what mostconsulting was about, even for those whose role wasto be ‘‘new researchers’’ rather than full-time con-

Žsulting statisticians. Of course, one might be justi-fiably cynical about the reasons for these state-ments: given the prevailing criteria for promotionin academia in particular, the CNR may have been

.more realistic than one might wish in their advice.Even so, in the belief that consulting is valuable

to academic and perhaps more particularly nonaca-demic statisticians, with Deb Nolan and LuAnn

Ž .Johnson the conference organizers at Pullman adouble-barrelled session was organized: the firsthalf would describe some real consulting problemsthat might illustrate the range of activities thatconsultants face, of interest in their own right aswell as illustrative of the many facets of our profes-sion beyond mere advising and coauthorship; andthe second part would be a panel discussion, giving

1

Page 2: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.2

Ž .anecdotes and advice and insights, with we hopedstrong audience participation.

This worked out surprisingly well, and the pa-pers that follow are from the first part of the ses-sion: regrettably, the insight, experience and wit ofthe second part are lost to all but those who werethere, although the bibliography and WWW re-sources appended will enable interested readers todelve further into the available literature on thistopic and share at least one of the more tangiblebits of advice from the panel.

2. THE PARTICIPANTS

In what follows you will see four articles fromfour rather different perspectives. When invitingspeakers, we looked for a spectrum of statisticalbackgrounds: the participants represent the experi-ences of graduate students, of university facultyand of statisticians working full time as consultantsin academia, in-house with a government instru-mentality and in a private sector capacity.

Karl Broman, a Ph.D. student from U.C. Berke-ley, illustrates the best of the classic ‘‘on-campus’’

Žinteractions: a good scientific collaboration even a.potential co-authorship! ; some new techniques

needed and developed; and an outcome of academicvalue to both parties. This is very much in the moldof the projects envisioned in the CNR article, buteven here the consultant is doing the analysis, notjust assuming the client will carry it out.

Jennifer Hoeting, a new researcher now at Col-orado State, describes the role of the academicstatistician as funded advisor]consultant on a proj-ect. She illustrates the way in which careful inves-tigation of the data sources is vital for any analysis

Žto be meaningful and lack of knowledge of the datacollection, or a limited client understanding of thedata under study, can cause a well-planned analy-

.sis to be inappropriate .Sue Taylor describes the sort of rare project where

the statistician is actually involved early enough tobe able to influence data collection and enhance theability to analyze. As a consultant to the AustralianLongitudinal Study of Ageing, she was able to en-

Ž .sure with much work that analysis would actuallybe carried out on reasonably clean and reliabledata, thus saving a large amount of later work inanalysis.

Missing is the paper from Bob O’Brien from Bat-telle. From the perspective of an in-house consul-tant, he described a major environmental problem,and one where much of the consultant’s role lay intrying to determine what the real goals and con-straints were, since there were very many stake-

holders in the problem: statistical analysis wouldprovide the answers if only the questions wereclear. This omission reveals again the priorities ofmany consultants: his ongoing duties preclude eventhe writing of a sole-authored paper.

Finally, wearing an oldish hat as a private sectorconsultant, I describe a situation where modellingdoes work, and an appropriate analysis yields acounterintuitive and effective solution}but yetagain, only after the data are revisited and thewhole context of the problem is understood.

3. CONCLUSIONS

Most statisticians will find something familiar inthese case studies.

We know that understanding our data and thequestions of the client are of paramount impor-tance: it is reassuring to see an example where the

Ž .statistician can control that process Taylor . Weknow that close examination can reveal far more

Ž .than the client originally told us Hoeting . Weknow that problems are rarely standard, and thatat its best statistical thinking can come up with

Ž .new ways to cope with new problems Broman , orperhaps more typically we can see the role of ourassumptions and decide how well the old ways fitŽ .Tweedie .

However, in the end, these case studies, andmany other war stories that many of us tell or haveheard, all illustrate two things overwhelmingly.

First, no matter what subject areas we enter,statistics can contribute something that was notthere previously, and we have much to offer toalmost everyone. These examples cover bench re-search problems, social surveys, management prac-tices and environmental problems on a local and aglobal scale. Without statistical thinking, none ofthem would be solvable. Moreover, they show thatit is often the mere fact of such thinking, ratherthan the specific technical input, that proves in-valuable. It is hard to overestimate how powerfullyour discipline trains us to think about complicatedissues in ways that allow us to quickly diagnosedifficulties in esoteric disciplines to which we havehad only several minutes of introduction}a factreinforced by these examples and even further bythe referees of this collection.

But second, for that contribution to be truly at itsbest, the statistician must enter into the context ofthe problem, not just as an ‘‘advisor,’’ but as some-one prepared to understand the data, analyze thedata, interact with those who really own the ques-tions being asked and consider the impact of statis-tics within the real context of the problem. The

Page 3: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 3

Pullman case studies show many of these at-tributes, but also illustrate vividly the problems wehave in achieving such an idealistic state.

RESOURCES APPENDIX

Bibliography for Consultants

One of the most useful tools for consultants is abibliography of information put together by otherconsultants. The papers below do not pretend to beexhaustive: they represent those that I have foundof most use in my professional practice.

They also contain pointers to other material cov-ering a wider range if needed. Those referencesmarked with an asterisk contain an extensive listof statistical consulting references which are notduplicated here.

Ž .ASA AD HOC COMMITTEE ON PROFESSIONAL ETHICS 1983 . Ethi-cal guidelines for statistical practice: report of the Ad HocCommittee on Professional Ethics. Amer. Statist. 37 5]20.

Ž .*BASKERVILLE, J. C. 1981 . A systematic study of the consultingliterature as an integral part of applied training in statis-tics. Amer. Statist. 35 121]123.

Ž .BOEN, J. R. 1972 . The teaching of interpersonal relationshipsin statistical consulting. Amer. Statist. 26 30]31.

Ž .BOEN, J. R. and FRYD, D. 1978 . Six-state transactional analysisin statistical consulting. Amer. Statist. 32 58]60.

Ž .BOEN, J. R. and ZAHN, D. A. 1982 . The Human Side of Statisti-cal Consulting. Lifetime Learning Publications, CA.

Ž .CHATFIELD, C. 1988 . Problem Solving: A Statistician’s Guide.Chapman and Hall, London.

Ž .CHATFIELD, C. 1991 . Avoiding statistical pitfalls. Statist. Sci. 6240]268.

Ž .ELLENBERG, J. H. 1983 . Ethical guidelines for statistical prac-tice: a historical prospective. Amer. Statist. 37 1]4.

Ž .HAND, D. J. and EVERITT, B. S. 1987 . The Statistical Consultantin Action. Cambridge Univ. Press.

Ž .HUNTER, W. G. 1981 . The practice of statistics: the real world isan idea whose time has come. Amer. Statist. 35 72]76.

Ž .HYAMS, L. 1971 . The practical psychology of biostatistical con-sultation. Biometrics 27 201]211.

Ž .*JOINER, B. L. 1961 . Consulting, statistical. In Encyclopaediaof Statistical Sciences 147]155. Wiley, New York.

Ž . Ž .JOWELL, R. Chairman 1986 . International Statistical Institutedeclaration on professional ethics. International StatisticalReview 54 227]242.

Ž .KIRK, R. E. 1991 . Statistical consulting in a university: dealingwith people and other challenges. Amer. Statist. 45 28]34.

Ž .RUSTAGI, J. S. and WOLFE, D. A. 1982 . Teaching of Statisticsand Statistical Consulting. Academic Press, New York.

Ž .SLOAN, J. A. 1992 . How to consult with a statistician. TheStatistical Consultant 9 3]4.

Ž .*WOODWARD, W. A. and SCHUCANY, W. R. 1977 . Bibliography forstatistical consulting. Biometrics 33 564]565.

Ž .ZAHN, D. A. and ISENBERG, D. J. 1980 . Non-statistical aspects ofstatistical consulting. In Proceedings of the Statistical Edu-cation Section 67]72. Amer. Statist. Assoc., Washington,DC.

Electronic Resources

A search of the World Wide Web as of October1996 revealed a large number of sites relevant toconsultants. The following are just a few of these;we have found them to be useful entry points,although they are by no means intended to coverthe growing information resource available on theWorld Wide Web:

v The World-Wide Web Virtual Library: Statistics,

http:rrwww.stat.ufl.edurvlibrstatistics.html

v Statistics on the Web,

http:rrwww.execpc.comr; helbergrstatistics.html

v Statistics Resources on the Web,

http:rrwww.stats.gla.ac.ukrctirlinks stats.html

v A Guide to Statistical Computing Resources onthe Internet,

http:rrasa.ugl.lib.umich.edurchdocsrstatisticsrstat guide home.html

v An ‘‘Essential Book List,’’ and useful if a year ortwo older than desirable,

http:rrwww.stat.wisc.edurstatisticsrconsultrstatbook.html

A very useful and up-to-date document, the ‘‘Listof Statistics Lists’’ is also available by sending theone-line message

send minitab list-of-liststo

[email protected]

or by pointing your Web browser at

http:rrwww.mailbase.ac.ukrlists-k-orminitabrfilesrlist-of-lists

This document contains details of all current statis-tics-related e-mail lists, including subscription in-formation. These enable consultants to share infor-mation or conduct discussions on a timely basis.

Page 4: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.4

Estimation of Antigen-Responsive T CellFrequencies in PBMC from Human SubjectsKarl Broman, Terry Speed and Michael Tigges

Abstract. We describe a consulting project in which the statisticiansfound that they needed to develop a new method of analysis in order toreach valid conclusions about the effect of a herpes vaccine on theimmune systems of human subjects. This method estimates the fre-quency of blood cells responding to a viral antigen at a single, carefullychosen dilution. The traditional analysis of such data uses a cutoff to

Ž .separate experimental sites ‘‘wells’’ which contain no responding cellsfrom wells which contain at least one responding cell, whereas ourmethod uses the scintillation count to estimate the actual number ofresponding cells for each well. We describe the experiment in somedetail, as we found that one of the major challenges facing the statisti-cians was the need to understand the biology in enough detail to providea relevant model.

Key words and phrases: Antigen-responsive T cell frequency, limitingdilution assay, EM algorithm.

1. INTRODUCTION

A vaccine to protect against the herpes simplexvirus type 2 has been developed at Chiron Vaccinesin Emeryville, California. In order to determine

Žimmunogenicity of this vaccine its effect on the.immune system in human subjects, an assay which

measures a subject’s cellular immune response toviral antigens was developed, and a statistical anal-ysis was required to estimate the magnitude of theimmune response.

The process leading to the method describedherein was long, arduous and very stimulating. Thestatisticians were asked to become involved in thisproject because, though an ad hoc procedure which

Ž .more or less worked in about 80% of cases wasavailable, a scientifically acceptable method ofanalysis was required.

Initially, our task was to make sense of the bio-logical context of the problem and to understandthe existing method of analysis, and we describe

Karl Broman was a graduate student and TerrySpeed is Professor, Department of Statistics, Uni-versity of California, Berkeley, California 94720.Dr. Broman’s current address is Center for MedicalGenetics, Marshfield Medical Research Foundation,Marshfield, Wisconsin 54449. Michael Tigges is Se-nior Scientist, Chiron Corporation, Emeryville, Cal-ifornia 94608.

these in the next two sections. The existing method,developed by an intelligent nonstatistician and em-bodied in a nice computer program that accepted anumber of different data types and gave results ina form desired by the user, consisted of procedureswhich were not analytically supportable, and ulti-mately we developed a rather different method thatwe describe in Section 4.

2. THE ASSAY

Frequent discussions between the statisticiansand the scientist and visits to the laboratory atChiron, where we were able to observe the entireassay procedure and inspect the various instru-ments that are used, were crucial, as we began togain an understanding of the science of the assay.The lab visits also helped us to judge the sources oferror in the assay procedure.

Essentially, the assay procedure uses the factthat the human immune response involves therecognition of an antigen by T cells whose surfacescontain receptors which are complementary to partof that specific antigen. The T cells respond by

Ž .emitting chemical signals cytokines , stimulatingother cells to replicate, or replicating themselves.

The assay under study seeks to estimate thefrequency of T cells in a blood sample which re-spond to each of two test antigens, called gD2 andgB2. If inoculating a subject with the vaccine re-sults in an increase in the frequency of respond-

Page 5: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 5

ing T cells, this may reflect on the efficacy of thevaccine.

The assay is performed as follows: in each well ofŽ .a 96-well 8 = 12 microtiter plate, diluted periph-

Ž .eral blood mononuclear cells PBMC are combinedw3 xwith growth medium, antigen and H thymidine

Žthymidine which has been radioactively labelled.with tritium . Thymidine is required to form DNA.

Replicating cells must first duplicate their DNA,and so they must take up thymidine. Providingtritiated thymidine allows us to measure the extentof cell replication: we extract the DNA from thecells and measure the amount of incorporatedw3 xH thymidine using a scintillation counter. If thecells respond to the antigen, they will take upw3 xH thymidine, and a higher scintillation count willbe obtained. For a complete description of the assay

Ž .methods, see Broman, Speed and Tigges 1996a .This type of assay is generally performed in one

Žof two ways. A standard proliferation assay e.g.,.James, 1991 compares the response from three

wells containing antigen to three wells in which noantigen was added. The ratio of the average scintil-lation count for the antigen-containing wells to theaverage count for the antigen-free wells, calledthe stimulation index, gives a rough indication ofthe immune response to the antigen. For a moreprecise estimate of immune response, a full limiting

Ž .dilution assay LDA may be performed, in which anumber of wells are used at each of several dilu-tions of PBMC.

For our purposes, the stimulation index is tooimprecise, and a full LDA requires far too many

PBMC. Thus, we study a pair of microtiter plates ata single, well-chosen dilution.

In this design, in each of two 96-well plates, wehave 24 wells which contain cells alone, 24 wellswhich contain the test antigen gD2, 24 wells whichcontain the test antigen gB2, 22 wells which con-

Žtain tetanus toxoid which serves as a positivecontrol; nearly all subjects should show a strong

.response to tetanus , plus two wells containing phy-Žtohemagglutinin PHA; a chemical which stimu-

lates all T cells to replicate, and so serves as a.second positive control . When analyzing these data,

we seek estimates of the frequencies of respondingcells in the four groups of wells: cells alone, gD2,gB2 and tetanus toxoid. The PHA wells help toselect useable data.

Data for a pair of plates are displayed in Table 1.These data are for a single dilution of PBMC from afull LDA consisting of six dilutions of cells from onesubject, collected in order to develop the assay.

3. THE TRADITIONAL METHOD OF ANALYSIS

The traditional method for analyzing such data isŽto classify each well as either positive contains at

. Žleast one responding cell or negative contains no. Žresponding cells . This is usually done see, e.g.,

.Langhorne and Fischer-Lindahl, 1981 by selectinga threshold, often the mean plus two or three stan-dard deviations of the counts for the ‘‘cells alone’’wells, and considering a well positive if its countexceeds this threshold. One then uses a statisticalmodel, typically a Poisson model, relating the fre-

TABLE 1Scintillation counts for a pair of plates from subject a713 at density 11,400 cells per well

Cells alone gD2 gB2 Tetox PHA

A 179 249 460 2133 2528 2700 2171 1663 6200 761 9864 12842B 346 1540 306 8299 1886 3245 1699 2042 3374 183 7748 10331

C 117 249 1568 1174 4293 979 1222 1536 2406 6497 2492 6188D 184 414 308 2801 2438 1776 2193 3211 1936 2492 5134 927E 797 233 461 1076 1527 2866 2205 2278 2215 3725 3706 4050F 305 348 480 3475 902 3654 2046 1285 1187 9899 5891 3646G 1090 159 89 1472 90 3639 657 2393 1814 3330 4174 2389H 280 571 329 4448 3643 881 3462 2118 1013 8793 4313 672

1 2 3 4 5 6 7 8 9 10 11 12

A 178 111 630 4699 5546 5182 3982 3104 2496 4275 2831 9727B 244 593 259 5622 560 1073 1479 2978 4362 5017 5074 10706

C 261 964 167 2991 3390 3986 2321 2157 3278 8216 3579 3538D 221 544 299 1838 4368 322 1022 1554 2980 2732 6177 5212E 533 228 615 1938 4046 333 3253 5091 2843 200 1110 5063F 818 98 160 1032 3269 4918 1778 3810 2372 6355 1869 2695G 234 472 243 4143 3351 1118 530 1174 1881 3447 4491 2945H 169 481 478 3237 1565 2211 2460 2715 4793 3029 6225 4679

1 2 3 4 5 6 7 8 9 10 11 12

Page 6: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.6

quency of positive wells to the frequency of re-sponding cells in the wells. The frequency of re-sponding cells per well is then estimated from thedata on wells using maximum likelihood.

For the data shown in Table 1, the mean q 3 SDŽ .of the 48 cells alone wells 24 on each plate is

1,401. In the cells alone group, 46 wells out of 48are below this threshold value. In the gD2, gB2 andtetanus groups, 12r48, 8r48 and 6r44 wells, re-spectively, are below this value. The maximum like-lihood estimate of the frequency of responding cellsper well for a group of wells, assuming that thenumber of responding cells in a well follows a Pois-son distribution, is obtained by taking the negativeof the natural log of the proportion of wells belowthe cutoff. Thus, the MLE’s of the frequencies ofresponding cells per well for the cells alone, gD2,gB2 and tetanus groups are 0.04, 1.39, 1.79 and1.99, respectively.

The traditional approach clearly works well muchof the time. However, for our application, determi-nation of the threshold separating positive and neg-ative wells was not at all straightforward: a com-parison of the counts corresponding to wells forwhich no responding cells were expected with thosefor wells to which antigen was added revealed noclear cutoff in many cases.

Moreover, in many cases the entire set of wellsfor a given antigen would be positive. This arosewhenever the density of cells chosen for the assaywas a poor guess, something that could not alwaysbe avoided. In such cases the standard analysis ofthe data does not yield a point estimate of thefrequency of responding cells, but only a lower con-fidence limit. This causes difficulties later whensuch results are to be compared or combined withother results.

Thus we were faced with a method that involvedseveral rather mysterious cutoff procedures and anad hoc treatment of cases in which no wells or all

Žwells were above the cutoff in which case theupper or lower binomial-based confidence limit was

.used as the ‘‘mean’’ estimate . In addition, the choiceof cutoff was not accounted for in the estimatedSE’s, though it was clearly an important source ofvariation.

Recognizing these problems, most notably thedifficulty in choosing a cutoff and the sensitivity ofthe results to that choice, the scientists sought theadvice of outside statisticians.

4. A NEW METHOD

In our first approach to the problem, we at-tempted to optimize the choice of cutoff by minimiz-ing the residual sum of squares. This required a

model using the quantitative response in a well andeventually led to the model we now describe.

Figure 1 displays a plot of the average scintilla-tion count against the cell density for each antigen

Žgroup in a six-point LDA of which the data in.Table 1 is one dilution . It can be seen that the

average scintillation count scales approximatelylinearly with the density of cells. This indicatesthat the scintillation count may contain informa-tion about the actual number of responding cells ina well, and not just about whether the well containsany responding cells.

This figure led us to make direct use of thescintillation counts in our analysis. As with thetraditional approach, we assume that the numberof responding cells in a well follows a Poisson distri-bution, but we add an extra relationship connectingthe number of responding cells in a well to thescintillation count. Specifically, we suppose thatthere are plate-specific parameters, a, b and s ,and a widely applicable power parameter p suchthat the pth power of the scintillation count in awell containing k responding cells is approximatelynormally distributed with mean a q bk and stan-dard deviation s . Since there are four groups of

Žwells on each plate cells alone, gD2, gB2 and.tetanus , there are 10 parameters in all: the four

frequencies and three additional parameters foreach plate.

Algebraically, our assumptions are as follows. Lety denote the transformed scintillation count fori jswell j of class i on plate s, and let k denote thei jscorresponding number of responding cells. Here i sc, d, b, t corresponds to the cells alone, gD2, gB2and tetanus toxoid classes, respectively. We assume

Ž .that the y , k are mutually independent, thati js i jsk follows a Poisson distribution with mean li js iand that, given k , y follows a normal distribu-i js i jstion with mean a q b k and standard deviations s i jss . The aim of our analysis is to estimate the pa-srameters l .i

The power parameter was selected by maximumŽ .likelihood Box and Cox, 1964 from the values 1,

Ž .1r2, 1r4 and 0 corresponding to log . The modelitself was fitted by the method of maximum likeli-hood, specifically, using a form of the EM algorithmŽ .Dempster, Laird and Rubin, 1977 , although wealso carried out a number of confirmatory analysesusing the fully calculated likelihood. Standard er-rors for the parameter estimates were computed

Ž .using the SEM algorithm Meng and Rubin, 1991 .A more detailed description of the statistical meth-ods can be found in Broman, Speed and TiggesŽ .1996b .

Table 2 displays the results of our analysis of thedata in Table 1. We used the square root of the

Page 7: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 7

FIG. 1. Mean scintillation counts in relation to cell density for the six-point LDA from subject a713.

scintillation counts, as indicated by a Box]Coxanalysis. Estimates were obtained by treating eachplate separately, and also for the joint analysis ofthe pair of plates, where the frequencies of respond-ing cells per well were constrained to be equal onthe two plates.

Figure 2 displays the estimated frequencies ofresponding cells per well against cell density for thesix-point LDA, of which the data in Table 1 is asingle dilution.

5. CONCLUSIONS

The aim of the single dilution assay under studywas to be a more sensitive version of the standard

proliferation assay, which obtains a stimulation in-dex as described above, and not a replacement forthe standard dilution assay.

Our method for estimating the frequency of re-sponding cells in a sample avoids many of theproblems associated with the traditional analysis,which reduces the well counts to quantal responsesusing a cutoff. Assuming that the model we use isappropriate, our analysis will also be more efficient.

By considering Table 2, and in particular thevalues of l , l , l and l , we see that the newc d b t

method does give results that confirm the originalresults from the old method and indeed strengthensthem: the two antigens give mean frequencies ofresponding cells of around 3]4, almost as high as

TABLE 2Maximum likelihood estimates and estimated standard deviations of model parameters for the data in Table 1

l l l l a b sc d b t

Joint:Ž . Ž . Ž . Ž . Ž . Ž . Ž .plate 1 0.4 0.1 3.5 0.3 3.3 0.3 4.7 0.3 16.4 0.9 10.3 0.3 3.6 0.5Ž . Ž . Ž . Ž . Ž . Ž . Ž .plate 2 0.4 0.1 3.5 0.3 3.3 0.3 4.7 0.3 14.8 0.8 9.4 0.2 2.9 0.4

Separate:Ž . Ž . Ž . Ž . Ž . Ž . Ž .plate 1 0.3 0.1 3.0 0.4 2.8 0.4 4.4 0.5 16.7 0.9 10.3 0.3 3.5 0.4Ž . Ž . Ž . Ž . Ž . Ž . Ž .plate 2 0.5 0.1 3.9 0.4 3.9 0.4 5.0 0.5 14.5 0.7 9.3 0.2 2.8 0.3

Page 8: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.8

FIG. 2. Six-point LDA for subject a713. Maximum likelihood estimates of frequencies of responding cells per well using two plates ateach dilution: estimates plotted against number of cells per well. Error bars correspond to plus or minus one SD. Dotted line correspondsto estimated frequency of responding cells per well obtained using all of the data.

for the tetanus groups and significantly above thatfor the cells alone group. Note that the old methodgives results which are noticeably lower than thosegiven by the new method.

In this project, by coming to an understanding ofthe underlying biology and developing a modelwhich described more aspects of the data, we wereable to provide the client with a valid and in factmore informative method of analysis. Its successcan be judged by the fact that recently we havebeen involved in the analysis of a number of furtherclinical trials of Chiron’s herpes vaccine. These tri-als are more complicated in that multiple subjectsreceive placebo or various doses of the vaccine andare assayed at multiple time points. Our analysismethod is used to estimate the immune responsefor each subject at each time. These estimated re-

sponses are then combined, in order to estimate theeffect of the vaccine, in increasing immune re-sponse to the viral antigens. This again involved aconsiderable amount of work, as we were not ableto use off-the-shelf methods.

The chief advantage in using our new method, inthe large clinical trials described above, is that itgives estimated frequencies of responding cells forall sets of data, whereas the old method sufferedfrom the problem that if all wells in an antigengroup were above the cutoff, one could not obtainan estimate of the frequency of responding cells,but only a lower confidence limit, making the com-bination of data from several assays very difficult.

A computer program incorporating our approachhas been developed and is now available for generaluse.

Page 9: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 9

Sandbars in the Colorado River:An Environmental Consulting ProjectJennifer A. Hoeting

Abstract. The National Park Service funded a study to determine theimpact of water released from the Glen Canyon Dam on sandbarsdownriver through Grand Canyon National Park. The project involvedconsiderable amounts of messy and missing data. Some of the chal-lenges faced and lessons learned during this project are described.

Key words and phrases: Sampling interval, environmental monitoring.

1. WHY SANDBARS?

Ž .In 1990 the National Park Service NPS fundeda project to measure sandbars in the Colorado River.The goal was to investigate the impact of waterreleased from the Glen Canyon Dam on sandbars

Ž .downriver from the dam Figure 1 .When Glen Canyon Dam began operation in 1966,

the annual flood cycle was eliminated as the damcontrolled all water flow. Floods scour the riverbottom, bringing up sediment deposited there. Whenflood waters recede, the sediment is left on theshore of the river in the form of sandbars. Surveysof the river show that sandbars have decreased insize and number since the dam opened in 1966Ž .Kearsley, Schmidt and Warren, 1994 .

Measuring sandbar sizes may sound like anothergovernment boondoggle, but sandbars play a keyrole in the ecosystem of the Colorado River. Forbirds and insects, the sandbars offer a small strip ofriparian habitat in a harsh desert environment.The sandbars also create eddies where endangeredfish and other fauna feed. Finally, rafters camp onthe sandbars during their trips down the ColoradoRiver. Not only do fewer sandbars mean reducedhabitat for fish and other wildlife, but reducednumbers of sandbars force all campers to use thesame sandbars, thereby increasing the user impacton a fragile environment. For these reasons, theNPS wanted to investigate how patterns of waterreleased from Glen Canyon Dam influence sandbarsize.

In this paper we provide some insights on the sci-entific and statistical issues related to this project.

Jennifer Hoeting is Assistant Professor of Statistics,Department of Statistics, Colorado State University,

ŽFort Collins, Colorado 80523 e-mail: [email protected], http:rrwww.stat.colostate.edur;jah .

2. THE DATA

From September 1990 to July 1991, 17 helicopterflights were made above 230 miles of the ColoradoRiver below the Glen Canyon Dam. On each flightthe same 58 out of the total population of about 600sandbars along the river were photographed. Eachphotograph was digitized to determine the size of

Ž .the sandbar Cluer, 1995b . The helicopter flightsoccurred during periods when the water was re-leased at a constant level from the dam. Betweenflights, water was released from the dam in differ-ent patterns of discharge, called test flows.

The original study design called for flights every15 days, which would result in a series of equallyspaced observations over time. However, weatherconditions and other difficulties resulted in a vari-able number of days between flights. On average,there were 20 days between flights, but flight inter-vals ranged from 12 to 70 days.

The original study design also specified that eachof the 58 sandbars was to be photographed on everyflight. Out of this sample of 58 sandbars, an aver-age of 18 and a maximum of 40 sandbars weremissed per flight. The data were missing for vari-ous reasons, primarily due to blurry photographs.

From the sandbar photographs, four numberswere recorded for each sandbar: gross area; area oferosion since the previous flight; area of depositionsince the previous flight; and net change in size,where net change is the difference between sandbarsize for the current flight and sandbar size for theprevious flight.

In addition to sandbar size measurements, sand-bar characteristics and hydrological data wererecorded. The individual sandbar characteristicsthat were recorded included location in terms ofmiles from the dam, left or right river bank andtype of sandbar. Nine hydrological measurementswere used to characterize the test flows, including

Page 10: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.10

FIG. 1. Geography of the Colorado River in the Grand Canyon. Map created by Brian Cluer, National Park Service.

means and standard deviations of daily dischargeŽover the flight period. ‘‘Upramp’’ the increase in

the level of the river at a specified point along the.river was available as mean daily maximum up-

ramp, the average of the maximum rise in riverlevel per day at five different gauging stations alongthe river. The average amount of sediment per dayentering from the Little Colorado River during each

Ž .inter-flight period sediment supply was also mea-sured, as sediment from this tributary of the Col-

Ž .orado River could impact sandbar size Figure 1 .

3. PREDICTING SANDBAR SIZES:CHALLENGES IN CONSULTING

Although these data had been previously ana-lyzed by the NPS, the NPS was interested inwhether we could extend their findings using sta-tistical models. Thus, we became involved in theproject only after all the data had been collected.

We addressed several important questions in thisproject.

3.1 Does Sediment Supply from TributariesInfluence the Sandbars?

To address this question we presented an autore-gressive model to predict net change per flightaveraged over all sandbars below the Little Col-orado River. The auto-regressive model is of theform

Y s Xb q u

Ž 2 . Žwhere u s rWu q « and « ; N 0, s Upton and.Fingleton, 1985 . In this model, Y is the net change

per flight averaged over all sandbars below theLittle Colorado River and X is the matrix of predic-tors with the elements in the first column equal to1. The parameter r can be interpreted as a mea-sure of dependence between observations of theresponse. The weights matrix W is described below.

Page 11: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 11

The auto-regressive model is sometimes called aspatial error model. The auto-regressive modeltakes into account previous observations of the re-sponse as well as previous observations of the pre-dictors to improve predictions about the responsefor the current flight. One way to interpret thismodel is that it takes time for the predictors toimpact the size of the sandbars. For example, re-sults might indicate that the mean daily dischargefrom the previous flight is related to the responsefrom today’s flight.

w xIn general, W s v where v is a nonnega-i, j i, jtive weight, which is representative of the ‘‘degreeof possible interaction’’ between observation i and jand v s 0. In this application, W was used toi iaccount for the variable number of days betweenflights. For example, for a lag 3 model

Ž .¡1r a of days between flight i and j ,~v s if 0 - i y j F 3,i , j ¢0, otherwise.

ŽWe considered several different lags the number of.previous flights in the weights matrix and typi-

cally observed significant lag coefficients, but withso few observations we were reluctant to makedefinitive conclusions with respect to the lag com-ponent.

The results from the auto-regressive model indi-cated that mean daily water discharge from thedam and presence or absence of sand added to theriver from the Little Colorado River were the mostimportant predictors of net change. The estimatedcoefficients in these models were highly variablebecause the lag component was estimated using

Žonly 15 observations 2 of the 17 flights had too few.observations to be included in these analyses .

Exploratory plots as well as results from theauto-regressive model indicated that there is a rela-tionship between sand supply and changes in sand-bar size in the Colorado River. One important find-ing was that increased sediment supply from theLittle Colorado River appears to take longer thanone flight period to impact sandbars. Future stud-ies of sandbar dynamics should collect data on sedi-ment supply from important tributaries and inves-tigate a possible lag between sediment input andchanges in sandbar size.

3.2 Can Dam Release Characteristics PredictChanges in Sandbar Size?

To answer this question we considered a stan-dard regression model to predict net change perflight averaged over all sandbars included in thestudy. The regression results indicated that, asmean daily discharge increased and upramp re-

mained fixed, the sandbars tended to increase insize on average. As mean daily maximum uprampincreased and mean daily discharge remained fixed,the sandbars tended to decrease in size on average.The other dam release characteristics were not sig-nificant predictors of change in sandbar size.

The results from both the auto-regressive modeland regression model are somewhat suspect due tothe small number of observations used to model acomplex system. We described several limitationsof these results in our final report to the NPSŽ .Hoeting, Varga and Cluer, 1997 . One concern isthat both the response and predictors were aver-ages, which makes interpretation of the modelsdifficult. For example, the mean daily dischargemay not be a good measure of the water releasepattern, because two very different water releasepatterns could have the same mean daily discharge.

3.2 How Do Dam Release Patterns ImpactIndividual Sandbars?

Another goal of the project was to produce aspacertime model to predict sandbar size for eachsandbar based on sandbar characteristics and damrelease measurements. The model was intended toprovide scientists with some guidelines on how dif-ferent patterns of water released from the damimpact different types of sandbars. Our analysesindicated that the large amount of missing dataand, more importantly, the long time intervals be-tween observations made this goal unattainable.

Recent data show that large time intervals be-tween sandbar measurements can lead to erro-neous conclusions. For example, it is common forlarge-scale rapid erosion events to occur in a matterof days or even over several hours. Figure 2 com-pares daily observations taken via automatic cam-era to results from 10 samples collected via a moretraditional terrestrial survey for one sandbar in the

ŽColorado River in 1991. Note: a different represen-.tation of these data appears in Cluer, 1995a.

Three substantial errors would be made if infer-ences were based on the 10 terrestrial survey datapoints. In case A daily observations show a gradualincrease in area from February 2 until April 16when the sandbar decreased from 130% to 70% ofits original area over a 24-hour period. The twoobservations collected via terrestrial survey onFebruary 2 and April 21 would, on the other hand,simply show a negative trend in sandbar area overthe 70-day period. In case B the observations col-lected at 30-day intervals would completely miss asubstantial erosion event and severely underesti-mate the variation in sandbar size. In case C theintermittent observations would both overestimate

Page 12: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.12

FIG. 2. 1991 Colorado River sandbar survey, sandbar 172. Comparison of 288 daily measurements collected via automatic cameraŽ . Ž .solid line to 10 intermittent measurements collected via land surveys triangle symbols and dotted line .

the time over which the erosion occurred and missout on a portion of the erosion. These data show thedanger inherent in basing inferences on sandbardata collected at sparse intervals over time. Sincedata analyzed in this paper were collected at inter-vals from 12 to 70 days, we have a very incompletepicture of what actually happened to the sandbars.

Another challenge is the large amount of missingdata. With up to 40 sandbars out of the original 58sandbars to be photographed missing for each flight,the missing data were an important concern. Whilewe considered using data interpolation methods orlikelihood-based approaches for the analysis ofmissing data, the high degree of uncertainty aboutsandbar behavior in the intervals between observa-tions made it inappropriate to use these methods.

3.4 Suggestions for Future Studies

This is the best data set ever obtained for asample of Grand Canyon sandbars; indeed, a largesample of sandbars was monitored over a long pe-riod of time as compared to previous studies ofsandbar size. Since the data were collected viaaerial photography from an airplane, it was cheaperto collect more sandbars per flight but to havefewer flights. In designing these types of studies,one must consider this tradeoff between the num-ber of sandbars included in the study and the num-

ber of observations obtained for each sandbar. Inthis study there were 58 sandbars, but, with as fewas 9 observations per sandbar collected over a longperiod of time, it was difficult to produce a crediblemodel for individual sandbars. Our results indicatethat future studies should focus on obtaining moreobservations of fewer sandbars which will allowscientists to understand the relationship betweenhydrological characteristics and changes in sandbarsize more fully.

Related to this is the issue of sampling interval.In our final report to the NPS, we argued that notonly will more frequent sampling result in betterunderstanding of the underlying natural processes,but more frequent sampling of fewer sandbars cansave money. Traditional sampling techniques useeither aerial photography or land-based surveying.Flying at low altitude deep in the Grand Canyon isexpensive, dangerous and may be ecologically un-sound. Land surveying is similarly expensive andtime-consuming, so it is best to budget for fewflights or few surveys where many sandbars aremeasured. A better design would be to set up auto-matic cameras at a few sandbars to take photo-graphs at specified intervals. Our analyses showedthat, despite the reduction in the number of ob-served sandbars, more information about the ques-tions of interest would be gained through our sug-

Page 13: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 13

gested design. A formal cost model would be a goodway to present these tradeoffs.

4. SOME LESSONS LEARNED

This project reinforces several basic rules for sta-tistical consultants.

First, always check the data for errors at thestart of the project. These data had been analyzedpreviously and thus we assumed that the databasewas error free. In fact, there were some seriousproblems still remaining. One of the most impor-tant was errors in the computation of net change,the response of main interest to our clients. We alsodiscovered other errors, for example, in computa-tion of the number of days between flights. Ourexperience on this project emphasizes the need forsimple checks of data accuracy before beginningany analyses.

This project also demonstrates why statisticalconsultants should make every effort to obtain theraw data, if available. The original goal of the studywas to relate the change in sandbar size to charac-teristics of the test flows for each flight, but onlythe summary statistics of the test flows were madeavailable to us. While statistics such as mean andstandard deviation of daily discharge over the flightperiod numerically characterize the test flows, theraw measurements would have provided us withfurther insight into the nature of each test flow.

We were also unable to obtain the raw values fordaily maximum upramp from each of the five gaug-ing stations along the river. Since we received onlysummary statistics averaged over the five stationsand averaged over the flight period, it was impossi-ble to relate upramp to the distance of each stationfrom the dam, which is important because uprampincreases with distance from the dam. Withoutdoubt, increased access to raw data would haveimproved our ability to draw useful scientific in-ferences.

Finally, as consultants we must guard ourselvesagainst standing on the ‘‘statistician’s pedestal’’from which we lecture scientists on the limitationsof their studies. It is easy for a statistician tocriticize a study after the data have been collected.We should recognize that, just as statisticians makecompromises while doing analyses, investigators areunder considerable constraints when designingtheir studies, including financial, time manage-ment and political constraints. Even with the bestintentions in study design, we recognize that col-lecting high quality, complete data outside of a

controlled laboratory environment can be an ex-tremely difficult endeavor.

5. CONCLUSIONS

The NPS gained a considerable amount of usefulinformation from our efforts. We provided insightinto the relationship between net change in sand-bar size and mean daily discharge, upramp, pres-ence or absence of sand added to the river from theLittle Colorado River and improved methods forcollecting and evaluating data on sandbar sizes.Previous statistical analyses of sandbar size datahave been limited, being based on simple models

Žthat ignore spatial and time correlations Beus and.Avery, 1992; Cluer, 1995b . Thus our study was a

step in the right direction toward the collection ofadditional data and the development of useful mod-els to predict sandbar sizes. Despite data limita-tions that prevented the use of highly sophisticatedspace]time models, we were able to identify theneed for such models and the type of data andanalyses that would be most useful for futurestudies.

Ž .The U.S. Bureau of Reclamation USBOR , whichoperates the dam, faces the continual challenge ofbalancing the needs of the ecosystem with the needsof the power companies. In the past, discharge ofwater through Glen Canyon Dam has been con-trolled to optimize peak load hydropower produc-tion. In the spring of 1996, USBOR released a largecontrolled flood intended to reinvigorate the sand-bars on the Colorado River. Preliminary observa-tions show that this goal was at least partially

Ž .achieved Wegner, 1996 .Our analyses, along with analyses of data from

the controlled flood, will help scientists further un-derstand the relationship between sandbar size anddam water releases. The need to understand theimpact of dams on ecosystems is continually in-creasing: it is predicted that by the year 2000 over

Ž60% of the world’s rivers will be regulated Gore.and Petts, 1989 . Statisticians can play a key role

in this research by helping scientists design goodstudies and by continuing to develop methodologyto assist in the analyses of these and similar data.

ACKNOWLEDGMENTS

This project was supported by the NPS. The au-thor thanks Brian Cluer of the NPS, for initiatingthis project and for providing scientific insightsthroughout the project, and Kristina Varga, whoassisted on the project while a graduate student atColorado State University.

Page 14: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.14

Setting up Computer-Assisted PersonalInterviewing in the AustralianLongitudinal Study of AgeingSue Taylor

Abstract. We discuss the role of a statistical consultant in a large-scalelongitudinal study of an ageing population in South Australia. Particu-lar emphasis is given to the way in which this collaboration enhancedthe accumulation of relevant data by planning the collection and analy-sis months before the survey began. Computer-assisted personal inter-

Ž .viewing CAPI was used, providing a method of obtaining survey datawhich avoided expensive data entry and editing. Additional benefitswere apparent in the facilitation of data management and analysis, andCAPI was also well received by both interviewers and respondents. Theimportance of enlisting the services of a statistician in the early stagesof a large study is clear in this project.

Key words and phrases: CAPI, panel surveys, longitudinal data ageingstudies, survey efficiencies.

1. INTRODUCTION

In this note, we describe one contribution of astatistician consulting on a major longitudinalstudy, the Australian Longitudinal Study of AgeingŽ .ALSA . We first briefly introduce the study, thenindicate the reasons for using computer-assisted

Ž .personal interviewing CAPI . This decision givesthe statistician much more scope than usual to beuseful. We then describe the implementation andadvantages of the CAPI approach and give somedescriptions of ALSA outcomes indicating the waysin which CAPI was successful.

ALSA is a multidimensional panel survey of thesocial, health, behavioral, economic and environ-mental characteristics of a random sample of peo-ple aged 70 years and over, who live in Adelaide,South Australia.

The ALSA project is the most comprehensive pop-ulation-based study of ageing yet undertaken inAustralia and is supported in part by the U.S.National Institute on Aging. It is a cross-nationalcollaboration jointly undertaken by the Centre for

Sue Taylor is with the Department of PreventiveMedicine and Biometrics, University of ColoradoHealth Sciences Center, Denver, Colorado 80262Ž .e-mail: [email protected] .

Ageing Studies, Flinders University of South Aus-tralia and the Center for Demographic Studies,Duke University, North Carolina.

The general aim of ALSA is to gain increasedunderstanding of how social, biomedical, behav-ioral, economic and environmental factors are asso-ciated with age-related changes in the health andwell-being of older persons. One of the more specificresearch aims is to determine the health levels andfunctional status of a representative older popula-tion and to track these characteristics over time.

The ALSA project consists of extensive face-to-face interviews of participants covering a wide rangeof topics and includes over 700 individual items.The study was conducted in four phases, or ‘‘waves,’’at approximately one-year intervals.

Table 1 shows a selection of the interview do-mains. As can be seen, there was a wide variety ofinputs to the study questionnaire from many di-verse areas of research interest and consequentlyfrom many different researchers. In the early stagesof questionnaire design, we attempted to restrictthe questions to those which would answer thespecific goals of the study. This was particularlyimportant in the context of the elderly as the lengthof time spent with the interviewer needed to bekept as short as possible.

Consulting a statistician is invaluable to the pro-cess of questionnaire design, as typically statisti-cians have been exposed to many data that have

Page 15: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 15

TABLE 1Interview domains

Demography SleepSelf-rated health Morbid conditionsMedication use Health servicesFalls and injuries Vision and hearingDental WeightReproductive history Finances and incomeSignificant life events Smoking and alcoholOccupation and education Familyrsocial contacts

been rendered useless due to poor planning, andare therefore sensitized to the issues involved be-fore they occur. This turned out to be particularlythe case in the ALSA study, where the statisticiancould also play an important role in the choice ofsurvey instrument.

The decision to use CAPI presented a muchgreater opportunity than anticipated to integrateproposed analyses with the collection procedure.

2. DESCRIPTION AND CHOICE OF CAPI

CAPI was introduced in the 1980s as part of therevolution in the measurement of public opinionusing computer technology. Not surprisingly, itspopularity increased soon after the price, weightand quality of portable computers began to improvesignificantly.

In CAPI, the interviewer takes the computer,preloaded with the questionnaire, to the respon-dents’ homes. There, the interviewer reads thequestions as they appear on the screen and thentypes the respondents’ answers immediately intothe computer.

The interview conducted in this way is little dif-ferent from the traditional paper-and-pencil inter-view, as far as the respondent is concerned. How-ever, there are differences between the traditionalmethods and CAPI with varying levels of relevanceto overall study quality. Empirical evidence does

Žnot support the suggestion Couper and Groves,.1989 that the mere presence of the computer may

affect the outcome, at least with respect to refusalsor partial nonresponse, even for sensitive issues.Major documented differences are that CAPI inter-views take longer, and that the notes made by

Žinterviewers on the computer are shorter Couper.and Groves, 1989 .

In setting up ALSA, a pilot study was conductedusing conventional paper-and-pencil survey meth-ods, but an early decision was made to use CAPI inthe main survey. In the United States, most na-tional surveys are conducted with the use of CAPIŽ .Saris, 1991 , but this is the first time such amethod has been used in a survey of the elderly in

Australia. The decision was based partly on theŽ .works of Groves and Mathiowetz 1984 and Har-

Ž .low 1985 , who both found that, in telephone sur-veys, computer-assisted data collection using com-

Ž .puter-assisted telephone interviewing CATI wasless expensive and yielded better data more quicklythan traditional techniques. There was also somepreliminary evidence that CAPI was likely to show

Ževen greater improvements in quality Birkett,.1988 .

The anticipated advantages of the CAPI methodwere similar to those that have been demonstrated

Ž .previously for the CATI technique Nicholls, 1988 .These include the following:

1. the integration of several survey steps into asingle activity that includes editing, coding, dataentry, checking and cleaning;

2. immediate detection and resolution of errorsmade by interviewers and respondents;

3. reduced costs;4. the capability to design more complex question-

naire instruments and skip patterns.

Expensive data entry and editing can be avoided,so the overall cost when compared with conven-tional paper questionnaire use is lower. Additionalbenefits are those associated with the facilitation ofdata management and analysis through an inter-face provision with statistical software, such asSPSS. Development of a data file, complete withvariable information needed for immediate analy-sis, could typically take up to a week without thisfacility. In an ideal situation, this task is under-taken by the statistician who will analyze the data.Using CAPI, this task becomes part of the question-naire construction, and generation of the data fileoccurs immediately after the last piece of data isentered.

In the context of a longitudinal study, this tech-nique proves invaluable in that data from previouswaves can be recalled and utilized in the subse-quent interview. As an example, it is a useful wayof avoiding redundant questions such as ‘‘How manynatural teeth do you have left?’’ when it has beenestablished previously that the respondent hasnone! Also it allows tailored questioning such as‘‘Two years ago you told us that you had shingles.Do you still suffer from this condition?’’

3. THE ALSA SAMPLE

The main study sample was randomly selectedŽ .from within the Adelaide Statistical Division ASD ,

which is essentially the greater metropolitan areaof Adelaide. The sample, stratified by age and sexin five-year age groups to 85 years and older, was

Page 16: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.16

TABLE 2Population characteristics of Australia and South Australia

Australia South Australia

Total population 17.7 1.5Ž .millions of people

% living in urban areas 85.3 73.0Median age 32.7 33.9% aged 65 or older 11.7 13.4

obtained from the State Electoral Database. In Aus-tralia, compulsory voting ensures that this is avirtually complete listing of all adults 18 years andover. This gave the potential group of primary re-

Ž .spondents. Spouses of this group aged 65 and overas well as other household members aged 70 yearsor older were also invited to participate. Table 2gives some of the relevant characteristics of theAustralian and South Australian populations.

The sample selection was carried out by the Aus-tralian Bureau of Statistics, using charts describingthe structure of households where elderly peoplelive. Since males in these age groups have a highermortality rate than females over a five-year period,they were deliberately oversampled to allow suffi-cient numbers for longitudinal tracking. One of theunique features of the sampling plan was the provi-sion for interviewing elderly couples and 565 cou-ples were recruited to the study at baseline, in-creasing the efficiency of sampling resources.

The breakdown of the baseline sample is given inTable 3.

The study consisted of four waves, with Wave 1being the baseline survey. Waves 1 and 3 usedCAPI to interview eligible persons in their normalplace of residence, and Waves 2 and 4 were short

Ž .telephone interviews approximately 15 minutes . Aseparate proxy instrument was developed and usedvery successfully for those respondents too ill orfrail to respond personally.

Table 4 shows the response rates for Waves 2 andŽ .3 results of Wave 4 are not available at this time .

The baseline data were collected from a total ofŽ .2,087 respondents see Table 3 ; 1,477 of these were

TABLE 3Participants in the baseline sample by age and sex

Age Males % Females %

65]69 17 1.6 123 11.970]74 279 26.4 283 27.475]79 283 26.8 241 23.380]84 235 22.3 194 18.885q 242 22.9 190 18.4

Total 1,056 1,031

TABLE 4Response rates for Waves 2 and 3

Wave 2 Wave 3

n % n %

Deceased 112 5.4 241 11.5Moved from ASD 13 0.6 33 1.6Unable to contact 13 0.6 9 0.4Refusal 170 8.1 125 6.0Responders 1,779 85.2 1,679 80.5Total 2,087 2,087

primary respondents, 597 were spouses or sec-ondary respondents and 13 were other householdmembers who agreed to take part.

Excellent retention of the cohort was achieved inWaves 2 and 3 with more than 90% of the surviv-ing respondents remaining in the study. This waslargely due to the dedication of the team involvedin the study who worked hard to involve the partic-ipants and maintain their interest.

4. IMPLEMENTING CAPI IN ALSA

To support the CAPI approach in this survey, wechose an integrated package for survey manage-ment known as BLAISE, which was developed by

Žthe Netherlands Central Bureau of Statistics Beth-.lehem et al., 1989 . The programming language is

essentially a modified version of the PASCAL lan-guage.

The BLAISE system makes provision for CAPIand includes questionnaire design and adminis-tration, checking, data editing, tabulation andanalysis.

The generation of a BLAISE program proceeds ina number of defined steps:

1. The questionnaire is specified using a text edi-tor. The routing specification is then applied toprovide for skips and subquestionnaires. Rangechecks and consistency checks are then includedand can be differentiated as ‘‘hard’’ or ‘‘soft’’errors. Hard errors, specified in terms of rela-tional expressions, must be satisfied before theresponse is accepted as valid, and require correc-tion before an interview can proceed. On theother hand, soft errors result in a warning mes-sage that can be overridden by the interviewer.

2. Raw BLAISE statements are turned into an exe-cutable program which is copied to the inter-viewer’s laptop computer ready for use.

3. The data are collected into files which may thenbe converted into standard ASCII files or otherformats, including SPSS system files with ac-companying syntax files to describe the data.

Page 17: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 17

The use of CAPI made it theoretically viable tostart data analysis as soon as the last piece of datawas collected. Essentially no data cleaning was nec-essary and all relevant data transformation pro-grams could be written beforehand and executedimmediately. Of course, this all relies on the factthat the questionnaire items are correctly specifiedin the initial stages and require little manipulationafter collation. One of the critical benefits of earlyinvolvement of a statistician is in ensuring thatthis specification is indeed accurate.

There were a number of specific tools which werefound to be of great value. These included:

1. electronic notepad to allow notes written by theinterviewer during the course of the interview tobe stored on a separate but related file;

2. interrupt facility to allow the storage of incom-plete interviews and subsequent return to theincomplete record}this was particularly impor-tant in the context of the elderly and especially

Žsince despite our best efforts to shorten inter-.view times the average length of interview wasŽ .132 minutes ! ;

3. a ‘‘ditto’’ facility which copied the response fromthe previous questionnaire for the correspondingquestion; this was very useful as we were inter-viewing elderly couples in the same householdand responses to certain questions were oftenidentical;

4. a built-in clock facility, which allowed timing ofthe interview length for later analysis;

5. provision for the generation of a paper version ofthe questionnaire.

Data were returned on a weekly basis on floppydisks, although it is also possible to download datato a central computer with the use of modems. Aprogram was written to backup these data so thatat least two copies of all data were available at alltimes.

5. WHY THE STATISTICIAN WAS VALUABLE

Automatic coding of categorical variables is doneby the CAPI program, rather than by the interview-ers or respondents. In order for this facility to befully utilized, it proved invaluable to have thestatistician devise the coding scheme in the plan-ning stages to allow for later statistical analysis.This was especially important since techniques suchas regression were intended, and the statisticalpackages used required the data in a specific, non-intuitive format.

In this study, the longitudinal nature of the datamade it absolutely necessary for an experienced

statistician to be involved who had worked previ-ously with data of this type. Specification of thecorrect data structure was an integral part of solv-ing the data storage problem alone, as there weremany megabytes of data generated by the designwe used. Apart from this potential difficulty, it wasalso imperative to consider the inevitable missingdata problems and how best to flag these withrespect to future analyses. The statistician is ableto design possible imputation procedures, if appro-priate, or at the very least ensure that the occur-rence of missing data does not render that entirerecord useless.

The inclusion of a statistician on the researchteam also proved invaluable during the ‘‘covariatebrainstorming’’ sessions. Not only do statisticiansrealize the importance of collecting data on all po-tential covariates, but they also know the type ofdata to collect. We managed to persuade the collab-orators to elicit covariate data as continuous mea-surements wherever possible, with subsequent cat-egorization, rather than risk the loss of informationby collapsing variables at the collection stage. Un-like paper questionnaires, this would result in atotal loss of potentially useful data, since the com-puter may only store one realization of a variable,with the other lost forever.

Overall, statistical thinking at the outset helpedshape the whole procedure, and it was rewardingfor both the statistician and the researchers tobegin the collaboration at a point where the impactwas greatest.

6. CONCLUSIONS

In ALSA, it was notable that CAPI was wellreceived by interviewers and respondents alike, andthe high response rates in Table 4 reflect this.Some interviewers, initially hesitant about the useof computers in the interview process, became posi-tively excited about their use and many of the olderpeople in the survey showed a genuine interest inthe technology.

In this study, the time spent in the initial stageswas definitely worthwhile, especially that spent ondefining range and consistency checks. Rangechecks are of course only the first step in the datacleaning process, as they simply allow a broad checkon the data value for one particular variable. Al-though they were not completely infallible in themultivariate sense, and we still saw a few incon-gruous combinations of data values, we found thatin addition to the consistency checks, a large per-centage of the errors which typically occur in dataof these types were eliminated in the ALSA imple-mentation.

Page 18: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.18

One of the most important features of CAPI isthe validation facility, a task which an interviewertypically does not have the time to carry out duringthe interview. Data can be checked against otherinformation to assess its quality, a critical featureparticularly in a panel survey. If there are discrep-ancies, the respondent can be asked for immediateclarification, so that the data are cleaned while therespondent is still available. This is a clear advan-tage of CAPI since traditional methods may forcethe incorrect answer to a missing value. In general,if the questionnaire is well constructed at the out-set, data can be considered optimal and later vali-dation omitted.

In our experience, CAPI proved to be an efficient,accurate, cost-effective and acceptable method forcollecting data from older people in a community

survey. Data analysis was also made less of a choreby the absence of the many hours which wouldnormally be spent on tedious and time-consumingdata cleaning.

The added features of the system described in theprevious section enhanced acceptability markedly.The future in this area will certainly see enhance-ments in the software, and hardware improvementsincluding acceptance of voice and handwritten en-try, which will make this mode of data collection aneven more attractive option.

Data collection in such a large study can bedaunting. However, with a modest investment oftime in the planning stages, jointly between thestatistical consultant and the rest of the researchteam, the rewards can be great.

Queueing at the Tax OfficeRichard Tweedie and Nell Hall

Abstract. This paper discusses a consulting project where, by focussingon the basic parameters of a probabilistic model, advice was given thatcould result in real improvement in the service at an Australian taxoffice, without raising the costs of the operation. The results are notintuitive and illustrate that nonlinear behavior in models can be hardfor nonmathematicians to follow or even believe.

Key words and phrases: Waiting times, server numbers, MrMrc queues,delays, loss of customers.

1. THE PROBLEM

Applied probability problems are often of a scalethat does not lend itself to ‘‘consulting.’’ There areof course many outstanding examples of the use ofapplied probability techniques in major collabora-tive efforts, in, for example, teletraffic and network-ing, epidemiology, spatial pattern recognition andthe like, and these often result in the type of collab-oration that is commended in the advice given by

Žthe New Researchers Committee of the IMS 1991;.hereafter CNR , but they rarely look like the sort of

consulting that most clients arrive with, as noted in

Richard Tweedie is with the Department of Statis-tics, Colorado State University, Fort Collins, Col-

Ž .orado 80523 e-mail: [email protected] .Nell Hall is with the New South Wales Departmentof Health, North Sydney NSW 2060, Australia.

Ž .Tweedie 1986 . That is, they are usually harder ordeeper, and do not have the type of constraints,benefits and rewards that one might expect if com-ing from a nonacademic environment.

This paper describes an anomaly in this pattern:an applied probability problem that really is pureconsulting, with no new methodological research,but with the rewards and problems of difficult data,of client interactions, of approximations to realityand of time and funding constraints, and with ahappier than often ending, since valuable advicecould actually be provided and implemented withinthe client’s budget.

The problem is simple to describe and will strikea chord with all who have been put on hold inautomated telephone enquiry lines everywhere.

In the mid-1980s, both of us were working in amedium-sized private sector consultancy, SIRO-MATH Pty Ltd, in Australia. We were consulted by

Page 19: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 19

TABLE 1Effectiveness of the calling system

Measure of effectiveness Site A Site B Site C

Ž .Average time on hold sec w 140 200 753ˆ$Percentage p waiting ) 3 min 30% 45% 92%3

an officer of the Australian Tax Office who wasconcerned that the behaviors of the ‘‘dial-in’’ en-quiry lines at different offices were inexplicablydifferent. The tax office had recently installed man-agement software to track characteristics of theirenquiry system and had looked at several measuresof effectiveness, including the following:

1. the average length of time w waiting on holdˆbefore questions were addressed;

2. the percentage of customers p who waitedˆ3longer than three minutes before being an-swered.

In particular, the client was concerned that oneoffice seemed to have remarkably poor behaviorcompared with others. Table 1 captures most of the

Žcritical material: we have for reasons of confiden-.tiality labelled three of the offices we studied as

Sites A, B and C, with Site C clearly being the onein distress. One can only admire the patience ofthose calling to clarify their tax return status andqueries: note that at Site C nearly every callerwaited more than three minutes and the averagecall was on hold for over 12 minutes!

Although there were some other, well posed,questions about the possibility of networking thesystem, like many consultancies this one had aregrettably inexplicit main question: it was of theorder of ‘‘what is going on here and can we doanything about it?’’ We will see that there are someoptions, at least, for using standard queueing the-ory to address this satisfactorily, but that such an

Ž .application requires as do so many consultanciesparticular care in acquiring appropriate data beforecarrying out the analysis.

2. THE QUEUEING MODEL

In this project we had three skills to offer: thefirst was indeed the ability to advise on the types ofmodels that might fit the situation, as discussed inCNR, but in contrast to CNR, the second was tohelp the client identify real data that might enablethe model to be assessed and the third was to carryout the analysis for him since this was rather out-side his capabilities.

As in all statistics applications, the modellingshould come first, or we do not know what data will

be relevant. In this case we turned to the simplestqueueing model: a multiserver queue with c serversŽ .the staff in the enquiry room of the tax office , the

Žcustomers arriving in a Poisson process a rela-tively standard assumption, and one which implieswe needed to know only the average rate l of calls

.per minute and with the time to serve customerstaken as i.i.d. random variables with exponentialservice times of mean length 1rm.

This last was, as we will describe, a rather roughassumption in this case: but with it we are able touse simple known results to assess the type of

weffectiveness measures being proposed. For a goodif rather old-fashioned description of how this might

Ž .work, see Lee 1968 , who still has sound advice onhow to live in a real world with the distributional

xassumptions involved.These exponential assumptions are not always

appropriate. One of the truly beautiful results ofqueueing theory is, however, the ‘‘critical threshold’’result that says that the system will, regardless ofsuch distributional assumptions, have stability orinstability properties according as the traffic inten-sity

r s lrcm

is less than 1 or otherwise. In the stable situationthe queue will not get too lengthy, and in theunstable situation it will grow beyond bounds. FromTable 1, Sites A and B look stable and Site C israther like an unstable situation, and so we firstsought to see what the values of r in our systemmight be. For these we only need the mean interar-rival and service times l, m and the number ofservers c. The data we were given are in Table 2: asdescribed in the next section, the system really isclose to or above critical, especially when c issmaller than it is reputed to be, and this does helpexplain some of the longer waiting times observed.

We can then use the exponential distributionalassumptions to enable us to consider the effective-

TABLE 2Input data

Parameter Site A Site B Site C

Ž .Input rate per minute l 4.18 2.27 1.60Ž .Mean call length sec 180 187 200

Ž .Wrap-up time % 10% 18% 40%Mean service time including 198 221 280

Ž .wrap-up 1rmMinimum number of servers c 10 8 6minAverage number of servers c 13 12 8aMaximum number of servers c 15 14 9max

Page 20: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.20

ness parameters and predict their values, at leastin the stable situation. We find, in particular, that

Ž .the analytic forms are given by Lee 1968 ,cŽ .cr

3Ž lycm .p s p e ,3 0 c!cŽ .cr p0

w s ,2c! Ž .cm 1 y r

where the probability of an empty queue is

y1r ccy1 Ž . Ž .cr cr 1p s q .Ý0 Ž .r ! c! 1 y rrs0

In this case the key question was not to predictŽ .these or other similar quantities with great accu-

racy, but rather to decide why the input parametercombinations might be leading to the particularcombinations that were being observed. Note againthat only l, m and c are relevant to these results,and so we did not need more information than thison the system.

3. DATA COLLECTION

Our initial set of data was provided from thethen-new computerized telephone system. Table 2shows that the average rate of calls at the biggestof the sites was around 4]5 per minute: this ap-peared relatively stable over the day, with the ex-ception of the first hour when, not surprisingly, therate was usually closer to the maximum of 5 perminute. The rates at the other sites seemed accept-ably constant over the whole of the day.

The system also provided average call lengths.These were relatively constant across all sites. Re-grettably the system did not collect the actual dis-tribution of calls: in principle this might have beenpossible but the resources to reconfigure the systemfor better data were not available from the client.Thus we were not able to verify if an exponentialdistribution was reasonable.

The most difficult information to collect was thenumber of servers. Internal staffing sheets showedthe number of servers on an hourly basis, and thesevaried widely within the day. In particular the

Žmaximum number which was possibly the number.the client felt to be available was actually 150% of

the number often really working. Given the roleof c in r or in the waiting times, this is of veryconsiderable concern.

Note that the number of servers actually as-signed to each site appears in principle to be in linewith the observed input rate: indeed, if anythingSites B and C seem to have pro-rata more serversthan they should have in comparison to Site A, if

we judge by the input rate. This helps illustrate theclient’s understandable concerns about the poor be-havior at Site C, since in principle the model saysthat this should be well under control.

Following the initial data collection, in this proj-ect we had the very real benefit of supplementingthe paper data with one site visit, to Site A. Therewe learned rather more, as one so often does, andin particular we discovered two extra pieces of in-formation:

1. On every call there was a ‘‘wrap-up’’ period afterthe call, when the server made notes, shiftedfiles and so on. Once we learned of this, we foundthat data existed to estimate the extent of wrap-up activities in each office, and although theseprobably had at least some minimum length, wemodelled them as a percentage of the servicetime and added them to the observed servicetime; this is used in Table 2 to give the m weused, and then in Table 3 to give the predictedvalues of w and p . Note in particular that at3Site C this wrap-up time adds much more to theactual call length than at the other sites.

2. There was also a period of ‘‘idle time’’ for eachserver, some of which was time absent from theroom and which might of course be reflected inthe server counts, but some of which was at thedesk and in principle should be added to theservice time; in Table 4 we take account of this.

These extra service times would not have beenpicked up without the detailed information col-lected on site. Some clients are insistent that statis-ticians visit the scene of their operations, and thiscan be time consuming for the statistician: others ofcourse prefer to keep the statistician as far fromthe facts as possible! But whatever the attitude ofthe client, in order to give sensible advice withincontext, it is a step that one should always try totake, as this case study illustrates.

TABLE 3Predicted behavior excluding idle time

Parameter Site A Site B Site C

Minimum traffic rate 1.38 1.04 1.24r s lrc mmin min

Average traffic rate 1.06 0.69 0.93r s lrc ma a

Maximum traffic rate 0.92 0.59 0.82r s lrc mmax max

Assumed c for prediction 15 9 8Ž . Ž . Ž . Ž .Predicted observed mean 110 140 262 200 423 753

wait WŽ . Ž . Ž . Ž .Predicted observed p 22 30 45 45 57 923

Page 21: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 21

TABLE 4Predicted behavior including idle time

Parameter Site A Site B Site C

Ž .Idle time % 8% 8% 14%Mean service time including 213 238 319

wrap-up and idletime 1rm*

Minimum traffic rate 1.48 1.12 1.41r s lrc m*min min

Average traffic rate 1.14 0.75 1.06r s lrc m*a a

Maximum traffic rate 0.99 0.64 0.94r s lrc m*max max

Assumed c for prediction 15 10 9Ž . Ž . Ž . Ž .Predicted observed mean 1259 140 162 200 537 753

wait wŽ . Ž . Ž . Ž .Predicted observed p 90 30 31 45 63 923

4. RECOMMENDATIONS

Tables 3 and 4 show that the observed behaviorof Sites A and B is reasonably consistent with themodel predictions, especially if we assume Site A

Ž .is using all servers effectively, and if in contrastSite B is using rather close to its minimum num-ber of servers. Site A is also very close to criticalŽ .r s 1 even when the full complement of serversis present.

If we do not incorporate the wrap-up time in SiteC then in fact that site is far from critical: evenwith the minimum observed of 6 servers, they stillhave r s 0.89 and a predicted mean waiting time of

just 180 seconds. However, the poor behavior canbe far better explained if we take into account the40% increase to service time following the additionof the wrap-up time. Indeed, their behavior is evenmore consistent with the model where we also addin some percentage of the idle time as well.

In no cases were the fits of the data perfect forthe model, of course. In particular, if we assume thevalue 5.4 minutes for the mean service time for SiteC, then we get a value of around 750 seconds for

Ž .the mean waiting time consistent with reality , butwe find that we only have p s 75%; conversely we3get close to the observed value of p s 92% by3assuming a mean service time of just 5.57 minutes,

Žbut the expected waiting time is at a noticeably.theoretical! 62 hours or so. This might perhaps be

explained by a distribution of service times with a‘‘lump’’ of probability near the origin, correspondingperhaps to part of the wrap-up times being of fairlyfixed length, but we were not in a position to lookfurther at this. Nonetheless, the general operationseemed well described by this simple model, and itwas possible to give some rational advice basedon it.

In Figure 1 we illustrate the prime recommenda-tions we gave to the client:

Ž1. At Site C the average predicted waiting time w.on Fig. 1 could be reduced to around 1.5 min-

Ž .utes from the current 12.5 minutes by addingŽjust one extra server to have an effective group

FIG. 1. Changes in waiting time as number of servers varies at site C: W is predicted waiting time, W * is waiting time with reducedwrap-up time.

Page 22: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.22

.of 9 , and indeed w could be virtually halvedmerely by ensuring that all eight current serverswere constantly available.

2. even more usefully, training should be institutedat Site C to reduce the percentage of time spenton wrap-up to at most 15% of the length of thecall, as was achievable at all other sites; theresulting waiting time, given as w* on Figure 1,is less than three minutes even with only sevenservers. It is well under a minute if all eight areworking. Considerable further reductions areachieved if wrap-up is only 10%, as at Site A.

One of the effects that the client found hardest tobelieve was that, as just illustrated, the whole sys-tem was so sensitive to very minor improvementsin the parameters when close to critical. It is notintuitive that just one extra server, or, more dra-matically, just saving some seconds in mean servicetimes, could have such a powerful effect.

Various other recommendations were made, es-pecially as the client was seriously investigatingthe possibility of routing calls between the sites, sothe effective server pool would suddenly becomearound 35]50. We were able to predict that such anaction would reduce the average waiting time towell under a minute and ensure no more than10]15% of customers would be waiting for a 3-minute period: this would give far better servicethan at any other single site we discussed with theclient.

Did this consultancy improve the service to thetaxpayers? Sadly, I have no idea. And this is thelast of the lessons in this article for the new consul-tant: do not always expect to make a great differ-ence and be grateful if you get any level of recogni-

Žtion. This project led to no paper except, a decade.later, this one , even though it involved much time,

so there was no reward in an academic sense; itpotentially helped many people at almost no cost tothe client, since it clearly identified simple manage-ment changes that would give the desired result;but as so often is the case, the client did not feel thestatistical consultant was relevant to implementingthese, and we heard no more of it.

So why bother with such consulting? For manyreasons: first, and not to be overlooked, we were inthis instance being paid to be professionals, and,like lawyers and doctors and other professionals,we should provide our skills and not necessarilyexpect to be further involved and not on centerstage; second, statistics is designed to solve realproblems, and here we did just that, and this can beits own reward; and third, the project did achieveone of the values noted in CNR, namely, it was

Ž .fascinating and gave at least to us a real knowl-edge of yet another area in which statisticians canplay a part that cannot be played by anyone else.

REFERENCES

BARLOW, R. E., BARTHOLOMEW, D. J., BREMNER, J. M. and BRUNK,Ž .H. D. 1972 . Statistical Inference under Order Restrictions;

the Theory and Application of Isotonic Regression. Wiley,New York.

BETHLEHEM, J. G., HUNDEPOOL, A J., SCHUERHOFF, M. H. andŽ .VERMEULEN, L. F. M. 1989 . BLAISE 2.0ran introduction.

CBS report, Netherlands Central Bureau of Statistics, Voor-burg, The Netherlands.

Ž .BEUS, S. S. and AVERY, C. C. 1992 . Final report: the influenceof variable discharge on Colorado River sand bars belowGlen Canyon Dam. Technical report, National Park Service.

Ž .BIRKETT, N. J. 1988 . Computer-aided personal interviewing: anew technique for data collection in epidemiologic surveys.American Journal of Epidemiology 127 684]690.

Ž .BOX, G. E. P. and COX, D. R. 1964 . The analysis of transforma-tions. J. Roy. Statist. Soc. Ser. B 26 211]252.

Ž .BROMAN, K. W., SPEED, T. P. and TIGGES, M. 1996a . Estimationof antigen-responsive T cell frequencies in PBMC from hu-man subjects. J. Immunol. Methods 198 119]132.

Ž .BROMAN, K. W., SPEED, T. P. and TIGGES, M. 1996b . Estimationof antigen-responsive T cell frequencies in PBMC from hu-man subjects. Technical Report 454, Dept. Statistics, Univ.California, Berkeley.

Ž .CLUER, B. 1995a . Cyclic fluvial processes and bias in environ-mental monitoring, Colorado River in Grand Canyon. J.Geology 103 411]421.

Ž .CLUER, B. 1995b . Final report: aerial photography of theGCES-II test flows and interim flows. Technical report,National Park Service.

Ž .COUPER, M. and GROVES, R. 1989 . Interviewer expectationsregarding CAPI: results of laboratory tests II. Bureau ofLabor Statistics, Washington, DC.

Ž .DEMPSTER, A., LAIRD, N. and RUBIN, D. 1977 . Maximum likeli-hood estimation from incomplete data via the EM algorithm.J. Roy. Statist. Soc. Ser. B 39 1]38.

Ž .GORE, J. A. and PETTS, G. E. 1989 . Alternatives in RegulatedRiver Management. CRC Press, Boca Raton, FL.

Ž .GROVES, R. M. and MATHIOWETZ, N. A. 1984 . Computer assistedtelephone interviewing: effects on interviewers and respon-dents. Public Opinion Quarterly 356]369.

Ž .HARLOW, B. L. 1985 . A comparison of computer-assisted andhard copy telephone interviewing. American Journal of Epi-demiology 122 335]340.

Ž .HOETING, J. A., VARGA, K. and CLUER, B. 1997 . PredictingColorado River sandbar size using Glen Canyon Dam re-lease characteristics. Final report, National Park Service.

Ž .JAMES, S. P. 1991 . Measurement of basic immunologic charac-teristics of human mononuclear cells. In Current Protocols

Žin Immunology J. E. Coligan, A. M. Kruisbeek, D. H. Mar-.gulies, E. M. Shevech and W. Stroger, eds. Section 7.10.

Green Publishing and Wiley]Interscience, New York.Ž .KEARSLEY, L. H., SCHMIDT, J. C. and WARREN, K. D. 1994 .

Effects of Glen Canyon Dam on Colorado River sand de-posits used as campsites in Grand Canyon National Park,USA. Regulated Rivers: Research and Management 9137]149.

Page 23: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 23

Ž .LANGHORNE, J. and FISCHER-LINDAHL, K. 1981 . Limiting dilu-tion analysis of precursors of cytotoxic T lymphocytes. In

Ž .Immunological Methods I. Lefkovits and B. Pernis, eds. 2Section 12. Academic Press, New York.

Ž .LEE, A. M. 1968 . Applied Queueing Theory. MacMillan, Lon-don.

Ž .MENG, X.-L. and RUBIN, D. B. 1991 . Using EM to obtainasymptotic variance]covariance matrices: the SEM algo-rithm. J. Amer. Statist. Assoc. 86 899]909.

Ž . Ž .NEW RESEARCHERS COMMITTEE OF THE IMS CNR 1991 . Meet-ing the needs of new statistical researchers. Statist. Sci. 6163]174.

Ž .NICHOLLS, W. L., II. 1988 . Computer-assisted telephone inter-viewing: a general introduction. In Telephone Survey Meth-

Ž .odology R. M. Groves, ed. 377]387. Wiley, New York.

Ž .SARIS, W. E. 1991 . Computer-assisted interviewing. Sage Uni-versity Paper Series on Quantitative Applications in theSocial Sciences, 07-080. Sage: Newbury Park, CA.

Ž .TWEEDIE, R. L. 1986 . In and out of applied probability inŽAustralia. In The Craft of Probabilistic Modelling J. M.

.Gani, ed. 291]308. Springer, New York.Ž .TWEEDIE, R. L. 1992 . New researchers report: comments.

Statist. Sci. 7 263]264.Ž .UPTON, G. and FINGLETON, B. 1985 . Spatial Data Analysis by

Example 1. Point Pattern and Quantitative Data. Wiley,New York.

Ž . ŽWEGNER, D. 1996 . Personal communication U.S. Bureau of.Reclamation .

CommentMichael W. Trosset

Richard Tweedie and his collaborators are to becongratulated for providing an interesting and en-lightening forum on an exciting part of our profes-

Žsion}one that too many academic statisticians are.encouraged to neglect. I will organize my contribu-

tion to this forum by elaborating on two of Tweedie’sconclusions, with both of which I enthusiasticallyconcur.

1. ‘‘It is often the mere fact of such thinking, ratherthan the specific technical input, that provesinvaluable. It is hard to overestimate how power-fully our discipline trains us to think about com-plicated issues in ways that allow us to quicklydiagnose difficulties in esoteric disciplines towhich we have had only several minutes of intro-duction.’’

Statisticians who have not done much consultingmay take for granted what I regard as the mostimportant of the services that we offer. Of coursewe are sometimes asked to develop original meth-ods for novel situations and of course we are oftenasked to ensure that standard methods are usedappropriately in standard situations. Yet when Ibegan consulting, I was struck by how often I pro-vided a service without doing anything that anacademic researcher would recognize as statistics.

Ž .Time and again I was thanked and paid for askingquestions and suggesting perspectives that seemedto me to be little more than common sense. Thishighly developed common sense is an easily over-looked, but extraordinarily valuable commodity.

Michael W. Trosset is Visiting Associate Professor,Department of Mathematics, University of Arizona,Tucson, Arizona 85721.

Several weeks into the consulting seminar that Itaught in the fall of 1995, one frustrated studentobserved that none of our clients seemed to knowexactly what they wanted. This, I believe, is therule rather than the exception, and in such circum-stances the fundamental contribution of the statis-tician is to help the client formulate appropriatequestions. Statisticians know what kinds of ques-tions can be answered and they excel at abstractingthe essential features of an investigation without

Ž .becoming distracted by the often fascinating de-tails of the particular application. In my experience,despite their limited knowledge of the application,they often discover confounding factors and suggestalternative causal explanations that had not oc-

Ž .curred to the investigator s }not because statisti-cians are more clever than scientists, but becausestatisticians are trained to look for such things.

Perhaps it is not surprising that statisticianstake for granted the general character of statisticalreasoning and tend to emphasize the technical pro-cedures that they study and employ. Unfortunately,one consequence of this emphasis is the correspond-ing perception by clients that this is all that statis-ticians have to offer. Many}if not most}of myconsultations begin with the client asking technicalquestions about the procedures that he or she hasbeen using or contemplating. I invariably respondby asking the client to tell me a little about theapplication. Because the answer is usually too eso-teric for me to understand, I follow up by askingthe client to explain the project to me as though he

Žor she was explaining it to his or her parents. Aformer colleague with considerable consulting expe-

.rience substitutes ‘‘grandparents’’ for ‘‘parents.’’Not only have I found this to be a necessary preludeto more sophisticated discourse, it is really quite

Page 24: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.24

remarkable how much progress can be made at thisvery unsophisticated level.

The fact that so much of what statistics has tooffer resides in its way of thinking rather than inits technical input has important implications forhow statistics should be taught, especially in ser-vice courses. Perhaps inevitably, most statisticscourses are organized by procedure. Procedures areillustrated by sanitized examples that carefullyavoid the complications and ambiguities that com-promise their use in the real world. This practicemakes it easier for students to learn the proce-dures, but it is apt to mislead them into identifyingstatistics as a collection of mathematical recipes. Infact, it is to resolve the messy issues that arecarefully hidden in the typical service course thatthe judgment of a statistician is most needed.

I was a self-employed statistical consultant from1989 through 1992. When I returned to academia in1993, I found that my consulting experiences hadaffected my pedagogical priorities. For example, Iregularly teach an introductory statistics course forgraduate students from other departments who willrequire statistical guidance in their dissertationresearch. Because no one becomes statistically self-sufficient after one semester of study, I try to pre-pare students to become intelligent consumers ofthe assistance that they will inevitably seek. Ser-vice courses train future clients, not future statis-ticians.

2. ‘‘The statistician must enter into the context ofthe problem, not just as an ‘‘advisor,’’ but assomeone prepared to understand the data, ana-lyze the data, interact with those who really ownthe questions being asked and consider the im-pact of statistics within the real context of theproblem.’’

ŽI have always advised my students and anyone.else who inquired that, in selecting a statistician

with whom to work, one should seek an individualwho wants to learn about the application and avoidindividuals who merely want to be handed a dataset and to return an answer. In virtually every

Ž .application, there is a gap often a vast gulf be-tween the details of the application and standardstatistical theory. For various reasons, it seems tome that this gap is more easily bridged by thestatistician than by the client.

First, the mathematical theory of statistics islikely to seem far more esoteric to the client than isthe client’s discipline to the statistician. Second, thestatistician will usually have a good sense of whathe or she needs to understand about the client’sdiscipline and can ask pointed questions towardthat end, whereas the client will often not know

what statistical issues are relevant to the problem.Third, and most important, gaps between applica-tion and theory require that compromises be made.Nature does not compromise}if natural phenom-ena are to be studied, then it is incumbent on thestatistician either to devise relevant theory or tomake informed judgments about the propriety ofusing standard procedures in situations not ad-dressed by extant theory.

There are other reasons for statisticians to be-come aggressively involved in their consultations.Jennifer Hoeting emphasized that ‘‘statistical con-sultants should make every effort to obtain the rawdata, if available.’’ I definitely agree, but I submitthat they should also make every effort to observeŽ .some of the data collection. Not only can this be

Ženormously interesting and entertaining as thestatistical consultant to a U.S. Bureau of Reclama-tion study of the effect of fluctuating flows fromGlen Canyon Dam on riparian bird nesting, I spent

.18 days rafting the Colorado River! , but it is oftenessential for proper analysis and interpretation ofthe data.

For example, I was the statistical consultant toseveral longitudinal studies of the effects of Alz-heimer’s and Parkinson’s diseases on memory andlanguage. In one study, we administered a comput-erized serial reaction time task. The task comprised6 sequences of 8 blocks of 10 items. It was wellknown that, within each sequence, the block meanreaction time decreased as subjects acquired greaterproficiency in responding. Our data exhibited thispattern for the first five blocks and for the lastthree blocks, but there was a discontinuity betweenthem. Indeed, the mean reaction time for the sixthblock was dramatically greater than for the fifthblock. The scientists were baffled, so I did my owndetective work and asked one of the staff to admin-ister the test to me. It turned out that the computerprogram attempted to store a sequence of responsesin active memory, but we were using a computerwith insufficient memory to store the entire se-quence. In the midst of the sixth block, there was a‘‘hiccough’’ as the computer wrote the contents ofactive memory to disk. The hiccough did not lastlong}the interviewers had not noticed it}but itwas more than enough to break a subject’s rhythmand invalidate the experiment.

In conclusion, these articles evoked fond recollec-tions of my own consulting experiences and causedme to reflect upon their role in my professionaldevelopment. Had they been available, I would haveasked the students in my consulting seminar toread them. I hope that they will encourage otherstatisticians to broaden the scope of their profes-sional activities.

Page 25: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 25

CommentKaren Kafadar and Max D. Morris

These articles provide further evidence of the rolethat statistics can play in science, as has beennoted previously by eminent statisticians. We re-view some of these earlier references to consultingin the literature and emphasize that applicationsand the advancement of theory, both scientific andstatistical, go hand in hand, and thus that consult-ing should be actively encouraged in graduate pro-grams and valued by university faculties.

‘‘Statistics depends for its raison d’etre and con-ˆtinuing vitality on continued contact with substan-tive disciplines that actually generate data and

Žmake inferences in the face of uncertainty’’ Moore,.1990, p. 268 . This collection of papers provides

evidence for Moore’s assertion; in fact, some ofthem even confirm his later statement that ‘‘thedirection of statistical research is affected by realproblems, and the resulting new methods are usedby practitioners.’’ It is a pleasure to read a series ofarticles such as these, particularly when they leadto new approaches to analyzing data and new in-sights about the processes that generated them.The authors of these articles join other statisticianswho have previously outlined important principlesof statistical consulting and demonstrate the impor-tant role that statistics can play in advancing boththe science of the application and the research inour own field. Curiously, despite statistics’s depen-dence on such consultancies for its survival andcontinued growth, consulting problems as the basisfor motivating the methodology are not fashionablein some circles, perhaps because consulting andapplied statistics in general is sometimes regardedas a required function rather than as a vehicle foradvancing research. This series of articles remindsus that useful research comes from useful problemsand that, even in those instances where no newmethodology was developed, the statistician none-theless contributed insight into the problem.

Early exposure to consulting problems can fosteran appreciation for their role in useful research.

Karen Kafadar is Professor, Department of Mathe-matics, University of Colorado, Denver, Colorado80217-3364. Max D. Morris is Senior Research StaffMember with the Mathematical Sciences Section,Oak Ridge National Laboratory, Oak Ridge, Ten-nessee 37831-6367.

Many graduate programs in applied statistics en-courage participation in consulting. One exampleknown to us is the Stanford biostatistics workshop,a weekly two-hour seminar attended by both medi-cal scientists and statisticians. Generally, the flooris given in the first hour to the client, who describesthe problem and then relinquishes the floor in thesecond hour to the consulting statistician. The in-teraction is very stimulating, and often the prob-lems have led to advancements in statistical prac-tice, such as those later published by Mosteller and

Ž . Ž .Parunak 1985 , Efron and Feldman 1991 andŽ .Bacchetti 1990 . Besides generating enthusiasm,

respect for consulting and opportunities for appliedresearch, these types of interactions can provideimportant motivation and guidance for basic re-

Žsearch in statistics but, as Tweedie mentions, thedemanding pace of jobs}be they in government,industry, or academe}sometimes prevents one

.from following through .The case studies here add to the many illustra-

tions of the valuable and unique role that statis-tical thinking can provide and how much a statis-tician can contribute despite relatively cursoryknowledge of the problem. As Tweedie says, ‘‘It ishard to overestimate how powerfully our disciplinetrains us to think about complicated issues in waysthat allow us to quickly diagnose difficulties inesoteric disciplines to which we have had only sev-eral minutes of introduction.’’ Examples of thestatistician’s potential contributions to the problem

Ž .include the many essays in Tanur et al. 1989 , aswell as the missed opportunity in connection withthe Challenger Space Shuttle described by Hoadley

Ž .and Kettenring 1990 . Many of our collective earlyconsulting experiences had little to do with statis-tical analysis per se, but involved broader consid-erations which nonetheless required statisticalreasoning and guidance. We, like other young in-vestigators, learned at this stage that a statisticalconsultation is not the conversion of a ‘‘real-worldproblem’’ into a statistical exercise, but a true col-laborative experience in which the statisticianbrings one of the critical components. Technicalexpertise and creative ability are, of course, ex-tremely important here, but of equal importance isthe ability and willingness to see major problemsfrom a broader perspective. Our early consulting

Ž .experiences at Hewlett Packard HP and the Uni-versity of Texas Health Sciences Center]San Anto-

Ž .nio UTHSCSA , respectively, were in environ-ments where the number of statisticians was smallrelative to other research staff, but where opportu-nities to contribute to the planning and execution ofimportant research programs were many and var-ied. One of the research and development labora-

Page 26: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.26

tory directors at HP once spoke to the company’s 50or so statisticians and started out by joking thathe’d never known a field that based its entire exis-tence on admitting what they don’t know}butthen expressed his admiration by saying, ‘‘I don’tthink there has ever been a group of professionalsat HP who has had such a major impact on thecompany in such a short period of time.’’

At the other extreme, some of these articles re-mind us how much more a statistician can con-tribute with deeper knowledge of the problem, par-ticularly the article by Broman, Speed and Tigges.David Byar once told his Biometry Branch that, forevery statistical methodology article, one needs toread about 10 substantive ones related to the sub-

Ž .ject matter in the field. Wangen 1990 warns,‘‘When we become remote from those who need ourhelp, we cannot maximize the contributions of ourprofessionals.’’ Gnanadesikan is quoted as saying,‘‘Most statisticians do not seem to become involveddeeply enough in subject matter areas to under-stand the scientific problems in their contexts’’Ž .Hoadley and Kettenring, 1990, p. 245 . Tweedieand Hall indirectly touch on one of the centralreasons why sufficient background knowledge isimportant: ‘‘As in all statistics applications, themodelling should come first . . . .’’ Of course, a trulyappropriate model and set of assumptions can bedetermined only after at least some, and usuallyconsiderable, understanding of the reality beingmodeled. The eventual value of any consultancydepends upon the statistician’s ability to identify amodel appropriate to the situation and a corre-spondingly appropriate analysis, neither of which ispossible without substantial understanding. A valu-able step in acquiring that understanding is a visitto the site where the data were or will be collected.Most of us have stories similar to Tweedie and Halland to Broman, Speed and Tigges about visiting alaboratory or office and discovering a key aspect ofthe problem that either answered the client’s ques-tion of interest or else revealed particular aspectsthat needed to be taken into account in any statisti-cal recommendation or advice.

The open discussion at the WNAR meeting inPullman highlighted further essential ingredientsfor successful consulting. Taylor demonstrates thevalue of nailing down the objectives of the studyright from the start so that the study can be de-signed to achieve them as efficiently as possible.Some experienced investigators arrive at the statis-tician’s office with carefully thought out, detailedquestions and study goals in mind}but manydo not, like the tax office clients of Tweedie andHall. William G. Hunter taught generations of stat-istical consultants at the University of Wisconsin]

Madison the importance of asking the right ques-tions: ‘‘At the outset the most important questionfor the statistician to ask is: What is the objective ofthis investigation?’’ He went on to describe a ses-sion with two investigators who spent 45 minutesdiscussing this question, and, ‘‘when it ended, theyagreed on what it was they were about. They there-upon said that I had been most helpful, and we said

Ž .goodbye’’ Hunter, 1981, p. 73 .Similarly, collecting the right data is important,

as Taylor describes in her article, to answer themain questions of interest. As a related point aboutdata, David Byar once said, ‘‘Better an imprecisemeasure of something important than a precisemeasure of something unimportant.’’ Often, a stat-istician’s primary contribution is to bring clients torecognize that their elaborate measurements wereonly tangentially related to the question of interest,and they might be better off investing a little timein collecting the relevant data. Hoeting acknowl-edges the possibility of this problem in her study:‘‘The mean daily discharge may not be a goodmeasure of the water release pattern, because twovery different water release patterns could have thesame mean daily discharge.’’ The first sentence inTukey’s Exploratory Data Analysis is ‘‘It is impor-tant to understand what you CAN DO before youlearn to measure how WELL you seem to have

Ž .DONE it’’ Tukey, 1977, p. 6 . The reality of whatyou really CAN DO within the context of availabledata can occasionally frustrate the investigator, es-pecially when the data have been expensive ordifficult to collect. Hoeting’s clients had generated‘‘the best data set ever obtained,’’ but it could not

Žprovide a complete answer. Even here or perhaps.especially , the statistician can play a critical role,

by explaining why this is so, participating in plan-ning activities and helping the investigator avoidthe strong temptation to claim more than can beobjectively justified based on the limited data inhand. Even when the data are more complete withrespect to the questions being asked, this last pointis often operative. One principal investigator of alarge study at UTHSCSA insisted that the consult-ing statistician had to function as the ‘‘scientist’sconscience’’ when it was time to report the resultsof a study, because good scientists must continuallythink about potential interpretations of their datawhich are well beyond what can be honestly called‘‘current results.’’

One of Hoeting’s ‘‘basic rules for statistical con-w xsultants,’’ namely, ‘‘ A lways check the data for er-

rors at the start of the project,’’ is well known. JimFilliben at the National Institute of Standards andTechnology analyzed data from the Department ofTransportation’s Daylight Savings Time Study of

Page 27: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 27

the number of traffic and pedestrian accidents dur-ing Nixon’s extended daylight savings time edictaimed at saving energy in 1973. The congression-ally mandated study resulted in data with so manyerrors that conclusive evidence concerning changesin the number of accidents could not be confirmed.

Ž .Again, quoting Tukey 1977, p. 10 : ‘‘One thing weregretfully learn about work with numbers is theneed for checking. Late-caught errors make forpainful repetition of steps we thought finished.Checking is inevitable; yet, if too extensive, wespend all our time getting the errors out of thechecks. Our need is for enough checks but not toomany.’’

Other consultants have also noted before Hoetingthat ‘‘we must guard ourselves against standing onthe ‘statistician’s pedestal’ from which we lecturescientists on the limitations of their studies. . . . Weshould recognize that, just as statisticians makecompromises while doing analyses, investigators areunder considerable constraints when designingtheir studies.’’ Many years ago, Lincoln Moses toldthe students in his experimental design course,‘‘Now, it is always nice to have a balanced design.But if your experiment isn’t balanced, don’t throwup your hands and go home! There are ways toanalyze it. And I’m going to show you how.’’ As a

Ž .client of statisticians, Wangen 1990, p. 273 sug-w xgests, ‘‘ Statisticians should do what they can to

help, regardless of personal opinions about whatcould have been done, and refrain from comment-ing negatively on aspects of the work performedbefore their involvement unless invited to do so.Good statisticians can nearly always provide usefulassistance at any stage of a project.’’ CompareTweedie’s comment: ‘‘Statistics can contributesomething that was not there previously, and wehave much to offer to almost everyone.’’

Another important principle for successful con-sulting is the statistician’s use of the client’s lan-guage, as do Broman, Speed and Tigges, ratherthan the other way around: ‘‘To be successful, wemust learn to serve. That, of course, requires statis-ticians to get to know the language and problems ofw x Ž . Žthe clients ’’ Hunter, 1990, p. 261 . As a graduate

Ž .student, during a spring picnic, I Kafadar onceasked John Tukey a philosophical question aboutstatistical practice. The statistical concepts in hiscourses used to challenge both students and profes-sors alike, so the simplicity of the reply made a realimpression on me: he once had a client who was amedical doctor at Sloan Kettering, and he told me,‘‘It was nearly a year after our first meeting that Ieven came so close as to mentioning a t-test to

.him.’’ But probably the most important lesson forstatisticians is to avoid the proverbial ‘‘error of the

third kind,’’ that is, providing the right answer tothe wrong question: ‘‘Far better an approximateanswer to the right question, which is often vague,than an exact answer to the wrong question, which

Žcan always be made precise’’ Tukey, 1962, pp..13]14 .

The primary benefit to statisticians of seriouscreative participation in consulting, in addition tothe sense of satisfaction by contributing a solutionto a real problem, is the potential for advancingstatistical research. This potential has been recog-nized by many, including past ASA PresidentJerome Cornfield: ‘‘Application requires under-standing, and the search for understanding oftenleads to, and cannot be distinguished from, re-search. The true joy is to see the breadth of applica-tion and the breadth of understanding grow to-gether, with the unplanned fallout}the pure gravy,so to speak}being the new research finding’’Ž . Ž .Cornfield, 1975, p. 11 . Box 1984 lists severalimportant interactions between practice and theorywhere practical problems stimulated theoretical de-velopment of whole new areas: small samples ª

Ž .Student’s t Gosset ; rainfall data at Rothamsteadª distributed lag models and orthogonal polynomi-

Ž .als Fisher ; agriculture experiments ª factorialŽ .designs and confounding Yates ; balancing several

Ž .factors at once ª Youden squares Youden ; largedata sets on telephone usage ª exploratory data

Ž .analysis Tukey . Of course, major new statisticalideas and methodologies such as these are often notborn of one or a few consulting problems, but gener-ally develop gradually in response to experiencegained by repeatedly thinking about applications ina statistical context. For example, computerizeddata collection systems such as the one Taylor de-scribes have revolutionized the kind and quantityof data available in many applications areas, andconsulting on such problems can stimulate researchin the critically important area of analysis of largeand complex data structures. The connection be-tween important improvements in methodology andparticipation in applications is clear and is docu-mented by, for example, the IMS Panel on Cross-Disciplinary Research in the Statistical Sciences,which identified many areas where ‘‘statisticianshave achieved signal advances in theory and meth-ods as they worked on applications in many fields,and, in turn, statistical thinking and methodologyhave greatly influenced the development of virtu-

Ž .ally all areas of science’’ IMS Panel, 1990, p. 121 ,including agriculture, health, military operations

Žand transportation and communication systems pp.. Ž137]138 . They identify ‘‘Type A’’ rather routine

. Žconsultancies and ‘‘Type B’’ full-fledged collobora-.tions interactions, with the hope that Type A ones

Page 28: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

TWEEDIE ET AL.28

evolve into Type B ones, with the help of neededresources and increased supporting infrastructureŽsuch as the development of the National Institutefor Statistical Sciences that followed the publica-

.tion of this Panel Report .Despite this opportunity for advancing research

through consulting, many academic programs donot actively encourage it. Tribus noted years ago,with probably little change since then, ‘‘studentsbelieve that professional statisticians are presentedwith well-formulated problems, which appear overthe transom and for which they are to provideclever solutions that are exchanged for tokens ofappreciation that have great value in the market-place. They have been brought up on a diet of ‘giventhis, find that’}usually with the understandingthat the method to be used is the one taught in thelast class. Unfortunately, the world does not ‘give’

Žproblems}you have to go and get them’’ Tribus,. Ž .1990, p. 271 . Hogg 1991 also notes that ‘‘We do

not encourage enough teamwork, with studentsŽ .working on projects’’ p. 342 ; ‘‘How often do our

Ph.D. students understand the importance of theseideas and develop their communication skills so asto be effective in the classroom or in consulting?’’Ž . Ž .p. 343 . In an earlier article, Tweedie 1992 like-wise lamented that the report of the New Re-

Ž .searchers Committee of the IMS 1991 did little todirect new researchers into important areas, advis-ing instead that ‘‘ ‘unless you need the data analy-sis experience, your role is to dispense advice’ in a

Ž .consulting context’’ Tweedie, 1992, p. 264 . Even asrecently as August 1996, several statisticians at ameeting in Halifax admitted that, while collabora-tion and joint authorship can be enormously benefi-cial to science, they as faculty members would dis-courage untenured faculty from anything other thanindependent, sole-authored papers in prestigiousstatistics journals. Fortunately, the Editors of thisjournal have chosen to encourage statistical con-sulting by publishing this set of articles that serveto illustrate the important role that statistics canplay both in advancing the science in the field ofapplication, as well as providing background andmotivation for future statistical research.

This discouragement of collaboration might bereduced if journals were to publish more articlesdemonstrating creative methodology motivated bychallenging consulting problems. A few journalsalready emphasize applications in their editorial

Žpolicies: for example, Technometrics ‘‘adequatejustification of the application of the technique,preferably by means of an actual application to a

. Žproblem’’ , Statistics in Medicine ‘‘The journal willpublish papers on practical applications of statisticsand other quantitative methods to medicine and its

. Žapplied science’’ , Biometrics ‘‘describing and ex-emplifying developments in these methods and theirapplications in a form readily assimilable by experi-

.mental scientists’’ , JASA Applications and CaseŽStudies ‘‘For real data sets, present analyses that

are statistically innovative as well as scientificallyand practically relevant. . . . Using empirical tests,examine or illustrate for an important application

.the utility of a valuable statistical technique’’ andŽJASA Theory and Methods ‘‘The research reported

should be motivated by a scientific or practicalproblem and, ideally, illustrated by application ofthe proposed methodology to that problem. Illustra-tion of techniques with real data is especially wel-

.comed and strongly encouraged’’ . While a routinestatistical analysis does not fall into these cate-gories, a creative analysis that offers a novel ap-proach to the problem often would. The recognitionof the value of consulting as a vehicle for advancingstatistical research should be widened through thepublication of such articles. The Editors of thisjournal have taken some further steps toward thisend.

ADDITIONAL REFERENCES

Ž .BACCHETTI, P. 1990 . Estimating the incubation period of AIDSby comparing population infection and diagnosis patterns.J. Amer. Statist. Assoc. 85 1002]1008.

Ž .BOX, G. E. P. 1984 . The importance of practice in the develop-ment of statistics. Technometrics 26 1]8.

Ž .CORNFIELD, J. 1975 . A Statistician’s Apology. J. Amer. Statist.Assoc. 70 7]14.

Ž .EFRON, B. and FELDMAN, D. 1991 . Compliance as an explana-Ž .tory variable in clinical trials with discussion . J. Amer.

Statist. Assoc. 86 9]26.Ž .HAHN, G. J. 1990 . Commentary on ‘‘Communications between

statisticians and engineersrphysical scientists.’’ Technomet-rics 32 257]258.

Ž .HOADLEY, A. B. and KETTENRING, J. R. 1990 . Communicationsbetween statisticians and engineersrphysical scientistsŽ .with discussion . Technometrics 32 243]270.

Ž .HOGG, R. V. 1991 . Statistical education: improvements arebadly needed. Amer. Statist. 45 342]343.

Ž .HUNTER, J. S. 1990 . Commentary on ‘‘Communications be-tween statisticians and engineersrphysical scientists.’’ Tech-nometrics 32 261.

Ž .HUNTER, W. G. 1981 . The practice of statistics: the real world isan idea whose time has come. Amer. Statist. 35 72]75.

IMS PANEL ON CROSS-DISCIPLINARY RESEARCH IN THE STATISTICAL

Ž .SCIENCES 1990 , Cross-disciplinary research in the statisti-cal sciences, Statist. Sci. 5 121]146.

Ž .MOORE, D. S. 1990 . Commentary on ‘‘Communications betweenstatisticians and engineersrphysical scientists.’’ Technomet-rics 32 265]266.

Ž .MOSTELLER, F. and PARUNAK, A. 1985 . Identifying extreme cellsin a sizable contingency table: probabilistic and exploratoryapproaches. In Exploring Data Tables, Trends, and Shapes189]224. Wiley, New York.

Ž .NEW RESEARCHERS COMMITTEE OF THE IMS 1991 . Meeting theneeds of new statistical researchers, Statist. Sci. 6 163]174.

TANUR, J., MOSTELLER, F., KRUSKAL, W. H., LEHMANN, E. L.,

Page 29: Statistical Science 13, No 1, 1 29 Consulting: Real ...kbroman/publications/statsci.pdfstatistical consulting. Biometrics 33 564]565. ZAHN, D.A. and ISENBERG J. 1980.. Non-statistical

CONSULTING 29

Ž .LINK, R. F., PIETERS, R. S. and RISING, G. R. 1989 . Statis-tics: A Guide to the Unknown, 3rd ed. Wadsworth, Belmont,CA.

Ž .TRIBUS, M. 1990 . Comment on ‘‘Communications betweenstatisticians and engineersrphysical scientists.’’ Technomet-rics 32 271]272.

Ž .TWEEDIE, R. L. 1992 . Comment on IMS New ResearchersReport. Statist. Sci. 6 263]264.

Ž .TUKEY, J. W. 1962 . The future of data analysis. Ann. Math.

wStatist. 33 1]67. Reprinted in The Collected Works of JohnW. Tukey 4. Philosophy and Principles of Data Analysis,

Ž .1949]1964 L. V. Jones, ed. 391]484. Wadsworth, Belmont,xCA.

Ž .TUKEY, J. W. 1977 . Exploratory Data Analysis. Addison-Wes-ley, Reading, MA.

Ž .WANGEN, L. E. 1990 . Comment on ‘‘Communications betweenstatisticians and engineersrphysical scientists.’’ Technomet-rics 32 273]274.