


Original Article

Teaching ‘Instant Experience’ with Graphical Model Validation Techniques

Claus Thorn Ekstrøm
Department of Biostatistics, University of Southern Denmark, Odense, Denmark
e-mail: [email protected]

Abstract


Graphical model validation techniques for linear normal models are often used to check the assumptions underlying a statistical model. We describe an approach to provide ‘instant experience’ in looking at a graphical model validation plot, so it becomes easier to validate if any of the underlying assumptions are violated.

Keywords:

Teaching; Wally plot; Model validation; Residual plot; qq-plot; Sampling from the null hypothesis.

INTRODUCTION

Statistical model validation is a vital step in the model building process since it allows the investigator to have confidence in the inference obtained. Thus, model validation should be an essential part of a statistical course curriculum, not just because we only want students to report results from a model they believe in but also because the model validation step forces the students to be aware that the statistical model is in fact a model.

Diagnostic plots like standardized residual, component-plus-residual and quantile–quantile plots offer information on various aspects of the model fit and provide a way to check for correct mean specification, heteroscedasticity, normality and influential observations for linear normal models. When teaching model validation, however, many students find it daunting to base the model validation on graphical techniques because the conclusions about the model fit require an element of subjectivity that is ‘hidden’ in more inference-based approaches like the Shapiro–Wilk, Lilliefors (Kolmogorov–Smirnov), Cramér–von Mises and Anderson–Darling goodness-of-fit tests. Just like their graphical model validation counterparts, these inference-based approaches examine various properties of the residuals of a model fit and, from a student point of view, have the nice property that they give a perceived definitive answer of whether the model fits: a p-value less than 5% means the model validation failed.

Even when the point is pressed home that the classical 5% significance level should not be considered set in stone, and it is discussed how the various inference-based model validation tests indirectly result in a multiple testing problem (e.g., should all four of the tests mentioned above pass the normality test? Just one?), the students still prefer those to the graphical model validation techniques despite the fact that some model fit discrepancies are more easily detected by, say, a residual plot. When questioned, the students' overwhelming reason for preferring the inference-based model validation tests is that they feel they lack the experience to look at the graphical plots and determine when something is wrong.

Here, we present a suggestion that may help ease the interpretation of any graphical model validation plot by giving the students ‘instant experience’ in looking at variations among graphical model validation plots when the null hypothesis is indeed true. We simulate a number of validation plots from a setup where the model fits. The original plot from the model and the simulated plots are randomized, and if none of the plots stand out, or if the original validation plot is not the worst fit, then we conclude that the model fit is fine.

Graphical model validation plots

The use of graphical model diagnostics is hardly new, and many introductory textbooks present various techniques. For linear normal and generalized linear models, graphical model diagnostics are typically based on the model fit residuals, and Atkinson 1987, Cook 1998 and Fox 1991 discuss various graphical techniques ranging from histograms of residuals over quantile–quantile plots to residual and component-plus-residual plots; we refer to these texts for further information about the advantages and disadvantages of these methods. Lin et al. 2002 base their model validation on a combined numerical and graphical approach by basing their goodness-of-fit assessment on the cumulative residuals.

Graphical methods address several aspects of the relationship between the model and the data. In the following, we will primarily focus on the standardized residual plot, but the idea is essentially applicable to any graphical validation technique.

Simulating instant experience

If we wish to evaluate the model validation plot, then we need to determine if it is substantially different from what might be expected if the model indeed fitted the data. One way to provide ‘instant experience’ is by creating a Wally plot (Handford, 2007): a sequence of model validation plots where one of the plots is the actual plot obtained from the model fit, and the rest are simulated plots from a situation where the model fits. The order of the plots is randomized, so the position of the original plot is unknown to the student at this point.

If the student finds that one of the plots appears to stand out compared with the others, and it subsequently turns out that that plot is in fact the actual validation plot from the model fit, then that suggests that the fit is unlike what would be expected and should give rise to alarm. If none of the plots stand out, or if the student picks the wrong plot, then clearly the model fit is no worse than would be expected.

We can assign a valid p-value to the result if the right validation plot is chosen. If we simulate n plots from the null hypothesis where the model fits the data and assume that the original model validation plot is also drawn from the same null hypothesis, then a p-value of 1/(n + 1) can be assigned to the result; with the eight simulated plots of a 3×3 grid, for example, this gives p = 1/9 ≈ 0.11. However, a large number of simulated plots may be necessary to be able to achieve a sufficiently small p-value for this calculation to be worthwhile.

In practice, we suggest generating a 3×3 grid of nine validation plots in one figure: the original residual plot is placed in a random position on the grid, whereas eight simulated plots from the null hypothesis are placed in the remaining locations. For example, if we consider a standardized residual plot, then we simulate from the null hypothesis of independent and normally distributed residuals and create corresponding residual plots in order to show how much the plots might vary. The residual plots should be created using the same predicted values as for the original data, since that ensures that the density of the observations in the predicted range becomes the same for all plots. When the same predicted values are used for all the plots, we also make sure that the values obtained are realistic, so that is not the cause of perceived differences among the plots.
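As an illustration, the construction described above can be sketched in a few lines of base R (a hand-rolled version of the idea, using the built-in trees data as an example model; the MESS package discussed below automates this and adds the reveal step):

```r
# Hand-rolled 3x3 Wally plot for a fitted linear model.
# One panel shows the actual standardized residuals; the other eight
# show residuals simulated under the null of i.i.d. normal errors.
model <- lm(Volume ~ Girth + Height, data = trees)
pred  <- predict(model)                 # same predicted values in every panel
slot  <- sample(9, 1)                   # position of the real plot, kept hidden
par(mfrow = c(3, 3))
for (i in 1:9) {
  y <- if (i == slot) rstandard(model) else rnorm(length(pred))
  plot(pred, y, xlab = "Fitted values", ylab = "Std. residuals",
       ylim = c(-4, 4))                 # identical axes ease comparison
  abline(h = 0, lty = 2)
}
```

Note that only the real panel uses the model's residuals; the simulated panels draw fresh standard normal residuals at the same predicted values, exactly as the text prescribes.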

An example of a Wally plot is shown in Figure 1. The residual plot corresponding to the true model validation plot is, unbeknownst to the student, shown in the lower left corner. Note that all y axes and x axes have the same ranges, respectively, for ease of comparison among plots.

By showing the original residual plot together with eight corresponding residual plots from the null hypothesis, we see that the variation in size of the residuals is more or less the same in all nine plots, so variance heterogeneity is unlikely to be an issue with the model fit. None of the nine subplots suggests an excessive amount of negative or positive outliers, but the plot in the lower left corner might give rise to concern about the mean trend: it appears as if the residuals in this plot arise from a model that underestimates the mean for small and large predicted values. Once it is revealed that the lower left subplot is indeed the residual plot from the original data, we immediately have an indication that the model fit is unlike what we would expect.

DISCUSSION

The idea of using simulated datasets from the null hypothesis to provide extra information about a model fit is hardly new. In Atkinson 1987, a similar approach is used to provide point-wise confidence intervals for quantile–quantile plots, and the same idea is used in Lin et al. 2002, where simulated cumulative residual processes are used to create prediction intervals under the null hypothesis. We use a slightly different approach where it is easier to observe the variation among plots when the model fits the data. We hope that our suggestion will help students regard graphical and numerical model validation techniques as complementary methods that address different aspects of the model fit, and that one method is not preferable to another because it apparently is easier to reach a conclusion.

The real problem, of course, is the way some students focus solely on hypothesis testing and how the ubiquitous 5% significance level may appear to be an easy way to reach a quick and definite conclusion about the model validation. A Wally plot may initiate a discussion about these issues, not just by providing a way to evaluate the evidence against the model assumptions, but because it naturally links to a discussion of how we actually do hypothesis testing: we reject a null hypothesis if the statistic/plot based on the observed data is far from what we would expect if the null hypothesis is true. As mentioned above, we can also use the Wally plot randomization as a way to discuss the definition and calculation of p-values.

Fig. 1. Wally plot using standardized residual plots. One of the nine residual plots is from the true model fit; the remaining eight are from simulated data where the residuals are truly normal. Each subplot has a dashed line to show the average trend in residuals and a coloured area that indicates a local estimate of twice the standard deviation.

Software and source code

The Wally plot has been implemented in the R package MESS, which is available from cran.r-project.org. The augmented residual plots are also available in the MESS package but are not necessary for creating a Wally plot.

An annotated example (which will produce Figure 1 from this manuscript) is shown below. The example first produces a single residual plot from a multiple linear regression model fit where the volume of cherry trees is modelled as a function of their diameter and height.


Then it produces a Wally plot of the same model fit corresponding to Figure 1. The actual residual plot from the model fit is placed in a random position on the 3×3 grid and is first revealed after the user presses a key.
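A sketch of such code, using R's built-in trees data together with the residualplot and wallyplot functions from MESS (a reconstruction of the example described above; the exact published code may differ slightly):

```r
library(MESS)   # provides residualplot() and wallyplot()

# Cherry tree data: Volume modelled by diameter (Girth) and Height
model <- lm(Volume ~ Girth + Height, data = trees)

# A single augmented standardized residual plot of the model fit
residualplot(model)

# The Wally plot: the real residual plot is hidden among eight plots
# simulated under the null; its position is revealed after a key press
wallyplot(model)
```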

If we want to plot something other than a residual plot to use for model validation, we can specify the FUN argument to wallyplot. FUN can be any function that accepts an x and a y argument and creates the desired plot. If, for example, we would like a qq-plot instead, we just create the desired plotting function.
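For instance, a qq-plot version might look like the following sketch (the function name qqwally is a placeholder of our choosing, and the call assumes the cherry tree model fit described above):

```r
library(MESS)
model <- lm(Volume ~ Girth + Height, data = trees)

# Any function taking x and y can be passed as FUN; here the predicted
# values x are ignored and a normal qq-plot of the residuals y is drawn
qqwally <- function(x, y, ...) {
  qqnorm(y, main = "", ...)
  abline(a = 0, b = 1)   # reference line for standardized residuals
}
wallyplot(predict(model), rstandard(model), FUN = qqwally)
```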

The idea can be extended even further. A component-plus-residual plot overlaid with a lowess line can be created as shown below, where we also set the simulateFunction argument to ensure that the value of the component is added to the simulated residuals.
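A sketch of what that might look like (the helper names cprplot and simcpr are placeholders, and the assumption that simulateFunction is called with the number of observations, as its rnorm-style default suggests, should be checked against the MESS documentation):

```r
library(MESS)
model <- lm(Volume ~ Girth + Height, data = trees)

# Component-plus-residual values for Girth: the Girth coefficient times
# the covariate, plus the model residuals
comp <- coef(model)["Girth"] * trees$Girth
cpr  <- comp + residuals(model)

# Plotting function with a lowess overlay
cprplot <- function(x, y, ...) {
  plot(x, y, xlab = "Girth", ylab = "Component + residual", ...)
  lines(lowess(x, y), lty = 2)
}

# Simulated values must have the component added to the null residuals
simcpr <- function(n) comp + rnorm(n)

wallyplot(trees$Girth, cpr, FUN = cprplot, simulateFunction = simcpr)
```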


The same code can then be run with Height in place of Girth to produce the other component-plus-residual plot.

REFERENCES

Atkinson, A.C. (1987). Plots, Transformations, and Regression. An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford University Press, New York.

Cook, R.D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, New York.

Fox, J. (1991). Regression Diagnostics: An Introduction. SAGE Publications, Newbury Park.

Handford, M. (2007). Where's Wally? Walker Books Ltd, London.

Lin, D., Wei, L. and Ying, Z. (2002). Model-checking techniques based on cumulative residuals. Biometrics, 58, 1–12.

© 2013 The Authors. Teaching Statistics © 2013 Teaching Statistics Trust, 36, 1, pp 23–26