
Appendix A

Basic Statistical Concepts for Sensory Evaluation

Contents

A.1 Introduction
A.2 Basic Statistical Concepts
    A.2.1 Data Description
    A.2.2 Population Statistics
A.3 Hypothesis Testing and Statistical Inference
    A.3.1 The Confidence Interval
    A.3.2 Hypothesis Testing
    A.3.3 A Worked Example
    A.3.4 A Few More Important Concepts
    A.3.5 Decision Errors
A.4 Variations of the t-Test
    A.4.1 The Sensitivity of the Dependent t-Test for Sensory Data
A.5 Summary: Statistical Hypothesis Testing
A.6 Postscript: What p-Values Signify and What They Do Not
A.7 Statistical Glossary
References

It is important when taking a sample or designing an experiment to remember that no matter how powerful the statistics used, the inferences made from a sample are only as good as the data in that sample. ... No amount of sophisticated statistical analysis will make good data out of bad data. There are many scientists who try to disguise badly constructed experiments by blinding their readers with a complex statistical analysis.

—O’Mahony (1986, pp. 6, 8)

This chapter provides a quick introduction to statistics used for sensory evaluation data, including measures of central tendency and dispersion. The logic of statistical hypothesis testing is introduced. Simple tests on pairs of means (the t-tests) are described with worked examples. The meaning of a p-value is reviewed.

A.1 Introduction

The main body of this book has been concerned with using good sensory test methods that can generate quality data in well-designed and well-executed studies. Now we turn to summarize the applications of statistics to sensory data analysis. Although statistics are a necessary part of sensory research, the sensory scientist would do well to keep in mind O'Mahony's admonishment: statistical analysis, no matter how clever, cannot be used to save a poor experiment. The techniques of statistical analysis do, however, serve several useful purposes, mainly in the efficient summarization of data and in allowing the sensory scientist to make reasonable conclusions from the information gained in an experiment. One of the most important of these is to help rule out the effects of chance variation in producing our results. "Most people, including scientists, are more likely to be convinced by phenomena that cannot readily be explained by a chance hypothesis" (Carver, 1978, p. 387).

Statistics function in three important ways in the analysis and interpretation of sensory data. The first is the simple description of results. Data must be summarized in terms of an estimate of the most likely values to represent the raw numbers. For example, we can describe the data in terms of averages and standard deviations (a measure of the spread in the data). This is the descriptive function of statistics. The second goal is to provide evidence that our experimental treatment, such as an ingredient or processing variable, actually had an effect on the sensory properties of the product, and that any differences we observe between treatments were not simply due to chance variation. This is the inferential function of statistics and provides a kind of confidence or support for our conclusions about products and variables we are testing. The third goal is to estimate the degree of association between our experimental variables (called independent variables) and the attributes measured as our data (called dependent variables). This is the measurement function of statistics and can be a valuable addition to the normal sensory testing process that is sometimes overlooked. Statistics such as the correlation coefficient and chi-square can be used to estimate the strength of relationship between our variables, the size of experimental effects, and the equations or models we generate from the data.

These statistical appendices are prepared as a general guide to statistics as they are applied in sensory evaluation. Statistics form an important part of the equipment of the sensory scientist. Since most evaluation procedures are conducted along the lines of scientific inquiry, there is error in measurement and a need to separate those outcomes that may have arisen from chance variation from those results that are due to experimental variables (ingredients, processes, packaging, shelf life). In addition, since the sensory scientist uses human beings as measuring instruments, there is increased variability compared to other analytical procedures such as physical or chemical measurements done with instruments. This makes the conduct of sensory testing especially challenging and makes the use of statistical methods a necessity.

The statistical sections are divided into separate topics so that readers who are familiar with some areas of statistical analysis can skip to sections of special interest. Students who desire further explanation or additional worked examples may wish to refer to O'Mahony (1986), Sensory Evaluation of Foods, Statistical Methods and Procedures. The books by Gacula et al. (2009), Statistical Methods in Food and Consumer Research, and Piggott (1986), Statistical Procedures in Food Research, contain information on more complex designs and advanced topics. This appendix is not meant to supplant courses in statistics, which are recommended for every sensory professional.

It is very prudent for sensory scientists to maintain an open dialogue with statistical consultants or other statistical experts who can provide advice and support for sensory research. This advice should be sought early on and continuously throughout the experimental process, analysis, and interpretation of results. R. A. Fisher is reported to have said, "To call the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to tell you what the experiment died of" (Fisher, Indian Statistical Congress, 1938). To be fully effective, the sensory professional should use statistical consultants early in the experimental design phase and not as magicians to rescue an experiment gone wrong. Keep in mind that the "best" experimental design for a problem may not be workable from a practical point of view. Human testing necessarily involves fatigue, adaptation and loss of concentration, difficulties in maintaining attention, and loss of motivation at some point. The negotiation between the sensory scientist and the statistician can yield the best practical result.

A.2 Basic Statistical Concepts

Why are statistics so important in sensory evaluation? The primary reason is that there is variation or error in measurement. In sensory evaluation, different participants in a sensory test simply give different data. We need to find the consistent patterns that are not due to chance variation. It is against this background of uncontrolled variation that we wish to tell whether the experimental variable of interest had a reliable effect on the perceptions of our panelists. Unfortunately, the variance in our measurements introduces an element of risk in making decisions. Statistics are never completely foolproof or airtight. Decisions even under the best conditions of experimentation always run the risk of being wrong. However, statistical methods help us to minimize, control, and estimate that risk.

The methods of statistics give us rules to estimate and minimize the risk in decisions when we generalize from a sample (an experiment or test) to the greater population of interest. They are based on consideration of three factors: the actual measured values, the error or variation around the values, and the number of observations that are made (sometimes referred to as "sample size," not to be confused with the size of a food sample that is served). The interplay of these three factors forms the basis for statistical calculations in all of the major statistical tests used with sensory data, including t-tests on means, analysis of variance and F-ratios, and comparisons of proportions or frequency counts. In the case of a t-test on means, the factors are (1) the actual difference between the means, (2) the standard deviation or error inherent in the experimental measurement, and (3) the sample size or number of observations we made.

How can we characterize variability in our data? Variation in the data produces a distribution of values across the available measurement points. These distributions can be represented graphically as histograms. A histogram is a type of graph, a picture of frequency counts of how many times each measurement point is represented in our data set. We often graph these data in a bar graph, the most common kind of histogram. Examples of distributions include sensory thresholds among a population, different ratings by subjects on a sensory panel (as in Fig. A.1), or judgments of product liking on a 9-point scale across a sample of consumers. In doing our experiment, we assume that our measurements are more or less representative of the entire population of people or those who might try our product. The experimental measurements are referred to as a sample and the underlying or parent group as a population. The distribution of our data bears some resemblance to the parent population, but it may differ due to the variability in the experiment and error in our measuring.

[Fig. A.1 A histogram showing a sample distribution of data from a panel's ratings of the perceived intensity of a sensory characteristic on a 15-point category scale. Horizontal axis: rating on a 15-point scale (1 through 15); vertical axis: frequency (number of respondents).]

A.2.1 Data Description

How do we describe our measurements? Consider a sample distribution, as pictured in Fig. A.1. These measurements can be characterized and summarized in a few parameters. There are two important aspects we use for the summary. First, what is the best single estimate of our measurement? Second, what was the variation around this value?

Description of the best or most likely single value involves measures of central tendency. Three are commonly used: the mean is commonly called an average and is the sum of all data values divided by the number of observations. This is a good representation of the central value of data for distributions that are symmetric, i.e., not too heavily weighted in high or low values, but evenly dispersed. Another common measure is the median or 50th percentile, the middle value when the data are ranked. The median is a good representation of the central value even when the data are not symmetrically distributed. When there are some extreme values at the high end, for example, the mean will be unduly influenced by the higher values (they pull the average up). The median is simply the middle value after the measurements are rank ordered from lowest to highest, or the average of the two middle values when there is an even number of data points. For some types of categorical data, we need to know the mode. The mode is the most frequent value. This is appropriate when our data are only separated into name-based categories. For example, we could ask for the modal response to the question, when is the product consumed (breakfast, lunch, dinner, or snack)? So a list of items or responses with no particular ordering to the categories can be summarized by the most frequent response.

The second way to describe our data is to look at the variability or spread in our observations. This is usually achieved with a measure called the standard deviation. This specifies the degree to which our measures are dispersed about the central value.

The standard deviation of such an experimental sample of data (S) has the following form:

$$ S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1}} $$   (A.1)

where M = mean of the X scores = (ΣX)/N.


The standard deviation is more easily calculated as

$$ S = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - (\Sigma X)^2 / N}{N - 1}} $$   (A.2)

Since the experiment or sample is only a small representation of a much larger population, there is a tendency to underestimate the true degree of variation that is present. To counteract this potential bias, the value of N − 1 is used in the denominator, forming what is called an "unbiased estimate" of the standard deviation. In some statistical procedures, we do not use the standard deviation, but its squared value. This is called the sample variance, or S² in this notation.

Another useful measure of variability in the data is the coefficient of variation. This weights the standard deviation for the size of the mean and can be a good way to compare the variation from different methods, scales, experiments, or situations. In essence the measure becomes dimensionless, a pure measure of the percent of variation in our data. The coefficient of variation (CV) is expressed as a percent in the following formula:

$$ CV(\%) = 100 \times \frac{S}{M} $$   (A.3)

where S is the sample standard deviation and M is the mean value. For some scaling methods such as magnitude estimation, variability tends to increase with increasing mean values, so the standard deviation by itself may not say much about the amount of error in the measurement. The error changes with the level of the mean. The coefficient of variation, on the other hand, is a relative measure of error that takes into account the intensity value along the scale of measurement.

The example below shows the calculations of the mean, median, mode, standard deviation, and coefficient of variation for the data shown in Table A.1.

N = 41
Mean of the scores = (ΣX)/N = (2 + 3 + 3 + 4 + ... + 11 + 12 + 13)/41 = 7.049
Median = middle score = 7
Mode = most frequent score = 6
Standard deviation = S

Table A.1 First data set, rank ordered

 2   5   7    9
 3   5   7    9
 3   6   7    9
 4   6   8    9
 4   6   8   10
 4   6   8   10
 4   6   8   10
 5   6   8   11
 5   6   8   11
 5   7   9   12
13

$$ S = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - (\Sigma X)^2 / N}{N - 1}} = \sqrt{\frac{2303 - 83521/41}{40}} = 2.578 $$

CV(%) = 100 (S/mean) = 100 (2.578/7.049) = 36.6%.
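
As a quick software check on hand calculations like these, the minimal Python sketch below (using only the standard library's statistics module) reproduces the summary values for the Table A.1 data; it is an illustration, not part of the original worked example.

```python
import statistics

# Table A.1 data: 41 intensity ratings, rank ordered
data = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5,
        5, 5, 6, 6, 6, 6, 6, 6, 6, 7,
        7, 7, 7, 8, 8, 8, 8, 8, 8, 9,
        9, 9, 9, 9, 10, 10, 10, 11, 11, 12,
        13]

mean = statistics.mean(data)      # (sum of X) / N
median = statistics.median(data)  # middle value of the ranked data
mode = statistics.mode(data)      # most frequent value
s = statistics.stdev(data)        # sample SD, N - 1 in the denominator (Eq. A.1)
cv = 100 * s / mean               # coefficient of variation, % (Eq. A.3)

print(f"N = {len(data)}, mean = {mean:.3f}, median = {median}, mode = {mode}")
print(f"S = {s:.3f}, CV = {cv:.1f}%")  # expected: S about 2.578, CV about 36.6%
```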

A.2.2 Population Statistics

In making decisions about our data, we like to infer from our experiment to what might happen in the population as a whole. That is, we would like our results from a subsample of the population to apply equally well when projected to other people or other products. By population, we do not necessarily mean the population of the nation or the world. We use this term to mean the group of people (or sometimes products) from which we drew our experimental panel (or samples) and the group to which we would like to apply our conclusions from the study. The laws of statistics tell us how well we can generalize from our experiment (or sensory test) to the rest of the population of interest. Population means and standard deviations are usually denoted by Greek letters, as opposed to standard letters for sample-based statistics.

Many things we measure about a group of people will be normally distributed. That means the values form a bell-shaped curve described by an equation usually attributed to Gauss. The bell curve is symmetric around a mean value; values are more likely to be close to the mean than far from it. The curve is described by its parameters, its mean and its standard deviation, as shown in Fig. A.2.

[Fig. A.2 The normal distribution curve is described by its parameters, its mean and its standard deviation. Areas under the curve mark off discrete and known percentages of observations. The horizontal axis shows Z, marking off equal standard deviation units from −3 to +3; about 34% of observations fall between the mean and one standard deviation on either side, about 14% between one and two standard deviations, and about 2% beyond two standard deviations. Important properties of the normal distribution curve: (1) areas under the curve correspond to proportions of the population; (2) each standard deviation subsumes a known proportion of the observations; (3) since proportions are related to probabilities, we know how likely or unlikely certain values are going to be; extreme scores (away from the mean) are rare or improbable.]

The standard deviation of a population, σ, is similar to our formula for the sample standard deviation and is given by

$$ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} $$   (A.4)

where X = each score (the value for each person or product), μ = the population mean, and N = the number of items in the population.

How does the standard deviation relate to the normal distribution? This is an important relationship, which forms the basis of statistical risk estimation and inferences from samples to populations. Because we know the exact shape of the normal distribution (given by its equation), standard deviations describe known percentages of observations at certain degrees of difference from the mean. In other words, proportions of observations correspond to areas under the curve. Furthermore, any value, X, can be described in terms of a Z-score, which states how far the value is from the mean in standard deviation units. Thus,

$$ Z = \frac{X - \mu}{\sigma} $$   (A.5)

Z-scores represent differences from the mean value, but they are also related to areas under the normal curve. When we define the standard deviation as one unit, the Z-score is also related to the area under the curve to the left or right of its value, expressed as a percentage of the total area. In this case the Z-score becomes a useful value to know when we want to see how likely a certain observation would be and when we make certain assumptions about what the population may be like. We can tell what percent of observations will lie a given distance (Z-score) from the mean. Because the frequency distribution actually tells us how many times we expect different values to occur, we can convert this Z-score to a probability value (sometimes called a p-value), representing the area under the curve to the left or right of the Z-value. In statistical testing, where we look for the rarity of a calculated event, we are usually examining the "tail" of the distribution, or the smaller area that represents the probability of values more extreme than the Z-score. This probability value represents the area under the curve outside our given Z-score and is the chance (expected frequency) with which we would see a score of that magnitude or one that is even greater. Tables converting Z-values to p-values are found in all statistics texts (see Table A).
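
The conversion from a Z-score to a p-value can also be done in software rather than from a table; the minimal sketch below (assuming the SciPy library is available) shows the one-tailed and two-tailed areas for an illustrative Z of 1.96.

```python
from scipy.stats import norm

z = 1.96  # example Z-score: 1.96 standard deviations above the mean

one_tailed_p = norm.sf(z)            # area under the normal curve beyond +z (upper tail)
two_tailed_p = 2 * norm.sf(abs(z))   # area beyond +z and below -z combined

print(f"P(Z > {z}) = {one_tailed_p:.4f}")    # about 0.025
print(f"two-tailed p = {two_tailed_p:.4f}")  # about 0.05
```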


A.3 Hypothesis Testing and Statistical Inference

A.3.1 The Confidence Interval

Statistical inference has to do with how we draw conclusions about what populations are like based on samples of data from experiments. This is the logic that is used to determine whether our experimental variables had a real effect or whether our results were likely to be due to chance or unexplained random variation. Before we move on to this notion of statistical decision making, a simpler example of inferences about populations, namely confidence intervals, will be illustrated.

One example of inference is in the estimation of where the true population values are likely to occur based on our sample. In other words, we can examine the certainty with which our sample estimates will fall inside a range of values on the scale of measurement. For example, we might want to know the following information: Given the sample mean and standard deviation, within what interval is the true or population value likely to occur? For small samples, we use the t-statistic to help us (Student, 1908). The t-statistic is like Z, but it describes the distribution of small experiments better than the Z-statistic that governs large populations. Since most experiments are much smaller than populations, and sometimes are a very small sample indeed, the t-statistic is useful for much sensory evaluation work. Often we use the 95% confidence interval to describe where the value of the mean is expected to fall 95% of the time, given the information in our sample or experiment.

For a mean value M of N observations, the 95% confidence interval is given by

$$ M \pm t \left( S / \sqrt{N} \right) $$   (A.6)

where t is the t-value corresponding to N − 1 degrees of freedom (explained below) that includes 2.5% of expected variation in the upper tail outside this value and 2.5% in the lower tail (hence a two-tailed value, also explained below). Suppose we obtain a mean value of 5.0 on a 9-point scale, with a standard deviation of 1.0 in our sample, and there are 15 observations. The t-value for this experiment is based on 14 (or N − 1) degrees of freedom and is shown in Table B to be 2.145. So our best guess is that the true mean lies in the range of 5 ± 2.145(1/√15), or between 4.45 and 5.55. This could be useful, for example, if we wanted to ensure that our product had a mean score of at least 4.0 on this scale. We would be fairly confident, given the sample values from our experiment, that it would in fact exceed this value.
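
The same interval can be computed directly, as in the minimal sketch below (assuming SciPy is available); the inputs are those of the example just given, a mean of 5.0, a standard deviation of 1.0, and 15 observations.

```python
from math import sqrt
from scipy.stats import t

M, S, N = 5.0, 1.0, 15             # sample mean, sample SD, number of observations

t_crit = t.ppf(0.975, df=N - 1)    # two-tailed 95% critical value, 14 df (about 2.145)
half_width = t_crit * S / sqrt(N)  # t * (S / sqrt(N)), as in Eq. (A.6)

print(f"95% CI: {M - half_width:.2f} to {M + half_width:.2f}")  # about 4.45 to 5.55
```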

For continuous and normally distributed data, we can similarly estimate a 95% confidence interval on the median (Smith, 1988), given by

$$ \mathrm{Med} \pm 1.253\, t \left( S / \sqrt{N} \right) $$   (A.7)

For larger samples, say N > 50, we can replace the t-value with its Z approximation, using Z = 1.96 in these formulas for the 95% confidence interval. As the number of observations increases, the t-distribution becomes closer to the normal distribution.

A.3.2 Hypothesis Testing

How can we tell if our experimental treatment had an effect? First, we need to calculate means and standard deviations. From these values we do further calculations to come up with values called test statistics. These statistics, like the Z-score mentioned above, have known distributions, so we can tell how likely or unlikely the observations will be when chance variation alone is operating. When chance variation alone seems very unlikely (usually one chance in 20 or less), then we reject this notion and conclude that our observations must be due to our actual experimental treatment. This is the logic of statistical hypothesis testing. It is that simple.

Often we need a test to compare means. A useful statistic for small experiments is called Student's t-statistic. Student was the pseudonym of the original publisher of this statistic, a man named Gosset who worked for the Guinness Brewery and did not want other breweries to know that Guinness was using statistical methods (O'Mahony, 1986). By small experiments, we mean experiments with numbers of observations per variable in the range of about 50 or less. Conceptually, the t-statistic is the difference between the means divided by an estimate of the error or uncertainty around those means, called the standard error of the means.


Imagine that we did our experiment many times and each time calculated a mean value. These means themselves, then, could be plotted in a histogram and would have a distribution of values. The standard error of the mean is like the standard deviation of this sampling distribution of means. If you had lots of time and money, you could repeat the experiment over and over and estimate the population values from looking at the distribution of sample mean scores. However, we do not usually do such a series of experiments, so we need a way to estimate this error. Fortunately, the error in our single experiment gives us a hint of how likely it is that our obtained mean reflects the population mean. That is, we can estimate the limits of confidence around the mean value we obtained. The laws of statistics tell us that the standard error of the mean is simply the sample standard deviation divided by the square root of the number of observations (N). This makes sense in that the more observations we make, the more likely it is that our obtained mean actually lies close to the true population mean.
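
This thought experiment can be simulated directly. The sketch below (a minimal illustration using NumPy, with made-up population values of mean 5.0 and standard deviation 1.0) draws many samples of size 15 and confirms that the standard deviation of the resulting sample means is close to the standard error S/√N.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 15                        # observations per "experiment"
pop_mean, pop_sd = 5.0, 1.0   # hypothetical population parameters, for illustration only

# repeat the experiment 10,000 times and keep each sample mean
means = rng.normal(pop_mean, pop_sd, size=(10_000, N)).mean(axis=1)

print(f"SD of the sample means: {means.std(ddof=1):.3f}")
print(f"theoretical SE = sigma/sqrt(N): {pop_sd / np.sqrt(N):.3f}")  # both about 0.258
```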

In order to test whether the mean we see in our experiment is different from some other value, there are three things we need to know: the mean itself, the sample standard deviation, and the number of observations. An example of this form of the t-test is given below, but first we need to take a closer look at the logic of statistical testing.

The logical process of statistical inference is similar for the t-tests and all other statistical tests. The only difference is that the t-statistic is computed for testing differences between two means, while other statistics are used to test for differences among other values, like proportions, standard deviations, or variances. In the t-test, we first assume that there is no difference between population means. Another way to think about this is that it implies that the experimental means were drawn from the same parent population. This is called the null hypothesis. Next, we look at our t-value calculated in the experiment and ask how likely this value would be, given our assumption of no difference (i.e., a true null hypothesis). Because we know the shape of the t-distribution, just like a Z-score, we can tell how far out in the tail our calculated t-statistic lies. From the area under the curve out in that tail, we can tell what percent of the time we could expect to see this value. If the t-value we calculate is very high and positive or very low and negative, it is unlikely, a rare event given our assumption. If this rarity passes some arbitrary cutoff point, usually one chance in 20 (5%) or less, we conclude that our initial assumption was probably wrong. Then we make a conclusion that the population means are in fact different or that the sample means were drawn from different parent populations. In practical terms, this usually implies that our treatment variable (ingredients, processing, packaging, shelf life) did produce a different sensory effect from some comparison level or from our control product. We conclude that the difference was not likely to happen from chance variation alone. This is the logic of null hypothesis testing. It is designed to keep us from making errors of concluding that the experiment had an effect when there really was only a difference due to chance. Furthermore, it limits our likelihood of making this mistake to a maximum value of one chance in 20 in the long run (when certain conditions are met; see the postscript at the end of this chapter).

A.3.3 A Worked Example

Here is a worked example of a simple t-test. We do an experiment with the following scale, rating a new ingredient formulation against a control for overall sweetness level:

much less sweet          about the same          much more sweet

We convert the box ratings to scores of 1 (for the leftmost box) through 7 (for the rightmost box). The data from ten panelists are shown in Table A.2.

Table A.2 Data for t-test example

Panelist    Rating
 1          5
 2          5
 3          6
 4          4
 5          3
 6          7
 7          5
 8          5
 9          6
10          4

We now set up our null hypothesis and an alternative hypothesis different from the null. A common notation is to let the symbol Ho stand for the null hypothesis and Ha stand for the alternative. Several different alternatives are possible, so it takes some careful thought as to which one to choose. This is discussed further below. The null hypothesis in this case is stated as an equation concerning the population value, not our sample, as follows:

Ho: μ = 4.0. This is the null hypothesis.
Ha: μ ≠ 4.0. This is the alternative hypothesis.

Note that the Greek letter "mu" is used since these are statements about population means, not sample means from our data. Also note that the alternative hypothesis is non-directional, since the population mean could be higher or lower than our expected value of 4.0. So the actual t-value after our calculations might be positive or negative. This is called a two-tailed test. If we were only interested in the alternative hypothesis (Ha) with a "greater than" or "less than" prediction, the test would be one-tailed (and our critical t-value would change), as we would only examine one end of the t-distribution when checking for the probability and significance of the result.

For our test against a mean or fixed value, the t-test has the following form:

$$ t = \frac{M - \mu}{S / \sqrt{N}} $$   (A.8)

where M is the sample mean, S is the standard deviation, N is the number of observations (judges or panelists, usually), and μ is the fixed value or population mean.

Here are the calculations from the data set above:

Mean = ΣX/N = 5.0
ΣX = 50
ΣX² = 262
(ΣX)² = 2500

$$ S = \sqrt{\frac{262 - 2500/10}{9}} = 1.155 $$

$$ t = \frac{5.0 - 4.0}{1.155 / \sqrt{10}} = \frac{1}{0.365} = 2.740 $$

So our obtained t-value for this experiment is 2.740. Next we need to know if this value is larger than what we would expect by chance less than 5% of the time. Statistical tables for the t-distribution tell us that for a sample size of 10 people (so degrees of freedom = 9), we expect a t-value of ±2.262 or beyond only 5% of the time. The two-tailed test looks at both high and low tails and adds them together since the test is non-directional, with t high or low. So this critical value of +2.262 cuts off 2.5% of the total area under the t-distribution in the upper half and −2.262 cuts off 2.5% in the lower half. Any values higher than 2.262 or lower than −2.262 would be expected less than 5% of the time. In statistical talk, we say that the probability of our obtained result is then less than 0.05, since 2.740 > 2.262. In other words, we obtained a t-value from our data that is even more extreme than the cutoff value of 2.262.

So far all of this is some simple math, and then a cross-referencing of the obtained t-value to what is predicted from the tabled t-values under the null hypothesis. The next step is the inferential leap of statistical decision making. Since the obtained t-value was bigger in magnitude than the critical t-value, Ho is rejected and the alternative hypothesis is accepted. In other words, our population mean is likely to be different from the middle-of-scale value of 4.0. We do not actually know how likely this is, but we know that the experiment would produce the sort of result we see only about 5% of the time when the null is true. So we infer that it is probably false. Looking back at the data, this does not seem too unreasonable, since seven out of ten panelists scored higher than the null hypothesis value of 4.0. When we reject the null hypothesis, we claim that there is a statistically significant result. The use of the term "significance" is unfortunate, for in simple everyday English it means "important." In statistical terms, significance only implies that a decision has been made and does not tell us whether the result was important or not. The steps in this chain of reasoning, along with some decisions made early in the process about the alpha-level and power of the test, are shown in Fig. A.3.
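
For readers who check such calculations with software, a minimal sketch using SciPy's one-sample t-test is shown below; the ratings are those of Table A.2, and the reported p-value is the two-tailed probability discussed above.

```python
from scipy.stats import ttest_1samp

ratings = [5, 5, 6, 4, 3, 7, 5, 5, 6, 4]   # Table A.2 data

# two-tailed one-sample t-test against the scale midpoint of 4.0
t_stat, p_value = ttest_1samp(ratings, popmean=4.0)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # t about 2.74, p about 0.02 (below 0.05)
```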

A.3.4 A Few More Important Concepts

[Fig. A.3 Statistical flowchart: formulate null and alternative hypotheses → choose alpha-level for Type I error → choose sample size; calculate beta-risk and power → (conduct experiment, gather data) → calculate summary statistics, central tendency and variation → calculate statistics for hypothesis tests → compare statistics to critical levels, or probability values to alpha → decision time: reject the null, withhold judgment, or accept the null (depending on power) → draw conclusions and make recommendations. Caption: Steps in statistical decision making in an experiment. The items before the collection of the data concern the experimental design and statistical conventions to be used in the study. After the data are analyzed the inferential process begins, first with data description, then computation of the test statistic, and then comparison of the test statistic to the critical value for our predetermined alpha-level and the size of the experiment. If the computed test statistic is greater in magnitude than the critical value, we reject the null hypothesis in favor of the alternative hypothesis. If the computed test statistic has a value smaller in magnitude than the critical value, we can make two choices. We can reserve judgment if the sample size is small, or we can accept the null hypothesis if we are sure that the power and sensitivity of the test are high. Good power is in part determined by having a substantial number of observations, and test sensitivity is determined by having good experimental procedures and controls (see Appendix E).]

Before going ahead, there are some important concepts in this process of statistical testing that need further explanation. The first is degrees of freedom. When we look up our critical values for a statistic, the values are frequently tabled not in terms of how many observations were in our sample, but how many degrees of freedom we have. Degrees of freedom have to do with how many parameters we are estimating from our data relative to the number of observations. In essence, this notion asks how much the resulting values would be free to move, given the constraints we have from estimating other statistics. For example, when we estimate a mean, we have freedom for that value to move or change until the last data point is collected. Another way to think about this is the following: if we knew all but one data point and already knew the mean, we would not need that last data point. It would be determined by all the other data points and the mean itself, so it has no freedom to change. We could calculate what it would have to be. In general, degrees of freedom are equal to the sample size, minus one for each of the parameters we are estimating. Most statistics are tabled by their degrees of freedom. If we wanted to compare the means from two groups of N1 and N2 observations, we would have to calculate some parameters like means for each group. So the total number of degrees of freedom is N1 − 1 + N2 − 1, or N1 + N2 − 2.

A second important consideration is whether our statistical test is a one- or a two-tailed test. Do we wish to test whether the mean is simply different from some value, or whether it is larger or smaller than some value? If the question is simply "different from," then we need to examine the probability that our test statistic will fall into either the low or high tail of its distribution. As stated above in the example of the simple t-test, if the question is directional, e.g., "greater than" some value, then we examine only one tail. Most statistical tables have entries for one- and two-tailed tests. It is important, however, to think carefully about our underlying theoretical question. The choice of statistical alternative hypotheses is related to the research hypothesis. In some sensory tests, like paired preference, we do not have any way of predicting which way the preference will go, and so the statistical test is two-tailed. This is in contrast to some discrimination tests like the triangle procedure. In these tests we do not expect performance below chance unless there is something very wrong with the experiment. So the alternative hypothesis is that the true proportion correct is greater than chance. The alternative is looking in one direction and is therefore one-tailed.
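
As an illustration of the difference between the two cases, the following minimal sketch (assuming SciPy is available) looks up the one-tailed and two-tailed critical t-values for 9 degrees of freedom at an alpha of 0.05, the situation in the worked example above.

```python
from scipy.stats import t

alpha, df = 0.05, 9

one_tailed = t.ppf(1 - alpha, df)      # upper-tail cutoff only (about 1.833)
two_tailed = t.ppf(1 - alpha / 2, df)  # alpha split between the two tails (about 2.262)

print(f"one-tailed critical t = {one_tailed:.3f}")
print(f"two-tailed critical t = {two_tailed:.3f}")
```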

A third important statistical concept to keep in mind is what type of distribution you are concerned with. There are three different kinds of distributions we have discussed. First, there are overall population distributions. They tell us what the world would look like if we measured all possible values. This is usually not known, but we can make inferences about it from our experiments. Second, we have sample distributions derived from our actual data. What does our sample look like? The data distribution can be pictured in a graph such as a histogram. Third, there are distributions of test statistics. If the null hypothesis is true, how is the test statistic distributed over many experiments? How will the test statistic be affected by samples of different sizes? What values would be expected, what variance due to chance alone? It is against these expected values that we examine our calculated value and get some idea of its probability.

A.3.5 Decision Errors

Realizing that statistical decisions are based on probabilities, it is clear that some uncertainty is involved. Our test statistic may only happen 5% of the time under a true null hypothesis, but the null might still be true, even though we rejected it. So there is a chance that our decision was a mistake and that we made an error. It is also possible sometimes that we fail to reject the null when a true difference exists. These two kinds of mistakes are called Type I and Type II errors. A Type I error is committed when we reject the null hypothesis when it is actually true. In terms of a t-test comparison of means, the Type I error implies that we concluded that two population means are different when they are in fact the same, i.e., our data were in fact sampled from the same parent population. In other words, our treatment did not have an effect, but we mistakenly concluded that it did. The process of statistical testing is valuable, though, because it protects us from committing this kind of error and going down blind alleys in terms of future research decisions, by limiting the proportion of times we could make these mistaken decisions. This upper limit on the risk of Type I error (over the long term) is called alpha-risk.

As shown in Table A.3, another kind of error occurs when we miss a difference that is real. This is called a Type II error and is formally defined as a failure to reject the null hypothesis when the alternative hypothesis is actually true. Failure to detect a difference in a t-test, or more generally to observe that an experimental treatment had an effect, can have important or even devastating business implications. Failing to note that a revised manufacturing process was in fact an improvement would lose the potential benefit if the revision were not adopted as a new standard procedure. Similarly, revised ingredients might be passed over when they in fact produce improvements in the product as perceived by consumers. Alternatively, bad ingredients might be accepted for use if the modified product's flaws are undetected. It is necessary to have a sensitive enough test to protect against this kind of error. The long-term risk or probability of making this kind of mistake is called beta-risk, and one minus the beta-risk is defined as the statistical power of the test. The protection against Type II error by statistical means and by experimental strategy is discussed in Appendix E.

Table A.3 Statistical errors in decision making

                                  Outcome of sensory evaluation
True situation                    Difference reported                   No difference reported
Products are different            Correct decision                      Type II error (prob. is beta-risk)
Products are not different        Type I error (prob. is alpha-risk)    Correct decision

A.4 Variations of the t-Test

There are three kinds of t-tests that are commonly used. One is a test of an experimental mean against a fixed value, like a population mean or a specific point on a scale, such as the middle of a just-right scale, as in the example above. The second test is used when observations are paired, for example, when each panelist evaluates two products and the scores are associated since each pair comes from a single person. This is called the paired t-test or dependent t-test. The third type of t-test is performed when different groups of panelists evaluate the two products. This is called the independent groups t-test. The formulas for each test are similar, in that they take the general form of a difference between means divided by the standard error. However, the actual computations are a bit different. The section below gives examples of the three comparisons of means involving the t-statistic.

One type of t-test is the test against a population mean or another fixed value, as we saw above in our example and Eq. (A.8). The second kind of t-test is the test of paired observations, also called the dependent t-test. This is a useful and powerful test design in which each panelist evaluates both products, allowing us to eliminate some of the inter-individual variation. To calculate this value of t, we first arrange the pairs of observations in two columns and subtract each one from the other member of the pair to create a difference score. The difference scores then become the numbers used in further calculations. The null hypothesis is that the mean of the difference scores is zero. We also need to calculate a standard deviation of these difference scores, and a standard error by dividing this standard deviation by the square root of N, the number of panelists:

$$ t = \frac{M_{\mathrm{diff}}}{S_{\mathrm{diff}} / \sqrt{N}} $$   (A.9)

where Mdiff is the mean of the difference scores and Sdiff is the standard deviation of the difference scores. Here is an example of a t-test where each panelist tasted both products and we can perform a paired t-test. Products were rated on a 25-point scale for acceptance. Note that we compute a difference score (D) in this situation, as shown in Table A.4.

Table A.4 Data for paired t-test example

Panelist   Product A   Product B   Difference   (Difference)²
 1         20          22           2            4
 2         18          19           1            1
 3         19          17          −2            4
 4         22          18          −4           16
 5         17          21           4           16
 6         20          23           3            9
 7         19          19           0            0
 8         16          20           4           16
 9         21          22           1            1
10         19          20           1            1

Calculations:

Sum of D = 10, mean of D = 1
Sum of D² = 68
Standard deviation of D:

$$ S_{\mathrm{diff}} = \sqrt{\frac{\sum_{i=1}^{N} D_i^2 - (\Sigma D)^2 / N}{N - 1}} = \sqrt{\frac{68 - 100/10}{9}} = 2.539 $$

and t comes from

$$ t = \frac{M_{\mathrm{diff}}}{S_{\mathrm{diff}} / \sqrt{N}} = \frac{1.0}{2.539 / \sqrt{10}} = 1.25 $$

This value does not exceed the tabled value for the 5%, two-tailed limit on t (at 9 df), and so we conclude there is insufficient evidence for a difference. In other words, we do not reject the null hypothesis. The two samples were rather close, compared to the level of error among panelists.
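
The same paired comparison can be reproduced with SciPy, as in the minimal sketch below; ttest_rel operates on the difference scores in the same way as the hand calculation from Table A.4.

```python
from scipy.stats import ttest_rel

product_a = [20, 18, 19, 22, 17, 20, 19, 16, 21, 19]   # Table A.4
product_b = [22, 19, 17, 18, 21, 23, 19, 20, 22, 20]

# paired (dependent) t-test: works on the B - A difference scores
t_stat, p_value = ttest_rel(product_b, product_a)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # t about 1.25, p well above 0.05
```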

The third type of t-test is conducted when there are different groups of people; it is often called an independent groups t-test. Sometimes the experimental constraints might dictate situations where we have two groups that taste only one product each. Then a different formula for the t-test applies. Now the data are no longer paired or related in any way, and a different calculation is needed to estimate the standard error, since two groups were involved and they have to be combined somehow to get a common estimate of the standard deviations. We also have some different degrees of freedom, now given by the sum of the two group sizes minus 2, or (NGroup1 + NGroup2 − 2). The t-value is determined by

$$ t = \frac{M_1 - M_2}{SE_{\mathrm{pooled}}} $$   (A.10)

where M1 and M2 are the means of the two groups and SEpooled is the pooled standard error. For the independent t-test, the pooled error requires some work and gives an estimate of the error combining the error levels of the two groups. The pooled standard error for two groups, X and Y, is given by the following formula:

$$ SE_{\mathrm{pooled}} = \sqrt{\left[\frac{1}{N_1} + \frac{1}{N_2}\right] \frac{\left[\Sigma x^2 - (\Sigma x)^2/N_1\right] + \left[\Sigma y^2 - (\Sigma y)^2/N_2\right]}{N_1 + N_2 - 2}} $$   (A.11)

Here is a worked example of an independent groups t-test. In this case, we have two panels, one from a manufacturing site and one from a research site, both evaluating the perceived pepper heat from an ingredient submitted for use in a highly spiced product.


The product managers have become concerned that the plant QC panel may not be very sensitive to pepper heat, due to their dietary consumption or other factors, and that the use of ingredients is getting out of line with what research and development personnel feel is an appropriate level of pepper. So the sample is evaluated by both groups and an independent groups t-test is performed. Our null hypothesis is that there is no difference in the population means, and our alternative hypothesis is that the QC plant panel will have lower mean ratings in the long run (a one-tailed situation). The data set comprises pepper heat ratings on a 15-point category scale, as shown in Table A.5.

Table A.5 Data for independent groups t-test

Manufacturing QC panel (X)   R&D test panel (Y)
 7                            9
12                           10
 6                            8
 5                            7
 8                            7
 6                            9
 7                            8
 4                           12
 5                            9
 3

First, some preliminary calculations:

N1 = 10,  Σx = 63,  Mean = 6.30,  Σx² = 453,  (Σx)² = 3969
N2 = 9,   Σy = 79,  Mean = 8.78,  Σy² = 713,  (Σy)² = 6241

Now we have all the information we need to calculate the value of

$$ SE_{\mathrm{pooled}} = \sqrt{\left(\frac{1}{10} + \frac{1}{9}\right) \frac{\left[453 - \frac{3969}{10}\right] + \left[713 - \frac{6241}{9}\right]}{10 + 9 - 2}} = 0.97 $$

t = (6.30 − 8.78)/0.97 = −2.556

Degrees of freedom are 17 (= 10 + 9 − 2). The critical t-value for a one-tailed test at 17 df is 1.740, so this is a statistically significant result. Our QC panel does seem to be giving lower scores for pepper heat than the R&D panel.

Note that the variability is also a little higher in the QC panel. Our test formula assumes that the variance is about equal. For highly unequal variability (one standard deviation more than three times that of the other) some adjustments must be made. The problem of unequal variance becomes more serious when the two groups are also very different in size. The t-distribution becomes a poor estimate of what to expect under a true null, so the alpha-level is no longer adequately protected. One approach is to adjust the degrees of freedom, and formulas for this are given in advanced statistics books (e.g., Snedecor and Cochran, 1989). These non-pooled estimates of the t-value are provided by some statistics packages, and it is usually prudent to examine these adjusted t-values if unequal group size and unequal variances happen to be the situation with your data.
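
A minimal SciPy sketch of this comparison is shown below. The first call reproduces the pooled-variance test from Table A.5; the second, with equal_var=False, gives the adjusted (Welch) version mentioned above for unequal variances.

```python
from scipy.stats import ttest_ind

qc_panel = [7, 12, 6, 5, 8, 6, 7, 4, 5, 3]   # Table A.5, manufacturing QC panel (X)
rd_panel = [9, 10, 8, 7, 7, 9, 8, 12, 9]     # Table A.5, R&D test panel (Y)

# pooled-variance (classical) independent groups t-test
t_pooled, p_two_sided = ttest_ind(qc_panel, rd_panel)
# halving the two-sided p gives the one-tailed p here, since the observed
# difference is in the predicted direction (QC lower than R&D)
print(f"pooled t = {t_pooled:.3f}, one-tailed p = {p_two_sided / 2:.4f}")  # t about -2.56

# Welch's version: no equal-variance assumption, adjusted degrees of freedom
t_welch, p_welch = ttest_ind(qc_panel, rd_panel, equal_var=False)
print(f"Welch t = {t_welch:.3f}")
```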

A.4.1 The Sensitivity of the Dependent t-Test for Sensory Data

In sensory testing, it is often valuable to have each panelist try all of the products in our test. For simple paired tests of two products, this enables the use of the dependent t-test. This is especially valuable when the question is simply whether a modified process or ingredient has changed the sensory attributes of a product. The dependent t-test is preferable to the separate-groups approach, where different people try each product. The reason is apparent from the calculations. In the dependent t-test, the statistic is calculated on a difference score. This means that the differences among panelists in overall sensory sensitivity, or even in their idiosyncratic scale usage, are removed from the situation. It is common to observe that some panelists have a "favorite" part of the scale and may restrict their responses to one section of the allowable responses. However, with the dependent t-test, as long as panelists rank order the products in the same way, there will be a statistically significant result. This is one way to partition the variation due to subject differences from the variation due to other sources of error. In general, partitioning of error adds power to statistical tests, as shown in the section on repeated measures (or complete block) ANOVA (see Appendix C). Of course, there are some potential problems in having people evaluate both products, like sequential order effects and possible fatigue and carry-over effects. However, the advantage gained in the sensitivity of the test usually far outweighs the liabilities of repeated testing.


A.5 Summary: Statistical Hypothesis Testing

Statistical testing is designed to prevent us from concluding that a treatment had an effect when none was really present and our differences were merely due to chance or the experimental error variation. Since test statistics like Z and t have known distributions, we can tell whether our results would be extreme, i.e., in the tails of these distributions, a certain percent of the time when only chance variation was operating. This allows us to reject the notion of chance variation in favor of concluding that there was an actual effect. The steps in statistical testing are summarized in the flowchart shown in Fig. A.3. Sample size, or the number of observations to make in an experiment, is one important decision. As noted above, this helps determine the power and sensitivity of the test, as the standard errors decrease as a function of the square root of N. Also note that this square root function means that the advantage of increasing sample size becomes less as N gets larger. In other words, there is a law of diminishing returns. At some point the cost considerations in doing a large test will outweigh the advantage in reducing uncertainty and lowering risk. An accomplished sensory professional will have a feeling for how well the sensitivity of the test balances against the informational power and the uncertainty and risks involved, and about how many people are enough to ensure a sensitive test. These issues are discussed further in the section on beta-risk and statistical power.

Note that statistical hypothesis testing by itself is a somewhat impoverished manner of performing scientific research. Rather than establishing theorems, laws, or general mathematical relationships about how nature works, we are simply making a binary yes/no decision, either that a given experimental treatment had an effect or that it did not. Statistical tests can be thought of as a starting point or a kind of necessary hurdle that is a part of experimentation in order to help rule out the effects of chance. However, it is not the end of the story, only the beginning. In addition to statistical significance, the sensory scientist must always describe the effects. It is easy for students to forget this point and report significance but fail to describe what happened.

A.6 Postscript: What p-Values Signify and What They Do Not

Probably no single statistical concept is more often misunderstood, and more often abused, than the obtained p-value that we find for a statistic after conducting an analysis. It is easy to forget that this p-value is based on a hypothetical curve for the test statistic, like the t-distribution, that is calculated under the assumption that the null hypothesis is true. So the obtained p-value is taken from the very situation that we are trying to reject or eliminate as a possibility. Once this fact is realized, it is easier to put the p-value into proper perspective and give it the due respect it deserves, but no more.

What does the p-value mean? Let us reiterate. It is the probability of observing a value of the test statistic (t, z, r, chi-square, or F-ratio) that is as large or larger than the one we obtain in our experimental analysis, when the null hypothesis is true. That much, no more and no less. In other words, assuming a true null, how likely or unlikely would the obtained value of the t-test be? When it becomes somewhat unlikely, say it is expected less than 5% of the time, we reject this null in favor of the alternative hypothesis and conclude that there is statistical significance. Thus, we have gained some assurance of a relationship between our experimental treatments and the sensory variables we are measuring. Or have we? Here are some common misinterpretations of the p-value:

(1) The p-value (or more specifically, 1 − p) represents the odds against chance. Absolutely false (Carver, 1978). This puts the cart before the horse. The chance of observing the t-statistic under a true null is not the same as the chance of the null (or alternative) being true given the observations. A p < 0.05 does not mean there is only a 5% chance of the null being true. In mathematical logic, the probability of A given B is not necessarily the same as the probability of B given A. If I find a dead man on my front lawn, the chance he was shot in the head is quite slim (less than 5%), at least in my neighborhood, but if I find a man shot through the head, he is more than likely dead (95% or more).

(2) A related way of misinterpreting the p-value is to say that the p-value represents the chance of making a Type I error. This is also not strictly true, although it is widely assumed to be true (Pollard and Richardson, 1987). Indeed, that is what our alpha-cutoff is supposed to limit, in the long run. But the actual value of alpha in restricting our liability depends also upon the incidence of true differences versus no differences in our long-term testing program. Only when the chance of a true difference is about 50% is the alpha-value an accurate reflection of our liability in rejecting a true null. The incidence is usually not known, but can be estimated. In some cases like quality control or shelf-life testing, there are a lot more "no difference" situations, and then alpha wildly underestimates our chances of being wrong, once the null is rejected. The following table shows how this can occur. In this example, we have a 10% incidence of a true difference, alpha is set at 0.05, and the beta-risk (chance of missing a difference) is 10%. For ease of computation, 1000 tests are conducted and the results are shown in Table A.6.

Table A.6 Incidence diagram

(True state)                 Incidence   Difference found   Difference not found
Difference exists            100         90                 10 (at β = 10%)
Difference does not exist    900         45 (at α = 5%)     855

The chance of being correct, having decided there is a significant difference, is 90/(90 + 45) or 2/3. There is a 1/3 chance (45/135) of being wrong once you have rejected the null (not 5%!), even though we have done all our statistics correctly and are sure we are running a good testing program. The problem in this scenario is the low probability beforehand of actually being sent something worth testing. The notion of estimating the chance of being right or wrong given a certain outcome is covered in the branch of statistics known as Bayesian statistics, after Bayes' theorem, which allows the kind of calculation illustrated in Table A.6 (see Berger and Berry, 1988).
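For readers who want to reproduce the arithmetic, the short sketch below (in Python, our choice rather than anything from the book) recomputes Table A.6 from the stated assumptions of a 10% incidence, alpha = 0.05, and beta = 0.10.

```python
# Sketch of the incidence calculation behind Table A.6 (values are the
# book's example assumptions: 10% incidence, alpha = 0.05, beta = 0.10).
n_tests = 1000          # total tests in the long-run testing program
incidence = 0.10        # proportion of tests where a true difference exists
alpha, beta = 0.05, 0.10

true_diff = n_tests * incidence            # 100 tests with a real difference
no_diff = n_tests - true_diff              # 900 tests with no real difference

hits = true_diff * (1 - beta)              # 90 correctly detected differences
false_alarms = no_diff * alpha             # 45 false rejections of a true null

# Probability that a "significant" result reflects a real difference
p_correct_given_signif = hits / (hits + false_alarms)
print(round(p_correct_given_signif, 3))    # 0.667, i.e., about 2/3
```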

A related mistake is made when we use the word "confident" to describe our level of significance (1 – p or 1 – α). For example, a well-known introductory statistics text gives the following incorrect information: "The probability of rejecting Ho erroneously (committing a Type I error) is known, it is equal to alpha. . . . Thus you may be 95% confident that your decision to reject Ho is correct" (Welkowitz et al., 1982, p. 163) [italics inserted]. This is absolutely untrue, as the above example illustrates. With a low true incidence of actual differences, the chance of being wrong once Ho is rejected may be very high indeed.

(3) One minus the p-value gives us the reliability of our data, the faith we should have in the alternative hypothesis, or is an index of the degree of support for our research hypothesis in general. All of these are false (Carver, 1978). Reliability, certainly, is important (getting the same result upon repeated testing) but is not estimated simply as 1–p. A replicated experiment has much greater scientific value than a low p-value, especially if the replication comes from another laboratory and another test panel.

Interpreting p-values as evidence for the alternative hypothesis is just as wrong as interpreting them as accurate measures of evidence against the null. Once again, it depends upon the incidence or prior probability. There is a misplaced feeling of elation that students and even some more mature researchers seem to get when we obtain a low p-value, as if this was an indication of how good their experimental hypothesis was. It is surprising then that journal editors continue to allow the irrelevant convention of adding extra stars, asterisks, or other symbols to indicate low p-values (∗∗0.01, ∗∗∗0.001, etc.) beyond the pre-set alpha-level in experimental reports. The information given by these extra stars is minimal. They only tell you how likely the result is under a true null, which you are deciding is false anyway.

Overzealous teachers and careless commentators have given the world the impression that our standard statistical measures are inevitable, necessary, optimal and mathematically certain. In truth, statistics is a branch of rhetoric and the use of any particular statistic . . . is nothing more than an exhortation to your fellow humans to see meaning in data as you do. (Raskin, 1988, p. 432)

A.7 Statistical Glossary

Alpha-risk. The upper acceptable limit on committing Type I errors (rejecting a true null hypothesis) set by the experimenter before the study, often at 5% or less.


Beta-risk. The upper acceptable limit on committing Type II errors (accepting a false null hypothesis). See Appendix VI.

Degrees of freedom. A value for the number of observations that are unconstrained or free to vary once our statistical observations are calculated from a sample data set. In most cases the degrees of freedom are given by the number of observations in the data minus one.

Dependent variable. The variable that is free to move in a study, what is measured (such as ratings, numbers of correct judgments, preference choices) to form the data set.

Distribution. A collection of values describing a data set, a population, or a test statistic. The distribution plots the values (usually on the horizontal axis) against their frequency or probability of occurrence (on the vertical axis).

Independent variable. The experimental variable or treatment of interest that is manipulated by the experimenter. A set of classes, conditions, or groups that are the subject of study.

Mean. A measure of central tendency. The arithmetic mean or average is the sum of all the observed values divided by the number of observations. The geometric mean is the Nth root of the product of N observations.

Null hypothesis. An assumption about underlying population values. In simple difference testing for scaled data (e.g., where the t-test is used) the null assumes that the population mean values for two treatments are equal. In simple difference testing on proportions (e.g., where the data represent a count of correct judgments, as in the triangle test) the null hypothesis is that the population proportion correct equals the chance probability. This is often misphrased as "there is no difference" (a conclusion from the experiment, not a null hypothesis).

One- and two-tailed tests. Describes the consideration of only one or two ends of a statistic's distribution in determining the obtained p-value. In a one-tailed test, the alternative hypothesis is directional (e.g., the population mean of the test sample is greater than the control sample) while in a two-tailed test, the alternative hypothesis does not state a direction (e.g., the population mean of the test sample is not equal to the population mean of the control sample).

Parameter. A characteristic that is measured about something, such as the mean of a distribution.

P-value. The probability of observing a test statistic as large or larger than the one calculated from an experiment, when the null hypothesis is true. Used as the basis for rejecting the null hypothesis when compared to the pre-set alpha-level. Often mistakenly assumed to be the probability of making an error when the null hypothesis is rejected.

Sample size. The number of observations in our data, usually represented by the letter "N."

Standard deviation. A measure of variability in a data sample or in a population.

Statistic. A value calculated from the data, with a known distribution, based on certain assumptions.

Treatment. A word often used to describe two different levels of an experimental variable. In other words, what has been changed about a product and is the subject of the test. See independent variable.

Type I error. Rejecting the null hypothesis when it is true. In simple difference testing, the treatments are thought to be different when in fact they are the same.

Type II error. Accepting the null hypothesis when it is false. In simple difference testing, the treatments are thought to be equal when in fact the population values for those treatments are different.

References

Berger, J. O. and Berry, D. A. 1988. Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165.

Carver, R. P. 1978. The case against statistical significance testing. Harvard Educational Review, 48, 378–399.

Gacula, M., Singh, J., Bi, J. and Altan, S. 2009. Statistical Methods in Food and Consumer Research, Second Edition. Elsevier/Academic, Amsterdam.

O'Mahony, M. 1986. Sensory Evaluation of Food. Statistical Methods and Procedures. Marcel Dekker, New York.

Piggott, J. R. 1986. Statistical Procedures in Food Research. Elsevier Applied Science, London.

Pollard, P. and Richardson, J. T. E. 1987. On the probability of making Type I errors. Psychological Bulletin, 102, 159–163.

Raskin, J. 1988. Letter to the editor. American Scientist, 76, 432.

Smith, G. L. 1988. Statistical analysis of sensory data. In: J. R. Piggott (ed.), Sensory Analysis of Foods. Elsevier, London.

Snedecor, G. W. and Cochran, W. G. 1989. Statistical Methods, Eighth Edition. Iowa State University, Ames, IA.

Student. 1908. The probable error of a mean. Biometrika, 6, 1–25.

Welkowitz, J., Ewen, R. B. and Cohen, J. 1982. Introductory Statistics for the Behavioral Sciences. Academic, New York.


Appendix B

Nonparametric and Binomial-Based Statistical Methods

Contents

B.1 Introduction to Nonparametric Tests . . . . . 489
B.2 Binomial-Based Tests on Proportions . . . . . 490
B.3 Chi-Square . . . . . 493
    B.3.1 A Measure of Relatedness of Two Variables . . . . . 493
    B.3.2 Calculations . . . . . 494
    B.3.3 Related Samples: The McNemar Test . . . . . 494
    B.3.4 The Stuart–Maxwell Test . . . . . 495
    B.3.5 Beta-Binomial, Chance-Corrected Beta-Binomial, and Dirichlet Multinomial Analyses . . . . . 496
B.4 Useful Rank Order Tests . . . . . 499
    B.4.1 The Sign Test . . . . . 499
    B.4.2 The Mann–Whitney U-Test . . . . . 500
    B.4.3 Ranked Data with More Than Two Samples, Friedman and Kramer Tests . . . . . 501
    B.4.4 Rank Order Correlation . . . . . 502
B.5 Conclusions . . . . . 503
B.6 Postscript . . . . . 503
    B.6.1 Proof showing equivalence of binomial approximation Z-test and χ2 test for difference of proportions . . . . . 503
References . . . . . 504

Although statistical tests provide the right tools for basic psychophysical research, they are not ideally suited for some of the tasks encountered in sensory analysis.

—M. O'Mahony (1986, p. 401)

Frequently, sensory evaluation data do not consist of measurements on continuous variables, but rather are frequency counts or proportions. The branch of statistics that deals with proportions and ranked data is called nonparametric statistics. This chapter illustrates statistics used on proportions and ranks, with worked examples.

B.1 Introduction to Nonparametric Tests

The t-test and other "parametric" statistics work well for situations in which the data are continuous, as with some rating scales. In other situations, however, we categorize performance into right and wrong answers or we count the numbers who make a choice of one product over another. Common examples of this kind of testing include the triangle test and the paired preference tests. In these situations, we want to use a kind of distribution for statistical testing that is based on discrete, categorical data. One example is the binomial distribution, described in this section. The binomial distribution is useful for tests based on proportions, where we have counted people in different categories. The binomial distribution is a special case where there are only two outcomes (e.g., right and wrong answers in a triangle test). Sometimes we may have more than two alternatives for classifying responses, in which case multinomial distribution statistics apply. A commonly used statistic for comparing frequencies when there are two or more response categories is the chi-square statistic. For example, we might want to know if the meals during which a food product is consumed (say, breakfast, lunch, dinner, snacks) differed among teenagers and adults. If we asked consumers about the situation in which they most commonly consumed the product, the data would consist of counts of the frequencies for each group of consumers. The chi-square statistic could then be used to compare these two frequency distributions. It will also indicate whether there is any association between the response categories (meals) and the age group or whether these two variables are independent.

Some response alternatives have more than categorical or nominal properties and represent rankings of responses. For example, we might ask for ranked preference of three or more variations of flavors for a new product. Consumers would rank them from most appealing to least appealing, and we might want to know whether there is any consistent trend, or whether all flavors are about equally preferred across the group. For rank order data, there are a number of statistical techniques within the nonparametric toolbox.

Since it has been argued that many sensory measurements, even rating scales, do not have interval-level properties (see Chapter 7), it often makes sense to apply a nonparametric test, especially those based on ranks, if the researcher has any doubts about the level of measurement inherent in the scaling data. The nonparametric tests can also be used as a check on conclusions from the traditional tests. Since the nonparametric tests involve fewer assumptions than their parametric alternatives, they are more "robust" and less likely to lead to erroneous conclusions or misestimation of the true alpha-risk when assumptions have been violated. Furthermore, they are often quick and easy to calculate, so re-examination of the data does not entail a lot of extra work. Nonparametric methods are also appropriate when the data deviate from a normal distribution, for example, with a pattern of high or low outliers, marked asymmetry or skew.

When data are ranked or have ordinal-level properties, a good measure of central tendency is the median. For data that are purely categorical (nominal level), the measure of central tendency to report is the mode, the most frequent value. Various measures of dispersion can be used as alternatives to the standard deviation. When the distribution of the data is not normally distributed, the 95% confidence interval for the median of N scores can be approximated by

\[ \frac{N+1}{2} \pm 0.98\sqrt{N} \tag{B.1} \]

When the data are reasonably normal, the confidence interval for the median is given by

\[ \mathrm{Med} \pm 1.253\,t\,(S/\sqrt{N}) \tag{B.2} \]

where t is the two-tailed t-value for N–1 degrees of freedom (Smith, 1988). Another simple alternative is to state the semi-interquartile range, or one-half the difference between the data values from the 75th and 25th percentiles.
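As an illustration only, here is a minimal Python sketch of the interval in Eq. (B.1), assuming the formula is read as giving rank positions into the sorted sample; the scores are hypothetical.

```python
# Minimal sketch of the rank-based 95% CI for a median (Eq. B.1),
# assuming the equation gives rank positions in the sorted data.
import math

scores = [3.1, 4.5, 2.8, 5.0, 3.9, 4.2, 3.3, 4.8, 3.6, 4.0, 2.9, 4.4]  # hypothetical
data = sorted(scores)
n = len(data)

center = (n + 1) / 2
half_width = 0.98 * math.sqrt(n)
lower_rank = max(1, math.floor(center - half_width))
upper_rank = min(n, math.ceil(center + half_width))

# Convert 1-based ranks to values from the sorted sample
print("approximate 95% CI for the median:",
      data[lower_rank - 1], "to", data[upper_rank - 1])
```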

There are several nonparametric versions of the correlation coefficient. One commonly used is the rank order correlation attributed to Spearman. Nonparametric statistical tests with worked examples are given in Siegel (1956), Hollander and Wolfe (1973), Conover (1980), and a book tailored for sensory evaluation by Rayner et al. (2005). It is advisable that the sensory professional have some familiarity with the common nonparametric tests so they can be used when the assumptions of the parametric tests seem doubtful. Many statistical computing packages will also offer nonparametric modules and various choices of these tests. The sections below illustrate some of the common binomial, chi-square, and rank order statistics, with some worked examples from sensory applications.

B.2 Binomial-Based Tests on Proportions

The binomial distribution describes the frequencies of events with discrete or categorical outcomes. Examples of such data in product testing would be the proportion of people preferring one product over another in a test or the proportion answering correctly in a triangle test. The distribution is based on the binomial expansion, $(p+q)^n$, where p is the probability of one outcome, q is the probability of the other outcome (q = 1 – p), and n is the number of samples or events. Under the null hypothesis in most discrimination tests, the value of p is determined by the number of alternatives and so equals one-third in the triangle test and one-half in the duo–trio or paired comparison tests.

A classic and familiar example of binomial-based outcomes is in tossing a coin. Assuming it is a fair coin puts the expected probability at one-half for each outcome (heads or tails). Over many tosses (analogous to many observations in a sensory test) we can predict the likely or expected numbers of heads and tails and how often these various possibilities are likely to occur. To predict the number of each possibility, we "expand" the combinations of $(p+q)^n$, letting p represent the numbers of heads and q the numbers of tails in n total throws, as follows:

For one throw, the values are $(p+q)^1$, or $p + q = 1$. One head or one tail can occur and they will occur with probability p = q = 1/2. The coefficients (multipliers) of each term divided by the total number of outcomes give us the probability of each combination. For two throws, the values are $(p+q)^2$, so the expansion is $p^2 + 2pq + q^2 = 1$ (note that the probabilities total 1). $p^2$ is associated with the outcome of two heads, and the multiplicative rule of probabilities tells us that the chances are (1/2)(1/2) = 1/4. $2pq$ represents one head and one tail (this can occur two ways) and the probability is 1/2. In other words, there are four possible combinations and two of them include one head and one tail, so the chance of this is 2/4 or 1/2. Similarly, $q^2$ is associated with two tails and the probability is 1/4. Three throws will yield the following outcomes: $(p+q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$. This expansion tells us that there is a 1/8 chance of three heads or three tails, but there are three ways to get two heads and one tail (HHT, HTH, THH) and similarly three ways to get one head and two tails, so the probability of these two outcomes is 3/8 for each. Note that there are eight possible outcomes for three throws or, more generally, $2^n$ outcomes of n observations (Fig. B.1).

As such an expansion continues with more events, the distribution of events, in terms of the possible numbers of one outcome, will form a bell-shaped distribution (when p = q = 1/2), much like the normal distribution bell curve. The coefficient for each term in the expansion is given by the formula for combinations, where an outcome appears A times out of N tosses (that is, A heads and N – A tails), as follows:

\[ \text{Coefficient} = \frac{N!}{(N-A)!\,A!} \]

This is the number of times the particular outcome can occur. When the coefficient is multiplied by $p^A q^{N-A}$ we get the probability of that outcome. Thus we can find an exact probability for any sample based on the expansion. This is manageable for small samples, but as the number of observations becomes large, the binomial distribution begins to resemble the normal distribution reasonably well and we can use a z-score approximation to simplify our calculations. For small samples, we can actually do these calculations, but reference to a table can save time.

Here is an example of a small experiment (see O'Mahony, 1986 for a similar example). Ten people are polled in a pilot test for an alternative formula of a food product. Eight prefer the new product over the old; two prefer the old product. What is the chance that we would see a preference split of 8/10 or more, if the true probabilities were 1/2 (i.e., a 50/50 split in the population)?

The binomial expansion for 10 observations, p = q = 1/2, is

\[ p^{10} + 10p^9q + 45p^8q^2 + 120p^7q^3 + \cdots \]

In order to see if there is an 8-to-2 split or larger, we need to calculate the proportions of times these outcomes can occur. Note that this includes the values in the "tail" of the distribution, which includes the outcomes of a 9-to-1 and a 10-to-none preference split. So we only need the first three terms, or $(1/2)^{10} + 10(1/2)^9(1/2) + 45(1/2)^8(1/2)^2$.

This sums to about 0.055 or 5.5%. Thus, if the true split in the population as a whole was 50/50, we would see a result this extreme (or more extreme) only about 5% of the time. Note that this is about where we reject the null hypothesis. But we have only looked at one tail of the distribution for this computation. In a preference test we would normally not predict at the outset that one item is preferred over another. This requires a two-tailed test and so we need to double this value, giving a total probability of 11%. Remember, this is the exact probability of seeing an 8–2 split or something more extreme (i.e., 9 to 1 or 10 to zero).
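The same tail probability can be computed directly rather than from the expansion terms; a minimal sketch with the standard library reproduces the roughly 5.5% and 11% figures.

```python
# Exact tail probability for the 8-of-10 preference example, using math.comb.
from math import comb

n, x, p = 10, 8, 0.5
# P(X >= 8) under the null of a 50/50 split
one_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))
two_tail = 2 * one_tail
print(round(one_tail, 4), round(two_tail, 4))   # about 0.0547 and 0.1094
```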

For small experiments, we can sometimes go directly to tables of the cumulative binomial distribution, which gives us exact probabilities for our outcomes based on the ends of the expansion equation. For larger experiments (N > 25 or so), we can use the normal distribution approximation. The extremeness of a proportion can be represented by a z-score, rather than figuring all the probabilities and expansion terms. The disparity from what is expected by chance can be expressed as the probability value associated with that z-score. The formula for a binomial-based z-score is


Fig. B.1 The binomial expansion shown graphically for tossing a coin, with outcomes of heads and tails for various numbers of throws: (a) expected frequencies (probabilities) associated with a single toss, (b) frequencies expected from two tosses, (c) from three tosses, (d) from four tosses, and (e) from ten tosses. Note that as the number of events (or observations) increases, the distribution begins to take on the bell-shaped appearance of the normal distribution.

\[ z = \frac{(P_{obs} - p) - \frac{1}{2N}}{\sqrt{pq/N}} = \frac{(x - Np) - 0.5}{\sqrt{Npq}} \tag{B.3} \]

where Pobs is the proportion observed, p is the chance probability, q = 1 – p, N is the number of observations, and x is the number of those outcomes observed (Pobs = x/N).

The continuity correction accounts for the fact that we cannot have fractional observations and the distribution of the binomial outcomes is not really a continuous measurement variable. In other words, there are a limited number of whole number outcomes since we are counting discrete events (you cannot have half a person prefer product A over product B). The continuity correction accounts for this approximation by adjusting by the maximum amount of deviation from a continuous variable in the counting process, or one-half of one observation.

The standard error of the proportion is estimated to be the square root of p times q divided by the number of observations (N). Note that, as with the t-value, our standard error, or the uncertainty around our observations, decreases as the reciprocal of the square root of N. Our certainty that the observed proportion lies near to the true population proportion increases as N gets large.
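A small sketch of Eq. (B.3) may help; the triangle-test counts below are hypothetical, and the one-tailed normal probability is obtained from the complementary error function rather than a table.

```python
# Sketch of the continuity-corrected z-test of Eq. (B.3), e.g., for a
# triangle test with x correct out of n trials and chance probability p.
import math

def binomial_z(x, n, p):
    q = 1 - p
    return (x - n * p - 0.5) / math.sqrt(n * p * q)

# Hypothetical example: 22 correct out of 50 triangle tests (p = 1/3)
z = binomial_z(22, 50, 1/3)
p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability
print(round(z, 2), round(p_one_tailed, 3))          # z is about 1.45, p roughly 0.07
```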

Tables for minimum numbers correct, commonly used for triangle tests, paired preference tests, etc., solve this equation for X as a function of N and z (see Roessler et al., 1978). The tables show the minimum number of people who have to get the test correct to reject the null hypothesis and then conclude that a difference exists. For one-tailed discrimination tests at an alpha-risk of 0.05, the z-value is 1.645. In this case the minimum value can be solved from the inequalities where Eq. (B.3) is solved for X, and the equal sign is changed to "greater than" (Z must exceed 1.645), and rounded up to the nearest whole number since you cannot have a fraction of a person.

Given the value of 1.645 for Z (at p = 0.05), and 1/3 for p and 2/3 for q as in a triangle test, the inequality can be solved for X and N as follows:

\[ X \geq \frac{2N + 3}{6} + 0.775\sqrt{N} \tag{B.4} \]

and for tests in which p is 1/2, the corresponding equation is

\[ X \geq \frac{N + 1}{2} + 0.8225\sqrt{N} \tag{B.5} \]

We can also use these relationships to determine confidence intervals on proportions. The 95% confidence interval on an observed proportion, Pobs (= X/N, where X is the number correct in a choice test), is equal to

\[ P_{obs} \pm Z\sqrt{pq/N} \tag{B.6} \]

where Z will take on the value of 1.96 for the two-tailed 95% intervals for the normal distribution. This equation would be useful for estimating the interval within which a true proportion is likely to occur. The two-tailed situation is applicable to a paired preference test, as shown in the following example. Suppose we test 100 consumers and 60% show a preference for product A over product B. What is the confidence interval around the level of 60% preference for product A and does this interval overlap the null hypothesis value of 50%? Using Eq. (B.6),

\[ P_{obs} \pm Z\sqrt{pq/N} = 0.60 \pm 1.96\sqrt{(0.5)(0.5)/100} = 0.60 \pm 0.098 \]

In this case the lower limit is above 50%, so there is just enough evidence to conclude that the true population proportion would not fall at 50%, given this result, 95% of the time.
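The worked interval can be verified in a few lines; this sketch simply restates Eq. (B.6) with the null values p = q = 0.5 used in the example.

```python
# Sketch of the 95% confidence interval of Eq. (B.6) for the worked
# preference example (60 of 100 consumers choosing product A).
import math

x, n = 60, 100
p_obs = x / n
half_width = 1.96 * math.sqrt(0.5 * 0.5 / n)   # null p = q = 0.5, as in the text
print(round(p_obs - half_width, 3), round(p_obs + half_width, 3))  # 0.502 to 0.698
```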

B.3 Chi-Square

B.3.1 A Measure of Relatedness of Two Variables

The chi-square statistic is a useful statistic for comparing frequencies of events classified in a table of categories. If each observation can be classified by two or more variables, it enters into the frequency count for a part of a matrix or classification table, where rows and columns represent the levels of each variable. For example, we might want to know whether there is any relationship between gender and consumption of a new reduced-fat product. Each person could be classified as a high- versus low-frequency user of the product and also as male or female. This would create a two-way table with four cells representing the counts of people who fall into one of the four groups. For the sake of example, let us assume we had a 50/50 split in sampling the two sexes and also an even proportion of our high- and low-frequency groups. Intuitively, we would expect 25% of observations to fall in each cell of our table, assuming no difference between men and women in frequency of use. To the extent that one or more cells in the table is disproportionally filled or lacking in observations, we would find evidence of an association or lack of independence of gender and product use. Table B.1 shows two examples, one with no association between the variables and the other with a clear association (numbers represent counts of 200 total participants).

Table B.1 Examples of different levels of association

                No association          Clear association
                Usage group             Usage group
                Low      High           Low      High      (Total)
Males           50       50             75       25        (100)
Females         50       50             20       80        (100)
(Totals)        (100)    (100)          (95)     (105)     (200)

In the left example, the within-cell entries for frequency counts are exactly what we would expect based on the marginal totals; with one-half of the groups classified according to each variable, we expect one-fourth of the total in each of the four cells. (Of course, a result of exactly 25% in each cell would rarely be found in real life.) In the right example, we see that females are more inclined to fall into the high-usage group and that the reverse is true for males. So knowing gender helps us predict something about the usage group, and conversely, knowing the usage, we can make a prediction about gender. So we conclude that there is a relationship between these two variables.

B.3.2 Calculations

More generally, the chi-square statistic is useful for comparing distributions of data across two or more variables. The general form of the statistic is to (1) compute the expected frequency minus the observed frequency, (2) square this value, (3) divide by the expected frequency, and (4) sum these values across all cells (Eq. (B.7)). The expected frequency is what would be predicted from random chance or from some knowledge or theory of what is a likely outcome based on previous or current observations.

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \tag{B.7} \]

The statistic has degrees of freedom equal to the number of rows minus one, times the number of columns minus one. For a 2 × 2 table, the function is mathematically equivalent to the z-formula from the binomial probability in Eq. (B.3) (i.e., $\chi^2 = z^2$; see the postscript to this chapter for a proof). For small samples, N < 50, we can also use a continuity correction, as with the Z-formula, where we subtract 1/2, so the Yates correction for continuity gives us this equation:

\[ \chi^2_{\text{Yates}} = \sum \frac{(|\text{observed} - \text{expected}| - 0.5)^2}{\text{expected}} \tag{B.8} \]

Note that the absolute value must be taken before the continuity correction is subtracted and that the subtraction is before squaring (this is incorrect in some texts).

A simple form of the test for 2 × 2 matrices and a computational formula are shown in Fig. B.2.

Fig. B.2 Some uses of the chi-square test for 2 × 2 contingency tables. For a table with cells A, B (top row) and C, D (bottom row), with row totals G = A + B and H = C + D, column totals E = A + C and F = B + D, and N = A + B + C + D, the shortcut formula is χ² = N(AD – BC)² / [(E)(F)(G)(H)]. The example in the figure applies this to an A, not-A test in which the counts of "A" and "not-A" responses to each sample presented are 30, 15, 10, and 25, giving χ² = 80[(30)(25) – (10)(15)]² / [(40)(40)(45)(35)] = 11.43. The same analysis applies to the same/different test. However, this is only appropriate if there are different individuals in each cell, i.e., each tester only sees one product. If the testers see both versions, then the McNemar test is appropriate instead of the simple chi-square.

Some care is needed in applying chi-square tests, as they are temptingly easy to perform and so widely applicable to questions of cross-classification and association between variables. Tests using chi-square usually assume that each observation is independent, e.g., that each tally is a different person. It is not appropriate for related-samples data such as repeated observations on the same person. The chi-square test is not robust if the frequency counts in any cells are too small, usually defined by a rule of thumb of a minimum expected count of five observations per cell. Many statistical tests are based upon the chi-square distributions, as we will see in the section on rank order statistics.
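As an illustration, the shortcut formula from Fig. B.2 is easy to code; the sketch below reproduces the χ² of 11.43 for the A, not-A counts shown there.

```python
# Sketch of the 2 x 2 shortcut formula from Fig. B.2, applied to the
# A, not-A counts given there (each count from a different respondent).
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    e, f, g, h = a + c, b + d, a + b, c + d      # column and row totals
    return n * (a * d - b * c) ** 2 / (e * f * g * h)

chi2 = chi_square_2x2(30, 15, 10, 25)
print(round(chi2, 2))          # 11.43, compare to 3.84 (df = 1, alpha = 0.05)
```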

B.3.3 Related Samples: The McNemar Test

The chi-square statistic is most often applied to independent observations classified on categorical variables. However, many other statistical tests follow a chi-square distribution as a test statistic. Repeated observations of a group on a simple dichotomous variable (two classes) can be tested for change or difference using the McNemar test for the significance of changes. This is a simple test well suited to before-and-after experiments such as the effect of information on attitude change. It can be applied to any situation where test panelists view two products and their responses are categorized into two classes. For example, we might want to see if the number of people who report liking a product changes after the presentation of some information such as nutritional content. Stone and Sidel (1993) give an example of using the McNemar test for changes to assess whether just-right scales show a difference between two products.

The general form of the test classifies responses in a two-by-two matrix, with the same response categories as rows and columns. Since the test is designed to examine changes or differences, the two cells with the same values of row and column variables are ignored. It is only the other two corners of the table, where the classification differs, that we are interested in. Table B.2 gives an example, with the frequency counts represented by the letters "a" through "d."

Table B.2 Example for McNemar calculations

                                       Before information is presented
                                       Number liking        Number disliking
                                       the product          or neutral
After information is presented
  Number liking                        a                    b
  Number disliking or neutral          c                    d

The McNemar test calculates the following statistic:

\[ \chi^2 = \frac{(|b - c| - 1)^2}{b + c} \tag{B.9} \]

Note that the absolute value of the difference is taken in the two cells where change occurs and that the other two cells (a and d) are ignored. The obtained value must exceed the critical value of chi-square for df = 1, which is 3.84 for a two-tailed test and 2.71 for a one-tailed test with a directional alternative hypothesis. It is important that the expected cell frequencies be larger than 5. Expected frequencies are given by the sum of the two cells of interest, divided by two. Table B.3 gives an example testing for a change in preference response following a taste test among 60 consumers.

Table B.3 Sample data for McNemar test

                            Before tasting
                            Prefer product A    Prefer product B
After tasting
  Prefer product A          12                  33
  Prefer product B           8                   7

And so our calculated value becomes:

\[ \chi^2 = \frac{(|33 - 8| - 1)^2}{33 + 8} = \frac{576}{41} = 14.05 \]

This is larger than the critical value of 3.84, so we can reject the null hypothesis (of no change in preference) and conclude that there was a change in preference favoring product A, as suggested by the frequency counts. Although there was a 2-to-1 preference for B before tasting (marginal totals of 40 versus 20), 33 of those 40 people switched to product A while less than half of those preferring product A beforehand switched in the other direction. This disparity in changes drives the significance of the McNemar calculations and result. This test is applicable to a variety of situations, such as the balanced A, not-A, and same/different tests in which each panelist judges both kinds of trials and thus the data are related observations. The generalization of the McNemar test to a situation with related observations and multiple rows and columns is the Stuart test for two products or the Cochran–Mantel–Haenszel test for more than two products.
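The McNemar computation is short enough to verify directly; this sketch reproduces the value of 14.05 from Table B.3.

```python
# Sketch of the McNemar calculation of Eq. (B.9) for the preference-switch
# data in Table B.3 (b and c are the two "change" cells).
def mcnemar(b, c):
    return (abs(b - c) - 1) ** 2 / (b + c)

chi2 = mcnemar(33, 8)
print(round(chi2, 2))    # 14.05, compare to the critical value 3.84 (df = 1)
```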

B.3.4 The Stuart–Maxwell Test

A useful test for 3 × 3 matrices such as those generated from just-about-right (JAR) scale data is the Stuart–Maxwell test discussed in Chapter 14. For example, we might have two products rated on JAR scales and want to see if there is any difference in the distribution of ratings. The data are collapsed into three categories: those above just-right, those at or near just-right, and those below the just-right point. Then the frequencies in the off-diagonal cells are used to calculate a chi-square variable. The cells with identical classifications (the diagonal) are not used. The calculated value is compared to a critical chi-square value for two degrees of freedom, which is 5.99. The calculations are shown in Fig. B.3 and a worked example in Fig. B.4.


Fig. B.3 The calculations involved in the Stuart–Maxwell test as applied to just-about-right (JAR) scale data. The 3 × 3 table has cell entries A through I, with row totals R1 (= A + B + C), R2, R3 and column totals C1 (= A + D + G), C2, C3. The steps are: (1) enter the cell totals A through I; (2) average the off-diagonal pairs, P1 = (D + B)/2, P2 = (G + C)/2, P3 = (H + F)/2; (3) find the differences of column and row totals, D1 = C1 – R1, D2 = C2 – R2, D3 = C3 – R3; (4) calculate chi-square as [(P1)(D3)² + (P2)(D2)² + (P3)(D1)²] / {2[(P1)(P2) + (P2)(P3) + (P1)(P3)]}. Note that each cell average (P1, P2, P3) is multiplied by the squared difference (D1, D2, D3) of the row and column totals in which it does not participate.

B.3.5 Beta-Binomial, Chance-Corrected Beta-Binomial, and Dirichlet Multinomial Analyses

These three models can be used for replicated data from choice tasks. The beta-binomial is applicable to replicated tests where there are two outcomes (e.g., right and wrong answers) or two choices, as in a preference test. The Dirichlet multinomial is applicable to tests where there are more than two choices, such as a preference test with the no preference option. The equations below describe how to conduct tests for overdispersion, when panelists or consumers are not acting like random events (like flipping coins) but rather show consistent patterns of response. Worked examples are not shown here but can be found in Gacula et al. (2009) and Bi (2006). Students are urged to look at those worked examples before attempting these tests. Maximum likelihood solutions are also given in those texts, using S-plus programs.

In all the examples below, the letters n, r, and m are used to refer to the number of panelists, replicates, and choices, respectively. We think these are easier to remember, but they are different than the notations of Bi and Gacula, who use n for replicates and k for panelists (be forewarned). Lowercase x with subscripts will refer to a single observation or count of choices for a given panelist and/or replicate.


Fig. B.4 A worked example of the Stuart–Maxwell test for just-about-right scales. Two products, X and Y, are each rated too weak, just right, or too strong, and the 3 × 3 table of counts (rows = Product Y, columns = Product X, each in the order too weak / just right / too strong) is 13, 35, 12 / 10, 10, 8 / 2, 5, 5, giving row totals of 60, 28, and 12 and column totals of 25, 50, and 25. The off-diagonal averages are P1 = (10 + 35)/2 = 22.5, P2 = 7, and P3 = 6.5, and the squared differences of column and row totals are (D1)² = (25 – 60)² = 1225, (D2)² = (50 – 28)² = 484, and (D3)² = (25 – 12)² = 169. Chi-square = [22.5(169) + 7(484) + 6.5(1225)] / {2[22.5(7) + 22.5(6.5) + 6.5(7)]} = 21.7. This value is then compared to the critical value of chi-square for 2 df (= 5.99). As 21.7 > 5.99, there was a significant difference in the ratings of the two products; inspection of the 3 × 3 matrix suggests that Product Y is too weak relative to Product X.
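A short sketch of the Stuart–Maxwell calculation follows the steps of Fig. B.3 and reproduces the worked value of 21.7 from Fig. B.4; the row/column orientation reflects our reading of the figure.

```python
# Sketch of the Stuart-Maxwell calculation from Figs. B.3 and B.4,
# for a 3 x 3 table of JAR classifications (rows = Product Y, cols = Product X).
def stuart_maxwell(t):
    (a, b, c), (d, e, f), (g, h, i) = t
    p1, p2, p3 = (d + b) / 2, (g + c) / 2, (h + f) / 2   # off-diagonal averages
    rows = [a + b + c, d + e + f, g + h + i]
    cols = [a + d + g, b + e + h, c + f + i]
    d1, d2, d3 = cols[0] - rows[0], cols[1] - rows[1], cols[2] - rows[2]
    num = p1 * d3 ** 2 + p2 * d2 ** 2 + p3 * d1 ** 2
    den = 2 * (p1 * p2 + p2 * p3 + p1 * p3)
    return num / den

table = [[13, 35, 12],
         [10, 10,  8],
         [ 2,  5,  5]]
print(round(stuart_maxwell(table), 1))   # 21.7, compare to 5.99 (df = 2)
```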

B.3.5.1 Beta-Binomial

The beta-binomial model assumes that the performance of panelists is distributed like a beta distribution (Bi, 2006). This distribution has two parameters, but they can be summarized in a statistic called gamma. Gamma varies from zero to one and is a measure of the degree to which there are systematic patterns of response versus apparent random variation across replications.

First we calculate a mean proportion and a variance parameter, μ and S, respectively:

\[ \mu = \frac{\sum_{i=1}^{n} x_i / r}{n} \tag{B.10} \]

where each xi is the number of correct judgments summed across replicates for that panelist. So μ is the mean across panelists of the individual proportions correct (xi/r). S is defined as:

\[ S = \sum_{i=1}^{n} \left( \frac{x_i}{r} - \mu \right)^2 \tag{B.11} \]

and then we can calculate our gamma:

\[ \gamma = \frac{1}{r - 1} \left[ \frac{rS}{\mu(1 - \mu)\,n} - 1 \right] \tag{B.12} \]

where r is the number of replicates, S is a measure of dispersion, μ is the mean proportion correct for the group (looking at each person's individual proportions as shown below), and n is the number of judges.


To test whether the beta-binomial or the binomial is a better fit, we use the following Z-test, sometimes called Tarone's Z-test (Bi, 2006, p. 114):

\[ Z = \frac{E - nr}{\sqrt{2nr(r - 1)}} \tag{B.13} \]

where E is another measure of dispersion,

\[ E = \sum_{i=1}^{n} \frac{(x_i - rm)^2}{m(1 - m)} \tag{B.14} \]

and m is the mean proportion correct

\[ m = \frac{\sum_{i=1}^{n} x_i}{nr} \tag{B.15} \]

If this Z is not significant, you have some justification for combining replicates and looking at the total proportion correct over n × r trials.

If we wish to test our obtained value of μ against some null hypothesis value of μo, we can use another simple Z-test:

\[ Z = \frac{|\mu - \mu_o|}{\sqrt{\mathrm{Var}(\mu)}} \tag{B.16} \]

where Var(μ) is

\[ \mathrm{Var}(\mu) = \frac{\mu(1 - \mu)}{nr}\left[ (r - 1)\gamma + 1 \right] \tag{B.17} \]

Using this general equation, most significance tests can be done on any two-choice format such as forced choice or preference tests. The appealing factor is that the above equation has taken into account the overdispersion in the data. That is, the issue of whether there are segments of panelists performing consistently versus apparent random performance from replicate to replicate has been addressed.
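The moment calculations above are simple enough to script. The sketch below applies Eqs. (B.10) through (B.15) to a hypothetical replicated data set; the panelist counts, and therefore the resulting gamma and Z values, are invented for illustration only.

```python
# Sketch of the beta-binomial summary statistics and Tarone's Z for
# hypothetical replicated choice data: x[i] = correct answers (out of r)
# for panelist i.
import math

x = [3, 3, 2, 0, 3, 1, 3, 3, 0, 2, 3, 1]   # hypothetical counts
r, n = 3, len(x)                            # replicates, panelists

props = [xi / r for xi in x]                # each panelist's proportion correct
mu = sum(props) / n                                      # Eq. (B.10)
S = sum((p - mu) ** 2 for p in props)                    # Eq. (B.11)
gamma = (r * S / (mu * (1 - mu) * n) - 1) / (r - 1)      # Eq. (B.12)

m = sum(x) / (n * r)                                     # Eq. (B.15)
E = sum((xi - r * m) ** 2 for xi in x) / (m * (1 - m))   # Eq. (B.14)
Z = (E - n * r) / math.sqrt(2 * n * r * (r - 1))         # Tarone's Z, Eq. (B.13)

print(round(gamma, 2), round(Z, 2))   # 0.5 and 3.0: evidence of overdispersion
```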

B.3.5.2 The Chance-Corrected Beta-Binomial

Some authors have argued that the beta-binomial is unrealistic, because there is a lower limit on the population mean performance that is dictated by the chance performance level (see Bi, 2006, for a whole chapter on this approach). This is intuitively appealing, although some published comparisons of the two models show only modest differences with real data sets.

Let p be the mean proportion correct; now we can define a chance-corrected mean proportion as

\[ \mu = \frac{p - C}{1 - C} \tag{B.18} \]

where C is the chance proportion correct, e.g., 1/3 for the triangle test or the chance-expected preference of 1/2 in a two-product test. Now we need our variance parameter, S,

\[ S = \sum_{i=1}^{n} (p_i - \mu)^2 \tag{B.19} \]

where pi is the proportion correct for panelist i. Now we need a new and slightly more complex estimate of gamma as

\[ \gamma = \frac{1}{(r - 1)(p - C)} \left[ \frac{rS}{n(1 - p)} - p \right] \tag{B.20} \]

The same Z-test still applies for testing against a null proportion, μo, but now we need a new variance calculation, actually two of them:

\[ \mathrm{Var}(\mu) = \frac{\mathrm{Var}(p)}{(1 - C)^2} \tag{B.21} \]

and

\[ \mathrm{Var}(p) = (1 - C)^2 (1 - \mu)\left[ (r - 1)\mu\gamma + \frac{C}{1 - C} + \mu \right] \Big/ nr \tag{B.22} \]

(whew! But now we have worked gamma back into the picture).

B.3.5.3 The Dirichlet-Multinomial Model

This model extends the reasoning of the beta-binomial approach to the situation where there are more than two alternatives (Gacula et al., 2009). In its simplest form, it can be used to test against some fixed proportions, like an equal one-third split in a preference test or a 35/30/35% split if one uses the commonly observed no-preference rate from identical samples of 30% (see Chapter 13).


Suppose we have three options: "prefer product A," "no preference," and "prefer product B." Let X1 be the sum for product A over all choices, X2 be the sum of no preference, and X3 the sum for product B. Let there be n panelists, r replicates, and m choices (in this case three). We have N total observations (= n × r). The first thing we can do is try to see if there is a pattern of responding analogous to a nonzero gamma in Tarone's Z-test. This is yet another Z-statistic, given by the following formula:

\[ Z = \frac{N \sum_{j=1}^{m} \frac{1}{X_j} \sum_{i=1}^{n} x_{ij}(x_{ij} - 1) - nr(r - 1)}{\sqrt{2(m - 1)\,nr(r - 1)}} \tag{B.23} \]

where xij is the total number of that choice, j, for panelist i, multiplied by xij – 1, then summed across all panelists, then weighted by 1/Xj. Repeat for each choice, j.

We can also do a simple test against expected proportions, based on a weighted chi-square with 2 df. But first we need the heterogeneity parameter, C, which is analogous to 1 – gamma in the beta-binomial model. Let pj = Xj/N, where we just convert the total for each choice to the corresponding proportion:

\[ C = \frac{r}{(n - 1)(m - 1)} \sum_{j=1}^{m} \frac{1}{p_j} \sum_{i=1}^{n} \left( \frac{x_{ij}}{m} - p_j \right)^2 \tag{B.24} \]

Once we have our correction factor, C, for panelist "patterns" or overdispersion, we can perform a simple χ2 test as follows:

\[ \chi^2 = \frac{nr}{C} \sum_{j=1}^{m} \frac{(p_j - p_{\exp})^2}{p_{\exp}} \tag{B.25} \]

where pj is again our observed proportion for each choice, and pexp is the proportion we expect based on our theory. This is tested against a χ2 distribution with m – 1 degrees of freedom. A significant χ2 would indicate a deviation from our expected proportions. Such a test could also be applied to just-about-right data if we have some basis for assuming some reasonable or predicted distribution of results in the JAR categories.
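As an illustration of Eq. (B.25), the sketch below applies the overdispersion-corrected chi-square to hypothetical preference counts, assuming the heterogeneity factor C has already been obtained from Eq. (B.24); the counts, the expected split, and the value of C are all invented for the example.

```python
# Sketch of the overdispersion-corrected chi-square of Eq. (B.25). The
# counts, expected split, and heterogeneity factor C are hypothetical;
# C would normally come from Eq. (B.24).
n, r = 50, 2                       # panelists and replicates
totals = [45, 28, 27]              # total choices: prefer A / no preference / prefer B
p_exp = [0.35, 0.30, 0.35]         # expected split discussed in the text
C = 1.4                            # hypothetical heterogeneity factor

N = n * r
p_obs = [t / N for t in totals]
chi2 = (n * r / C) * sum((po - pe) ** 2 / pe for po, pe in zip(p_obs, p_exp))
print(round(chi2, 2))              # compare to 5.99 (chi-square, df = m - 1 = 2)
```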

B.4 Useful Rank Order Tests

B.4.1 The Sign Test

A simple nonparametric test of difference with paired data is the sign test. The simplest case of ranking is the paired comparison, when only two items are to be ranked as to which is stronger or which is preferred. The sign test based on comparisons between two samples is based on binomial statistics. The sign test can also be used with any data, such as scaled responses, that have at least ordinal properties. Obviously, in cases of no difference, we expect the number of rankings in one direction (for example, product A over B) to equal the number of rankings in the opposite direction (product B over A), so the null probability of 1/2 can be used.

In a two-sample case, when every panelist scores both products, the scores can be paired. Probabilities can be examined from the binomial tables, from the critical value tables used for discrimination tests (for one-tailed hypotheses) with p = 1/2, or from the paired preference tables (for two-tailed) (Roessler et al., 1978). The sign test is the nonparametric parallel of the dependent groups or paired t-test. Unlike the t-test, we do not need to fulfill the assumption of normally distributed data. With skewed data, the t-test can be misleading since high outliers will exert undue leverage on the value of the mean. Since the sign test only looks for consistency in the direction of comparisons, the skew or outliers are not so influential. There are also several nonparametric counterparts to the independent groups t-test. One of these, the Mann–Whitney U-test, is shown below.

Table B.4 gives an example of the sign test. We simply count the direction of paired scores and assume a 50/50 split under the null hypothesis. In this example, panelists scored two products on a rating scale (every panelist tasted both products) so the data are paired. Plus or minus "signs" are given to each pairing for whether A is greater than B or B is greater than A, respectively, hence the name of the test. Ties are omitted, losing some statistical power, so the test works best at detecting differences when there are not too many ties.

Table B.4 Data for sign test example

Panelist    Score, Product A    Score, Product B    Sign, for B > A
1           3                   5                   +
2           7                   9                   +
3           4                   6                   +
4           5                   3                   –
5           6                   6                   0
6           8                   7                   –
7           4                   6                   +
8           3                   7                   +
9           7                   9                   +
10          6                   9                   +

Count the number of +'s (= 7), and omit ties. We can then find the probability of (at least) 7/9 in a two-tailed binomial probability table, which is 0.09. Although this is not enough evidence to reject the null, it might warrant further testing, as there seems to be a consistent trend.
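The sign-test probability can also be checked exactly; this sketch recomputes the 7-of-9 result from Table B.4 with an exact binomial tail, using only the standard library.

```python
# Sketch of the sign test on the paired scores in Table B.4, using an
# exact binomial tail probability.
from math import comb

product_a = [3, 7, 4, 5, 6, 8, 4, 3, 7, 6]
product_b = [5, 9, 6, 3, 6, 7, 6, 7, 9, 9]

diffs = [b - a for a, b in zip(product_a, product_b) if b != a]   # ties dropped
n = len(diffs)                         # 9 untied pairs
plus = sum(1 for d in diffs if d > 0)  # 7 pairs favoring product B

# Probability of at least this many "+" signs under a 50/50 null
p_tail = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
print(plus, n, round(p_tail, 3))       # 7 of 9, p of about 0.09 as in the text
```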

B.4.2 The Mann–Whitney U-Test

A parallel test to the independent groups t-test is the Mann–Whitney U-test. It is almost as easy to calculate as the sign test and thus stands as a good alternative to the independent groups t-test when the assumptions of normal distributions and equal variance are doubtful. The test can be used for any situation in which two groups of data are to be compared and the level of measurement is at least ordinal. For example, two manufacturing sites or production lines might send representative samples of soup to a sensory group for evaluation. Mean intensity scores for saltiness might be generated for each sample and then the two sets of scores would be compared. If no difference were present between the two sites, then rankings of the combined scores would find the two sites to be interspersed. On the other hand, if one site was producing consistently more salty soup than another, then that site should move toward higher rankings and the other site toward lower rankings. The U-test is sensitive to just such patterns of overlap versus separation in a set of combined ranks.

The first step is to rank the combined data and then find the sum of the ranks for the smaller of the two groups. For a small experiment, with the larger of the two groups having less than 20 observations, the following formula should be used:

\[ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{B.26} \]

where n1 is the smaller of the two samples, n2 is the larger of the two samples, and R1 is the sum of the ranks assigned to the smaller group. The next step is to test whether U has the correct form, since it may be high or low depending upon the trends for the two groups. The smaller of the two forms is desired. If U is larger than n1n2/2, it is actually a value called U′ and must be transformed to U by the formula U = n1n2 – U′.

Critical values for U are shown in Table E. Note that the obtained value for U must be equal to or smaller than the tabled value in order to reject the null, as opposed to other tabled statistics where the obtained value must exceed a tabled value. If the sample size is very large, with n2 greater than 20, the U statistic can be converted to a z-score by the following formula, analogous to the difference between means divided by a standard deviation:

\[ z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} \tag{B.27} \]

If there are ties in the data, the standard deviation (denominator) in the above formula needs adjustment as follows:

\[ SD = \sqrt{\left[ \frac{n_1 n_2}{N(N - 1)} \right] \left[ \frac{N^3 - N}{12} - \sum T \right]} \tag{B.28} \]

where N = n1 + n2 and T = (t³ – t)/12, where t is the number of observations tied for a given rank. This demands an extra housekeeping step where ties must be counted and the value for T computed and summed before Z can be found.

A worked example for a small sample is shown next. In our example of salty scores for soups, let us assume we have the following panel means (Table B.5).

Table B.5 Data for Mann–Whitney U-test

Site A    Site D
4.7       8.2
3.5       6.6
4.3       4.1
5.2       5.5
4.2       4.4
2.7

So there were 6 samples (= n2) taken from site A and 5 (= n1) from site D. The ranking of the 11 scores would look like this (Table B.6).

Table B.6 Ranked data for Mann–Whitney U-test

Score    Rank    Site
8.2      1       D
6.6      2       D
5.5      3       D
5.2      4       A
4.7      5       A
4.4      6       D
4.3      7       A
4.2      8       A
4.1      9       D
3.5      10      A
2.7      11      A

R1 is then the sum of the ranks for site D (= 1 + 2 + 3 + 6 + 9 = 21). Plugging into the formula, we find that U = 30 + 15 – 21 = 24.

Next, we check to make sure we have U and not U′ (the smaller of the two is needed). Since this value is larger than n1n2/2 = 15, we did in fact obtain U′, so we subtract it from 30, giving a value of 6. This is then compared to the maximum critical value in Table E. For these sample sizes, the U value must be three or smaller to reject the null at a two-tailed probability of 0.05, so there is not enough evidence to reject the null in this comparison. Inspection of the rankings shows a lot of overlap in the two sites, in spite of the generally higher scores at site D. The independent groups t-test on these data also gives a p value higher than 0.05, so there is agreement in this case. Siegel (1956) states that the Mann–Whitney test is about 95% as powerful as the corresponding t-test. There are many other nonparametric tests for independent samples, but the Mann–Whitney U-test is commonly used and simple to calculate.
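For completeness, here is a minimal sketch of the U calculation for the soup example (no ties, so the simple formulas apply); it reproduces R1 = 21 and U = 6.

```python
# Sketch of the Mann-Whitney U calculation for the data in Tables B.5 and B.6
# (no tied scores in this example).
site_a = [4.7, 3.5, 4.3, 5.2, 4.2, 2.7]   # n2 = 6 (larger group)
site_d = [8.2, 6.6, 4.1, 5.5, 4.4]        # n1 = 5 (smaller group)

combined = sorted(site_a + site_d, reverse=True)      # rank 1 = highest score
rank = {score: i + 1 for i, score in enumerate(combined)}

n1, n2 = len(site_d), len(site_a)
r1 = sum(rank[s] for s in site_d)                     # rank sum for the smaller group

u = n1 * n2 + n1 * (n1 + 1) // 2 - r1                 # Eq. (B.26)
u = min(u, n1 * n2 - u)                               # take the smaller form of U
print(r1, u)    # 21 and 6; compare U to the tabled critical value (3 here)
```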

B.4.3 Ranked Data with More Than Two Samples, Friedman and Kramer Tests

Two tests are commonly used in sensory evaluation for ranked products where there are three or more items being compared. The Friedman "analysis of variance" on ranked data is a relatively powerful test that can be applied to any data set where all products are viewed by all panelists, that is, there is a complete ranking by each participant. The data set for the Friedman test thus takes the same form as a one-way analysis of variance with products as columns and panelists as rows, except that ranks are used instead of raw scores. It is also applicable to any data set where the rows form a set of matched observations that can be converted to ranks. The Friedman test is very sensitive to a pattern of consistent rank orders. The calculated statistic is compared to a chi-square value that depends upon the number of products and the number of panelists. The second test that is common in sensory work is Kramer's rank sum test. Critical values for significance for this test were recalculated and published by Basker (1988) and by Newell and MacFarlane (1987) (see Table J). A variation of the Friedman test is the rank test of Page (1963), which is a little more powerful than the Friedman test, but is only used when you are testing against one specific predicted ranking order. Each of these methods is illustrated with an example below.

Example of the Friedman test: Twenty consumers are asked to rank three flavor submissions for their appropriateness in a chocolate/malted milk drink. We would like to know if there is a significant overall difference among the candidates as ranked. The Friedman test constructs a chi-square statistic based on column totals, Tj, in each of the J columns. For a matrix of K rows and J columns, we compare the obtained value to a chi-square value with J – 1 degrees of freedom. Here is the general formula:

\[ \chi^2 = \left\{ \frac{12}{K\,J\,(J + 1)} \left[ \sum_{j=1}^{J} T_j^2 \right] \right\} - 3K(J + 1) \tag{B.29} \]

Table B.7 shows the data and column totals. So the calculations proceed as follows:

\[ \chi^2 = \left\{ \frac{12}{20(3)(4)} \left[ (43.5)^2 + (46.5)^2 + (30)^2 \right] \right\} - 3(20)(4) = 7.725 \]

In the chi-square table for J – 1 degrees of freedom, in this case df = 2, the critical value is 5.99. Because our obtained value of 7.7 exceeds this, we can reject the null. This makes sense since product C had a predominance of first rankings. Note that in order to compare individual samples, we require another test. The sign test is appropriate, although if many pairs of samples are compared, then the alpha level needs to be reduced to compensate for the experiment-wise increase in risk. Another approach is to use the least-significant-difference (LSD) test for ranked data, as follows:

\[ LSD = 1.96\sqrt{\frac{K\,J\,(J + 1)}{6}} \tag{B.30} \]

for J items ranked by K panelists. Items whose rank sums differ by more than this amount may be considered significantly different.

Table B.7 Data for Friedman test on ranks

                               Ranks
Panelist                       Product A    Product B    Product C
1                              1            3            2
2                              2            3            1
3                              1            3            2
4                              1            2            3
5                              3            1            2
6                              2            3            1
7                              3            2            1
8                              1            3            2
9                              3            1            2
10                             3            1            2
11                             2            3            1
12                             2            3            1
13                             3            2            1
14                             2            3            1
15                             2.5          2.5          1
16                             3            2            1
17                             3            2            1
18                             2            3            1
19                             3            2            1
20                             1            2            3
Sum (column totals, Tj)        43.5         46.5         30

An example of the (Kramer) rank sum test: We can also use the rank sum test directly on the previous data set to compare products. We merely need the differences in the rank sums (column totals):

Differences: A versus B = 3.0; B versus C = 16.5; A versus C = 13.5

Comparing to the minimum critical difference at p < 0.05 (= 14.8, see Table J), there is a significant difference between B and C, but not any other pair. What about the comparison of A versus C, where the difference was close to the critical value? A simple sign test between A and C would have yielded a 15 to 5 split, which is statistically significant (two-tailed p = 0.042). In this case, when the rank sum test was so close to the cutoff value, it would be wise to examine the data with an additional test.
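Both the Friedman statistic and the rank-sum differences are quick to verify; the sketch below reproduces the 7.725 value and the pairwise differences used in the Kramer comparison.

```python
# Sketch of the Friedman statistic (Eq. B.29) from the column rank totals
# of Table B.7, plus the pairwise rank-sum differences for the Kramer test.
k, j = 20, 3                              # K panelists, J products
t = {"A": 43.5, "B": 46.5, "C": 30.0}     # rank sums Tj

chi2 = (12 / (k * j * (j + 1))) * sum(v ** 2 for v in t.values()) - 3 * k * (j + 1)
print(round(chi2, 3))                     # 7.725, exceeds 5.99 for df = J - 1 = 2

for a, b in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(a, "vs", b, abs(t[a] - t[b]))   # 3.0, 16.5, 13.5 as in the text
```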

B.4.4 Rank Order Correlation

The common correlation coefficient, r, is also known as the Pearson product–moment correlation coefficient. It is a useful tool for estimating the degree of linear association between two variables. However, it is very sensitive to outliers in the data. If the data do not achieve an interval scale of measurement, or have a high degree of skew or outliers, the nonparametric alternative given by Spearman's formula should be considered. The Spearman rank order correlation was one of the first to be developed (Siegel, 1956) and is commonly signified by the Greek letter ρ (rho). The statistic asks whether the two variables line up in similar rankings. Tables of significance indicate whether an association exists based on these rankings.

The data must first be converted to ranks, and a dif-ference score calculated for each pair of ranks, similarto the way differences are computed in the paired t-test. These differences scores, d, are then squared andsummed. The formula for rho is as follows:

ρ = 6∑

d2

(N3 − N)(B.31)

Thus the value for rho is easy to calculate unlessthere are a high proportion of ties. If greater than one-fourth of the data are tied, an adjustment should bemade. The formula is very robust in the case of a fewties, with changes in rho usually only in the third dec-imal place. If there are many ties, a correction must becalculated for each tied case based on (t3–t)/12 where tis the number of items tied at a given rank. These val-ues are then summed for all the ties for each variable xand y, to give values Tx and Ty. rho is then calculatedas follows:

\rho = \frac{\sum x^2 + \sum y^2 - \sum d^2}{2 \sqrt{\sum x^2 \sum y^2}} \qquad (B.32)

and

\sum x^2 = \frac{N^3 - N}{12} - \sum T_x \quad \text{and similarly} \quad \sum y^2 = \frac{N^3 - N}{12} - \sum T_y \qquad (B.33)

For example, if there are two cases for X in which two items are tied and one case in which three are tied, ΣTx becomes the sum

\sum T_x = \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{3^3 - 3}{12} = 3

and this quantity is then used as ΣTx in Eq. (B.33).

Suppose we wished to examine whether there was a relationship between mean chewiness scores for a set of products evaluated by a texture panel and mean scores on a scale for hardness. Perhaps we suspect


that the same underlying process variable gives rise to textural problems observable in both mastication and initial bite. Mean panel scores over ten products might look like Table B.8, with the calculation of rho following:

Table B.8 Data and calculations for rank order correlation

Product   Chewiness   Rank   Hardness   Rank   Difference   D²
A         4.3         7      5.0        6      1            1
B         5.6         8      6.1        8      0            0
C         5.8         9      6.4        9      0            0
D         3.2         4      4.4        4      0            0
E         1.1         1      2.2        1      0            0
F         8.2         10     9.5        10     0            0
G         3.4         5      4.7        5      0            0
H         2.2         3      3.4        2      1            1
I         2.1         2      5.5        7      5            25
J         3.7         6      4.3        3      3            9

The sum of the D² values is 36, so rho computes to the following:

\rho = 1 - \frac{6(36)}{1000 - 10} = 1 - 0.218 = 0.782

This is a moderately high degree of association, significant at the 0.01 level. This is obvious from the good agreement in rankings, with the exception of products I and J. Note that product F is a high outlier on both scales. This inflates the Pearson correlation to 0.839, as it is sensitive to the leverage exerted by this point that lies away from the rest of the data set.
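For readers who prefer to check the arithmetic by computer, a minimal sketch (not part of the original text) using scipy on the Table B.8 panel means is shown below; the Spearman value should agree with the hand calculation above.

```python
# Minimal sketch (not from the text): rank order correlation for Table B.8.
from scipy.stats import spearmanr, pearsonr

chewiness = [4.3, 5.6, 5.8, 3.2, 1.1, 8.2, 3.4, 2.2, 2.1, 3.7]
hardness  = [5.0, 6.1, 6.4, 4.4, 2.2, 9.5, 4.7, 3.4, 5.5, 4.3]

rho, p_rho = spearmanr(chewiness, hardness)   # rank-based (Spearman) association
r, p_r = pearsonr(chewiness, hardness)        # ordinary product-moment correlation

print(f"Spearman rho = {rho:.3f}, p = {p_rho:.4f}")   # about 0.78, as computed above
print(f"Pearson r    = {r:.3f}, p = {p_r:.4f}")
```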

B.5 Conclusions

Some of the nonparametric parallels to common statistical tests are shown in Table B.1. Further examples can be found in statistical texts such as Siegel (1956). The nonparametric statistical tests are valuable to the sensory scientist for several reasons, and it should be part of a complete sensory training program to become familiar with the most commonly used tests. Also, the binomial distribution forms the basis for the choice tests commonly used in discrimination testing, so it is important to know how this distribution is derived and when it approximates normality. The chi-square statistics are useful for a wide range of problems involving categorical variables and as a nonparametric measure of association. They also form the basis for other statistical tests such as the Friedman and McNemar tests. Nonparametric tests may be useful for scaled data where the interval-level assumptions are in doubt or for any data set when assumptions about normality of the data are questionable. In the case of deviations from the assumptions of a parametric test, confirmation with a nonparametric test may lend more credence to the significance of a result (Table B.9).

Table B.9 Parametric and nonparametric statistical tests

Purpose                                              Parametric test                                        Nonparametric parallel
Compare two products (matched data)                  Paired (dependent) t-test on means                     Sign test
Compare two products (separate groups)               Independent groups t-test on means                     Mann–Whitney U-test
Compare multiple products (complete block design)    One-way analysis of variance with repeated measures    Friedman test or rank sum test
Test association of two variables                    Pearson (product–moment) correlation coefficient       Spearman rank order correlation

Nonparametric tests are performed on ranked data instead of raw numbers. Other nonparametric tests are available for each purpose; the listed ones are common.

B.6 Postscript

B.6.1 Proof showing equivalence of binomial approximation Z-test and χ² test for difference of proportions

Recall that

\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \qquad (B.34)

and

z = \frac{x/N - p}{\sqrt{pq/N}} \qquad (B.35)

where

X = number correct,
N = total judgments or panelists,
p = chance proportion,
q = 1 − p.


Note that continuity corrections have been omitted for simplicity.

Alternative Z-formula (multiply Eq. (B.35) by N/N):

z = \frac{x - Np}{\sqrt{Npq}} \qquad (B.36)

Although the χ² distribution changes shape with different df, the general relationship of the χ² distribution to the Z-distribution is that χ² at 1 df is the square of Z. Note that critical χ² at 1 df = 3.84 = 1.96² = Z²(0.95):

z^2 = \frac{(x - Np)^2}{Npq} \qquad (B.37)

and

z^2 = \frac{x^2 - 2xNp + N^2p^2}{Npq} \qquad (B.38)

The proof will now proceed to show the equivalence of Eq. (B.38) to χ².

Looking at any forced choice test, the χ² approach requires these frequency counts:

           Correct judgments   Incorrect
Observed   X                   N − X
Expected   Np                  Nq

\chi^2 = \frac{(x - Np)^2}{Np} + \frac{[(N - x) - Nq]^2}{Nq} \qquad (B.39)

Simplifying, (N − X) − Nq = N(1 − q) − X; then, since p = 1 − q, (N − X) − Nq = Np − X.

Thus we can recast Eq. (B.39) as

\chi^2 = \frac{(x - Np)^2}{Np} + \frac{(Np - x)^2}{Nq} \qquad (B.40)

and expanding the squared terms

\chi^2 = \frac{x^2 - 2xNp + N^2p^2}{Np} + \frac{x^2 - 2xNp + N^2p^2}{Nq} \qquad (B.41)

To place them over a common denominator of Npq, we multiply the left expression by q/q and the right expression by p/p, giving

\chi^2 = \frac{qx^2 - 2xNpq + qN^2p^2}{Npq} + \frac{px^2 - 2xNp^2 + pN^2p^2}{Npq} \qquad (B.42)

Collecting common terms

\chi^2 = \frac{(q + p)x^2 - (q + p)\,2xNp + (q + p)N^2p^2}{Npq} \qquad (B.43)

Recall that q + p = 1, so Eq. (B.43) simplifies to

\chi^2 = \frac{(1)x^2 - (1)\,2xNp + (1)N^2p^2}{Npq} \qquad (B.44)

and dropping the value 1 in each of the three terms in the numerator gives Eq. (B.38), the formula for Z²:

z^2 = \frac{x^2 - 2xNp + N^2p^2}{Npq} = \chi^2

Recall that the continuity correction was omitted for simplicity of the calculations. The equivalence holds if and only if the continuity correction is either omitted from both analyses or included in both analyses. If it is omitted from one analysis but not the other, the one from which it is omitted will stand a better chance of attaining significance.
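A quick numerical check of the identity, using hypothetical forced-choice numbers that are not from the text, is sketched below.

```python
# Minimal sketch (hypothetical numbers): z squared equals chi-square when the
# continuity correction is omitted from both statistics.
from math import sqrt

N, X = 50, 27            # hypothetical: 27 correct out of 50 judgments
p = 1.0 / 3.0            # chance proportion for a triangle test
q = 1.0 - p

z = (X - N * p) / sqrt(N * p * q)                                            # Eq. (B.36)
chi_square = (X - N * p) ** 2 / (N * p) + ((N - X) - N * q) ** 2 / (N * q)   # Eq. (B.39)

print(f"z^2 = {z ** 2:.4f}, chi-square = {chi_square:.4f}")   # identical values
```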

References

Basker, D. 1988. Critical values of differences among rank sums for multiple comparisons. Food Technology, 42(2), 79, 80–84.

Bi, J. 2006. Sensory Discrimination Tests and Measurements. Blackwell, Ames, IA.

Conover, W. J. 1980. Practical Nonparametric Statistics, Second Edition. Wiley, New York.

Gacula, M., Singh, J., Bi, J. and Altan, S. 2009. Statistical Methods in Food and Consumer Research, Second Edition. Elsevier/Academic, Amsterdam.

Hollander, M. and Wolfe, D. A. 1973. Nonparametric Statistical Methods. Wiley, New York.

Newell, G. J. and MacFarlane, J. D. 1987. Expanded tables for multiple comparison procedures in the analysis of ranked data. Journal of Food Science, 52, 1721–1725.

O'Mahony, M. 1986. Sensory Evaluation of Food. Statistical Methods and Procedures. Marcel Dekker, New York.

Page, E. B. 1963. Ordered hypotheses for multiple treatments: A significance test for linear ranks. Journal of the American Statistical Association, 58, 216–230.


Rayner, J. C. W., Best, D. J., Brockhoff, P. B. and Rayner, G. D. 2005. Nonparametrics for Sensory Science: A More Informative Approach. Blackwell, Ames, IA.

Roessler, E. B., Pangborn, R. M., Sidel, J. L. and Stone, H. 1978. Expanded statistical tables estimating significance in paired-preference, paired-difference, duo-trio and triangle tests. Journal of Food Science, 43, 940–943.

Siegel, S. 1956. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.

Smith, G. L. 1988. Statistical analysis of sensory data. In: J. R. Piggott (ed.), Sensory Analysis of Foods. Elsevier Applied Science, London.

Stone, H. and Sidel, J. L. 1993. Sensory Evaluation Practices, Second Edition. Academic, San Diego.


Appendix C

Analysis of Variance

Contents

C.1 Introduction ................ 507
 C.1.1 Overview ............... 507
 C.1.2 Basic Analysis of Variance ..... 508
 C.1.3 Rationale ............... 508
 C.1.4 Calculations ............. 509
 C.1.5 A Worked Example .......... 509
C.2 Analysis of Variance from Complete Block Designs ............ 510
 C.2.1 Concepts and Partitioning Panelist Variance from Error ...... 510
 C.2.2 The Value of Using Panelists As Their Own Controls ........ 512
C.3 Planned Comparisons Between Means Following ANOVA ......... 513
C.4 Multiple Factor Analysis of Variance .... 514
 C.4.1 An Example .............. 514
 C.4.2 Concept: A Linear Model ....... 515
 C.4.3 A Note About Interactions ...... 516
C.5 Panelist by Product by Replicate Designs .. 516
C.6 Issues and Concerns ............ 519
 C.6.1 Sensory Panelists: Fixed or Random Effects? ............ 519
 C.6.2 A Note on Blocking .......... 520
 C.6.3 Split-Plot or Between-Groups (Nested) Designs ........... 520
 C.6.4 Statistical Assumptions and the Repeated Measures ANOVA ..... 521
 C.6.5 Other Options ............. 522
References .................... 522

For tests with more than two products and data that consist of attribute scale values, analysis of variance followed by planned comparisons of means is a common and useful statistical method. Analysis of variance and related tests are illustrated in this chapter, with worked examples.

C.1 Introduction

C.1.1 Overview

Analysis of variance is the most common statistical test performed in descriptive analysis and many other sensory tests where more than two products are compared using scaled responses. It provides a very sensitive tool for seeing whether treatment variables such as changes in ingredients, processes, or packaging had an effect on the sensory properties of products. It is a method for finding variation that can be attributed to some specific cause, against the background of existing variation due to other, perhaps unknown or uncontrolled, causes. These other unexplained causes produce the experimental error or noise in the data.

The following sections illustrate some of the basic ideas in analysis of variance and provide some worked examples. As this guide is meant for students and practitioners, some theory and development of models has been left out. However, the reader can refer to statistics texts such as Winer (1971), Hays (1973), O'Mahony (1986), and Gacula et al. (2009). A particularly useful book is Analysis of Variance for Sensory Data by Lea et al. (1998); see also Lundahl and McDaniel (1988). We have tried to use the same nomenclature as O'Mahony (1986), since that work is


already familiar to many workers in sensory evaluation, and that of Winer (1971), a classic treatise on ANOVA for behavioral data.

C.1.2 Basic Analysis of Variance

Analysis of variance is a way to examine differences among multiple treatments or levels and to compare several means at the same time. Some experiments have many levels of an ingredient or process variable. Factors in ANOVA terminology mean independent variables, the variables that you manipulate, i.e., the variables under your direct control in an experiment. Analysis of variance estimates the variance (squared deviations) attributable to each factor. This can be thought of as the degree to which each factor or variable moves the data away from the grand or overall mean of the data set. It also estimates the variance due to error. Error can be thought of as other remaining variation not attributable to the factors we manipulate.

In an analysis of variance we construct a ratio of the factor variance to the error variance. This ratio follows the distribution of an F-statistic. A significant F-ratio for a given factor implies that at least one of the individual comparisons among means is significant for that factor. We use a model in which there is some overall mean for the data and then variation around that value. The means from each of our treatment levels and their differences from this grand mean represent a way to measure the effect of those treatments. However, we have to view those differences in the light of the random variation that is present in our experiment. So, like the t- or z-statistic, the F-ratio is a ratio of signal-to-noise. In a simple two-product experiment with one group of people testing each product, the F-statistic is simply the square of the t-value, so there is an obvious relationship between the F- and t-statistics.

The statistical distributions for F indicate whether the ratio we obtain in the experiment is one we would expect only rarely by the operation of chance. Thus we apply the usual statistical reasoning when deciding to accept or reject a null hypothesis. The null hypothesis for ANOVA is usually that the means for the treatment levels would all be equal in the parent population. Analysis of variance is thus based on a model, a linear model, that says that any single data point or observation is the result of several influences: the grand mean, plus (or minus) whatever deviations are caused by each treatment factor, plus the interactions of treatment factors, plus error.

C.1.3 Rationale

The worked example below will examine this in more detail, but first a look at some of the rationale and derivation. The rationale proceeds as follows:

(a) We wish to know whether there are any significant differences among multiple means, relative to the error in our experimental measures.

(b) To do this, we examine variance (squared standard deviations).

(c) We look at the variance of our sample means from the overall ("grand") mean of all of our data. This is sometimes called the variance due to "treatments." Treatments are just the particular levels of our independent variable.

(d) This variance is examined relative to the variance within treatments, i.e., the unexplained error or variability not attributed to the treatments themselves.

The test is done by calculating a ratio. When the null is true (no difference among product means) it is distributed as an F-statistic. The F-distribution looks like a t-distribution squared (and is in the same family as the chi-square distribution). Its exact shape changes and depends upon the number of degrees of freedom associated with our treatments or products (the numerator of the ratio) and the degrees of freedom associated with our error (the denominator of the ratio).

Here is a mathematical derivation; a similar but more detailed explanation can be found in O'Mahony (1986). Variance (the square of a standard deviation) is noted by S², x represents each score, and M is the mean of the x scores, or (Σx)/N. Variance is the mean squared deviation of each score from the mean, given by

S^2 = \frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1} \qquad (C.1)

and computationally by

S^2 = \frac{\sum_{i=1}^{N} X_i^2 - \frac{(\sum X)^2}{N}}{N - 1} \qquad (C.2)


This expression can be thought of as the "mean squared deviation." For our experimental treatments, we can speak of the mean squares due to treatments, and for error, we can speak of the "mean squared error." The ratio of these two quantities gives us the F-ratio, which we compare to the expected distribution of the F-statistic (under a true null). Note that in the computational formula for S², we accumulate sums of squared observations. Sums of squares form the basis of the calculations in ANOVA.

To calculate the sums of squares, it is helpful to think about partitioning the total variation. Total variance is partitioned into variance between treatments and variance within treatments (or error). This can also be done for the sums of squares (SS):

SS_{total} = SS_{between} + SS_{within} \qquad (C.3)

This is useful since SS_{within} ("error") is tough to calculate; it is like a pooled standard deviation over many treatments. However, SS_{total} is easy! It is simply the numerator of our overall variance, or

SS_{total} = \sum_{i=1}^{N} X_i^2 - \frac{(\sum X)^2}{N} \quad \text{over all } N \text{ data points} \qquad (C.4a)

So we usually estimate SS_{within} (error) as SS_{total} minus SS_{between}. A mathematical proof of how the SS can be partitioned like this is found in O'Mahony (1986), Appendix C, p. 379.

C.1.4 Calculations

Based on these ideas, here is the calculation in a simple one-way ANOVA. "One-way" merely signifies that there is only one treatment variable or factor of interest. Remember, each factor may have multiple levels, which are usually the different versions of the product to be compared. In the following examples, we will talk in terms of products and sensory judges or panelists.

Let T = a total (it is useful to work in sums),
let a = number of products (or treatments),
let b = number of panelists per treatment.
The product ab = N.

SS_{total} = \sum_{i=1}^{N} X_i^2 - \frac{T^2}{N} \qquad (C.4b)

T without subscript is the grand total of all data, or simply Σx over all data points.

O'Mahony calls T²/N a "correction factor" or "C," a useful convention:

SS_{between} = \frac{1}{b} \sum T_a^2 - \frac{T^2}{N} \qquad (C.5)

where the "a" subscript refers to different products. Now we need the error sum of squares, which is found simply from

SS_{within} = SS_{total} - SS_{between} \qquad (C.6)

The next step is to divide each SS by its associated degrees of freedom to get our mean squares. We have mean squares associated with products and mean squares associated with error. In the final step, we use the ratio of these two estimates of variance to form our F-ratio.

C.1.5 A Worked Example

Our experimental question is: did the treatment we used on the products make any difference? In other words, are these means likely to represent real differences, or just the effects of chance variation? The ANOVA will help address these questions. The sample data set is shown in Table C.1.

Table C.1 Data set for simple one-way ANOVA

Panelist   Product A   Product B   Product C
1          6           8           9
2          6           7           8
3          7           10          12
4          5           5           5
5          6           5           7
6          5           6           9
7          7           7           8
8          4           6           8
9          7           6           5
10         8           8           8
Totals     61          68          79
Means      6.1         6.8         7.9


First, column totals are calculated, along with a grand total, the correction factor ("C"), and the sum of the squared data points.

Sums: T_A = 61 (Product A), T_B = 68 (Product B), T_C = 79 (Product C)
Grand T (sum of all data) = 208
T²/N = (208)²/30 = 1442.13 (O'Mahony's "C" factor)
Σ(x²) = 1530 (sum of all squared scores)

Given this information, the sums of squares can be calculated as follows:

SS_total = 1530 − 1442.13 = 87.87
SS due to treatments ("between") = (T_A² + T_B² + T_C²)/b − T²/N (remember b is the number of panelists)
 = (61² + 68² + 79²)/10 − 1442.13 = 16.47

Next, we need to find the degrees of freedom. The total degrees of freedom in the simple one-way ANOVA are the number of observations minus one (30 − 1 = 29). The degrees of freedom for the treatment factor are the number of levels minus one. The degrees of freedom for error are the total degrees of freedom minus the treatment ("between") df.

df total = N − 1 = 29
df for treatments = 3 − 1 = 2
df for error = df_total − df_between = 29 − 2 = 27

Finally, a "source table" is constructed to show the calculations of the mean squares (our variance estimates) for each factor, and then to construct the F-ratio. The mean squares are the SS divided by the appropriate degrees of freedom (MS = SS/df). Table C.2 is the source table.

Table C.2 Source table for first ANOVA

Source of variance   SS       df     Mean squares   F
Total                87.867   (29)
Between              16.467   2      8.233          3.113
Within (error)       71.4     27     2.644

A value of F = 3.11 at 2 and 27 degrees of freedom is just short of significance at p = 0.06. Most statistical software programs will now give an exact p-value for the F-ratio and degrees of freedom. If the ANOVA is done "by hand," then the F-ratio should be compared to the critical value found in a table such as Table D. We see from this table that the critical value for 2 and 27 df is about 3.35 (we are interpolating here between 2, 26 and 2, 28 df), and our obtained value did not exceed this critical value.
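As a check on the hand calculation, a minimal Python sketch (not part of the original text) for the Table C.1 data is given below; scipy's one-way ANOVA should reproduce the F-ratio and p-value of Table C.2.

```python
# Minimal sketch (not from the text): one-way ANOVA on the Table C.1 data.
from scipy.stats import f_oneway

product_a = [6, 6, 7, 5, 6, 5, 7, 4, 7, 8]
product_b = [8, 7, 10, 5, 5, 6, 7, 6, 6, 8]
product_c = [9, 8, 12, 5, 7, 9, 8, 8, 5, 8]

f_ratio, p_value = f_oneway(product_a, product_b, product_c)
print(f"F(2, 27) = {f_ratio:.3f}, p = {p_value:.3f}")   # about 3.11 and 0.06
```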

C.2 Analysis of Variance from Complete Block Designs

C.2.1 Concepts and Partitioning Panelist Variance from Error

The complete block analysis of variance for sensory data occurs when all panelists view all products, or all levels of our treatment variable (Gacula and Singh, 1984). This type of design is also called the "repeated measures" analysis of variance in the behavioral sciences, when the experimental subject participates in all conditions (O'Mahony, 1986; Winer, 1971). Do not confuse the statistical term "repeated measures" with replication. The design is analogous to the dependent or paired observations t-test, but considers multiple levels of a variable, not just two. Like the dependent t-test, it has added sensitivity since the variation due to panelist differences can be partitioned from the analysis, in this case taken out of the error term. When the error term is reduced, the F-ratio due to the treatment or variable of interest will be larger, so it is "easier" to find statistical significance. This is especially useful in sensory evaluation, where panelists, even well-trained ones, may use different parts of the scale or may simply have different sensitivities to the attribute being evaluated. When all panelists rank order the products the same way, the complete block ANOVA will usually produce a significant difference between products, in spite of panelists using different ranges of the scale.

The example below shows the kind of situation where a complete block analysis, like the dependent t-test, will have value in finding significant differences. In this example, two ratings by two subjects are shown in Fig. C.1. The differences between products, also called "within subject differences," are in the same direction and of the same magnitude. The "within-subject" effects in repeated measures terminology correspond to between-treatment effects in simple ANOVA terminology. (This can be confusing.)


Fig. C.1 Line scale ratings (weak to strong) for two products, A and B, by two panelists, Judge 1 and Judge 2. The two hypothetical panelists agree on the rank order of the products and the approximate sensory difference, but use different parts of the scale. The differences can be separated into the between-products (within-panelist) differences and the difference between the panelists in the overall part of the scale they have used. The dependent t-test separates these two sources of difference by converting the raw scores to difference scores (between products) in the analysis. In a complete block design, when panelist variation can be partitioned, the ANOVA provides a more sensitive comparison than leaving the inter-individual difference in the error term.

In this example, the difference between panelists in the part of the scale they use is quite large. In any conventional analysis, such variation between people would swamp the product effect by creating a large error term. However, the panelist differences can be pulled out of the error term in a complete block design, i.e., when every panelist evaluates all of the products in the experiment.

To see the advantage of this analysis, we will show examples with and without the partitioning of panelist variance. Here is a worked example, first without partitioning panelist effects, i.e., as if there were three independent groups evaluating each product. This is a simple one-way ANOVA as shown above. Three products are rated. They might differ in having three levels of an ingredient. The sample data set is shown in Table C.3, with one small change from the first example of simple ANOVA. Note that panelist #10 has

Table C.3 Data set for the complete block design

Panelist   Product A   Product B   Product C
1          6           8           9
2          6           7           8
3          7           10          12
4          5           5           5
5          6           5           7
6          5           6           9
7          7           7           8
8          4           6           8
9          7           6           5
10         1           2           3
Totals     54          62          74
Means      5.4         6.2         7.4

now produced the values 1, 2, and 3 instead of 8, 8, and 8. The panelist is no longer a non-discriminator, but is probably insensitive.

Here is how the one-way ANOVA would look:

Sums: T_A = 54, T_B = 62, T_C = 74
Grand T (sum of all data points) = 190
T²/N = (190)²/30 = 1203.3 (O'Mahony's "C" factor)
Σ(x²) = 1352

SS_total = 1352 − 1203.3 = 148.7
SS due to products = (T_A² + T_B² + T_C²)/b − T²/N (remember b here refers to the number of panelists)
 = (54² + 62² + 74²)/10 − 1203.3 = 20.3
SS_error = SS_total − SS_products = 148.7 − 20.3 = 128.4

Table C.4 shows the source table.

Table C.4 Source table for complete block ANOVA

Source of variance   SS       df     Mean squares   F
Total                148.67   (29)
Between              20.26    2      10.13          2.13
Within (error)       128.4    27     4.76

For 2 and 27 degrees of freedom, this F gives us p = 0.14 (p > 0.05, not significant). The critical F-ratio for 2 and 27 degrees of freedom is about 3.35 (interpolated from values in Table D).

Now, here is the difference in the complete block ANOVA. An additional computation requires row sums and sums of squares for the row variable, which is our panelist effect, as shown in Table C.3. In the one-way analysis, the data set was analyzed as if there were 30 different people contributing the ratings. Actually, there were ten panelists who viewed all products. This


Table C.5 Data set for the complete block design showing panelist calculations (rows)

Panelist   Product A   Product B   Product C   Σpanelist   (Σpanelist)²
1          6           8           9           23          529
2          6           7           8           21          441
3          7           10          12          29          841
4          5           5           5           15          225
5          6           5           7           18          324
6          5           6           9           20          400
7          7           7           8           22          484
8          4           6           8           18          324
9          7           6           5           18          324
10         1           2           3           6           36
Totals     54          62          74                      3,928

fits the requirement for a complete block design. We can thus further partition the error term into an effect due to panelists (the "between-subjects" effect) and residual error. To do this, we need to estimate the effect due to inter-panelist differences. Take the sum across rows (to get panelist sums), then square them. Sum again down the new column as shown in Table C.5. The panelist sum of squares is analogous to the product sum of squares, but now we are working across rows instead of down the columns:

SS_{panelists} = \frac{\sum (\sum \text{panelist})^2}{3} - C = \frac{3928}{3} - 1203.3 = 106

"C," once again, is the "correction factor," or the grand total squared divided by the number of observations. In making this calculation, we have used nine more degrees of freedom from the total, so these are no longer available to our estimate of the error df below.

A new sum of squares for residual error can now be calculated:

SS_error = SS_total − SS_products − SS_panelists = 148.7 − 20.3 − 106 = 22.36

and the mean square for error (MS error) is SS_error/18 = 22.36/18 = 1.24.

Note that there are now only 18 degrees of freedom left for the error, since we took another nine to estimate the panelists' variance. However, the mean square error has shrunk from 4.76 to 1.24. Finally, a new F-ratio for the product effect ("within subjects") is computed, having removed the between-subjects effect from the error term, as shown in our source table, Table C.6.

So the new F = MS_products/MS_error = 10.15/1.24 = 8.17

At 2 and 18 degrees of freedom, this is significant at p = 0.003, and it is now bigger than the critical F for 2 and 18 degrees of freedom (F_crit = 3.55).

Table C.6 Source table for two-way ANOVA

Source of variance   SS      df     Mean squares   F
Total                148.7   (29)
Products             20.3    2      10.13          8.14
Panelists            106     9
Error                22.4    18     1.24

Why was this significant when panelist variance was partitioned, but not in the usual one-way ANOVA? The answer lies in the systematic variation due to panelists' scale use and the ability of the two-way ANOVA to remove this effect from the error term. Making error smaller is a general goal of just about every sensory study, and here we see a powerful way to do this mathematically, by using a specific experimental design.
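The partitioning can also be checked by computer. The following is a minimal sketch (not part of the original text) of the complete block calculation on the Table C.3 data; numpy and scipy are assumed to be available.

```python
# Minimal sketch (not from the text): complete block (repeated measures) ANOVA
# on the Table C.3 data, mirroring the hand calculations above.
import numpy as np
from scipy.stats import f as f_dist

# Rows = panelists, columns = products A, B, C
data = np.array([
    [6, 8, 9], [6, 7, 8], [7, 10, 12], [5, 5, 5], [6, 5, 7],
    [5, 6, 9], [7, 7, 8], [4, 6, 8], [7, 6, 5], [1, 2, 3],
], dtype=float)

n_pan, n_prod = data.shape
C = data.sum() ** 2 / data.size                              # correction factor

ss_total = (data ** 2).sum() - C
ss_products = (data.sum(axis=0) ** 2).sum() / n_pan - C
ss_panelists = (data.sum(axis=1) ** 2).sum() / n_prod - C
ss_error = ss_total - ss_products - ss_panelists

df_prod = n_prod - 1
df_error = (n_prod - 1) * (n_pan - 1)
f_ratio = (ss_products / df_prod) / (ss_error / df_error)
p_value = f_dist.sf(f_ratio, df_prod, df_error)
print(f"F({df_prod}, {df_error}) = {f_ratio:.2f}, p = {p_value:.4f}")   # about 8.1, p = 0.003
```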

C.2.2 The Value of Using Panelists As Their Own Controls

The data set in the complete block example was quite similar to the data set used in the one-way ANOVA illustrated first. The only change was in panelist #10, who rated the products all as an 8 in the first example. In the second example, this non-discriminating panelist was removed and data were substituted from an insensitive panelist, but one with the correct rank ordering. This panelist rated the products 1, 2, and 3, following the general trend of the rest of the panel, but at an overall lower level of the scale.

Notice the effect of substituting a person who is an outlier on the scale but who discriminates the products in the proper rank order. Because these values are


quite low, they add more to the overall variance than to the product differences, so the one-way ANOVA goes from nearly significant (p = 0.06) to much less evidence against the null (p = 0.14). In other words, the panelist who did not differentiate the products, but who sat in the middle of the data set, was not very harmful to the one-way ANOVA, but the panelist with overall low values contributes to error, even though he or she discriminated among the products. Since the complete block design allows us to partition out overall panelist differences and focus just on product differences, the fact that he or she was a low rater does not hurt this type of analysis. The F-ratio for products is now significant (p = 0.003). In general, the panelists are monotonically increasing, with the exceptions of #4, 5, and 9 (dotted lines), as shown in Fig. C.2. The panelist with low ratings follows the majority trend and thus helps the situation.

Fig. C.2 Panelist trends in the complete block example (ratings for products A, B, and C plotted for each of the ten panelists). Note that panelist #10 has rank ordered the products in the same way as the panel means, but is a low outlier. This panelist is problematic for the one-way ANOVA but less so when the panelist effects are partitioned, as in the repeated measures models.

The same statistically significant result is obtained in a Friedman "analysis of variance" on ranks (see Appendix B). A potential insight here is the following: having a complete design allows repeated measures ANOVA. This allows us to "get rid of" panelist differences in scale usage, sensory sensitivity, anosmia, etc., and focus on product trends. Since humans are notoriously hard to calibrate, this is highly valuable in sensory work.

C.3 Planned Comparisons Between Means Following ANOVA

Finding a significant F-ratio in ANOVA is only one step in the statistical analysis of experiments with more than two products. It is also necessary to compare treatment means to see which pairs were different. A number of techniques are available to do this, most based on variations of the t-test. The rationale is to avoid the inflated risk of Type I error that would be inherent in making comparisons just by repeating t-tests. For example, the Duncan test attempts to maintain "experiment-wise" alpha at 0.05. In other words, across the entire set of paired comparisons of the product means, we would like to keep alpha-risk at a maximum of 5%. Since risk is a function of the number of tests, the critical value of the t-statistic is adjusted to maintain risk at an acceptable level.

Different approaches exist, differing in assumptions and in the degree of "liberality" in the amount of evidence needed to reject the null. Common types include the tests called Scheffé, Tukey or HSD (honestly-significant-difference), Newman–Keuls, Duncan, and LSD (least-significant-difference). The Scheffé test is the most conservative and the LSD test the least (for examples see Winer, 1971, pp. 200–201). The Duncan procedure guards against Type I error among a set of comparisons, as long as there is already a significant F-ratio found in the ANOVA. This is a good compromise test to use for sensory data. The LSD test and the Duncan test are illustrated below.

The least significant difference, or LSD, test is quite popular, since you simply compute the difference between means required for significance, based on your error term from the ANOVA. The error term is a pooled estimate of error considering all your treatments together. However, the LSD test does little to protect you from making too many comparisons, since the critical values do not increase with the number of


comparisons you make, as is the case with some of the other statistics such as the Duncan and Tukey (HSD) tests:

LSD = t \sqrt{\frac{2\,MS_{error}}{n}} \qquad (C.7)

where n is the number of panelists in a one-way ANOVA or a one-factor repeated measures design, and t is the t-value for a two-tailed test with the degrees of freedom for the error term. The difference between the means must be larger than the LSD.

Calculations for the Duncan multiple range test use a "studentized range statistic," usually abbreviated with a lower-case q. The general formula for comparing pairs of individual means is to find the quantity to the right of this inequality and compare it to q:

q_p \leq \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{\frac{2\,MS_{error}}{n}}} \qquad (C.8)

The calculated value must exceed a tabled value of q_p, which is based on the number of means separating the two we wish to compare, when all the means are rank ordered. MS_error is the error term associated with that factor in the ANOVA from which the means originate, n is the number of observations contributing to each mean, and q_p is the studentized range statistic from Duncan's tables (see Table G). The subscript p indicates the number of means between the two we are comparing (including themselves), when they are rank ordered. If we had three means, we would use the value for p of 2 for comparing adjacent means when ranked and p of 3 for comparing the highest and lowest means. The degrees of freedom are n − 1. Note that the values for q are similar to but slightly greater than the corresponding t-values.

The general steps proceed as follows:

1. Conduct the ANOVA and find the MS error term.
2. Rank order the means.
3. Find q values for each p (number of means between, plus 2) and n − 1 df.
4. Compare q to the formula in Eq. (C.8), or
5. Find critical differences that must be exceeded by the differences between the means:

\text{Difference} \geq q_p \sqrt{\frac{2\,MS_{error}}{n}}

Note that this is just like the LSD test, but uses q instead of t. These critical differences are useful when you have lots of means to compare.

Here is a sample problem. From a simple one-way ANOVA on four observations, the means were as follows: Treatment A = 9, Treatment B = 8, and Treatment C = 5.75. The MS_error = 0.375. If we compare treatments A and C, the quantity to exceed q becomes

\frac{9 - 5.75}{\sqrt{2(0.375)/4}} = \frac{3.25}{0.433} = 7.5

The critical value of q for p = 3, alpha = 0.05 is 4.516, so we can conclude that treatments A and C were significantly different.

An alternative computation is to find a critical difference by multiplying our value of q by the denominator (our error term) to find the difference between the means that must be exceeded for significance. This is sometimes easier to tabulate if you are comparing a number of means "by hand." In the above example, using the steps above for finding a critical difference, we multiply q (4.516) by the denominator term for the pooled standard error (0.433), giving a critical difference of 1.955. Since 9 − 5.75 (= 3.25) exceeds the critical difference of 1.955, we can conclude that these two samples were different.
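The same arithmetic is easy to script. A minimal sketch (not part of the original text) for the sample problem is shown below; the value q = 4.516 is simply taken from the Duncan table cited above.

```python
# Minimal sketch (not from the text): critical difference for the Duncan test
# in the sample problem above (three means, MS error = 0.375, n = 4).
from math import sqrt

ms_error = 0.375
n = 4
means = {"A": 9.0, "B": 8.0, "C": 5.75}

pooled_se = sqrt(2 * ms_error / n)     # denominator of Eqs. (C.7) and (C.8)
q_crit = 4.516                         # tabled studentized range value for p = 3, alpha = 0.05
critical_difference = q_crit * pooled_se

difference = means["A"] - means["C"]
print(f"A vs C: difference = {difference:.2f}, critical difference = {critical_difference:.2f}")
# 3.25 exceeds about 1.96, so A and C are declared significantly different.
```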

C.4 Multiple Factor Analysis of Variance

C.4.1 An Example

In many experiments, we will have more than one variable of interest, for example, two or more ingredients or two or more processing changes. The applicable statistical tool for the analysis of scaled data where we have two or more independent variables (called factors) is the multiple factor analysis of variance. These are called two-way ANOVAs for two variables, three-way for three variables, and so on.

Here is a simple sample problem; the data set is shown in Table C.7. We have two sweeteners, sucrose and high-fructose corn syrup (HFCS), being blended in a food (say a breakfast cereal), and we would like to understand the impact of each on the sweetness of the product. We vary the amount of each sweetener added


Table C.7 Data set for a two-factor analysis of variance (entries such as 1,1,2,4 represent four data points)

                    Factor 1: Level of sucrose
Level of HFCS       Level 1 (2%)   Level 2 (4%)   Level 3 (6%)
Level A (2%)        1,1,2,4        3,5,5,5        6,4,6,7
Level B (4%)        2,3,4,5        4,6,7,5        6,8,8,9
Level C (6%)        5,6,7,8        7,8,8,6        8,8,9,7

to the product (2, 4, and 6% of each) and have a panel of four individuals rate the product for sweetness. (Four panelists are probably too few for most experiments, but this example is simplified for the sake of clarity.) We use three levels of each sweetener, in a factorial design. A factorial design means that each level of one factor is combined with every level of the other factor.

We would like to know whether these levels of sucrose had any effect, whether the levels of HFCS had any effect, and whether the two sweeteners in combination produced any result that would not be predicted from the average response to each sweetener. This last item we call an interaction (more on this below).

First, let us look at the cell means and marginal means, shown in Table C.8:

Table C.8 Means for two factor experiment

              Factor (variable) 1
Factor 2      Level 1   Level 2   Level 3   Row mean
Level A       2.0       4.5       5.75      4.08
Level B       3.5       5.5       7.75      5.58
Level C       6.5       7.25      8.0       7.25
Column mean   4.0       5.75      7.17      5.63 (Grand mean)
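A minimal sketch (not part of the original text) that recovers the cell and marginal means of Table C.8 from the raw ratings in Table C.7 with numpy:

```python
# Minimal sketch (not from the text): cell and marginal means for Table C.8.
import numpy as np

# ratings[hfcs_level, sucrose_level, panelist], from Table C.7
ratings = np.array([
    [[1, 1, 2, 4], [3, 5, 5, 5], [6, 4, 6, 7]],   # HFCS level A (2%)
    [[2, 3, 4, 5], [4, 6, 7, 5], [6, 8, 8, 9]],   # HFCS level B (4%)
    [[5, 6, 7, 8], [7, 8, 8, 6], [8, 8, 9, 7]],   # HFCS level C (6%)
], dtype=float)

cell_means = ratings.mean(axis=2)          # 3 x 3 body of Table C.8
hfcs_means = cell_means.mean(axis=1)       # row (HFCS) marginal means
sucrose_means = cell_means.mean(axis=0)    # column (sucrose) marginal means

print(cell_means)
print(hfcs_means, sucrose_means, ratings.mean())   # marginal means and grand mean
```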

Next let us look at some graphs of these means to see what happened. Figure C.3 shows the trends in the data.

C.4.2 Concept: A Linear Model

Here is what happened in the analysis of the previous data set: the ANOVA will test hypotheses from a general linear model. This model states that any score in the data set is determined by a number of factors:

Score = Grand mean + Factor 1 effect + Factor 2 effect + Interaction effect + Error.

Fig. C.3 Means from the two-factor sweetener experiment (mean sweetness rating plotted against HFCS level, with a separate line for each sucrose level).

In plain English, there is an overall tendency for these products to be sweet that is estimated by the grand mean. For each data point, there is some perturbation from that mean due to the first factor, some due to the second factor, some due to the particular ways in which the two factors interact or combine, and some random error process. The means for the null hypothesis are the population means we would expect from each treatment, averaged across all the other factors that are present. These can be thought of as "marginal means" since they are estimated by the row and column totals (we would often see them in the margins of a data matrix as calculations proceeded).

For the effects of our two sweeteners, we are testing whether the marginal means are likely to be equal (in the underlying population), or whether there were some systematic differences among them, and whether this variance was large relative to error, in fact so large that we would rarely expect such variation under a true null hypothesis.

The ANOVA uses an F-ratio to compare the effect variance to our sample error variance. The exact calculations for this ANOVA are not presented here, but they are performed in the same way as the two-factor complete design ANOVA (sometimes called "repeated measures"; Winer, 1971) that is illustrated in a later section.

The output of our ANOVA will be presented in a table like Table C.9.

We then determine significance by looking up the critical F-ratio for our numerator and denominator degrees of freedom. If the obtained F is greater than the tabulated F, we reject the null hypothesis and conclude


Table C.9 Source table for two-way ANOVA with interaction

Effect        Sums of squares   df   MS      F
Factor 1      60.22             2    30.11   25.46
Error         7.11              6    1.19
Factor 2      60.39             2    30.20   16.55
Error         10.94             6    1.82
Interaction   9.44              4    2.36    5.24
Error         5.22              12   0.43

that our factor had some effect. The critical F-values for these comparisons are 5.14 for the sweetener factors (2 and 6 df, both significant) and 3.26 (for 4 and 12 df) for the interaction effect (see Table D). The interaction arises since the higher level of sweetener 2 in Fig. C.3 shows a flatter slope than the slopes at the lower levels. So there is some saturation or flattening out of response, a common finding at high sensory intensities.

C.4.3 A Note About Interactions

What is an interaction? Unfortunately, the word has both a common meaning and a statistical meaning. The common meaning is when two things act upon or influence one another. The statistical meaning is similar, but it does not imply that a physical interaction, say between two food chemicals, occurred. Instead, the term "interaction" means that the effect of one variable changed depending upon the level of the other variable. Here are two examples of interaction. For the sake of simplicity, only means are given to represent two variables at two points each.

In the first example, two panels evaluated the firmness of texture of two food products. One panel saw a big difference between the two products while the second found only a small difference. This is visible as a difference in the slope of the lines connecting the product means. Such a difference in slope is called a magnitude interaction when both slopes have the same sign, and it is fairly common in sensory research. For example, panelists may all evaluate a set of products in the same rank order, but some may be more fatigued across replications. Decrements in scores will occur for some panelists more than others, creating a panelist-by-replicate interaction.

The second example of an interaction is a little less common. In this case the relative scores for the two products change position from one panel to the other. One panel sees product 1 as deserving a higher rating than product 2, while the other panel finds product 2 to be superior. This sort of interaction can happen with consumer acceptance ratings when there are market segments, or in descriptive analysis if one panel misunderstands the scale direction or it is misprinted on the ballot (e.g., with end-anchor words reversed). This is commonly called a crossover interaction. Figure C.4 shows these interaction effects. A crossover interaction is much more serious and can be a big problem when the interaction effect is part of the error term, as in some ANOVAs (see Sections C.5 below and C.6.1).

C.5 Panelist by Product by Replicate Designs

A common design in sensory analysis is the two-way ANOVA with all panelists rating all products (complete block) and replicated ratings. This design

Fig. C.4 Interaction effects. Upper panel: magnitude interaction (mean firmness ratings of Products 1 and 2 by two panels). Lower panel: crossover interaction (mean acceptability ratings of Products 1 and 2 by two consumer groups).


would be useful with a descriptive panel, for example, where panelists commonly evaluate all the products. An example of the two factors is when there is one set of products and replications. Each score is a function of the panelist effect, treatment effect, replication effect, interactions, and error.

Error terms for treatment and replication are the interaction effects of each with panelists, which form the denominator of the F-ratio. This is done because panelist effects are random effects (see Section C.6.1), and the panelist by treatment interaction is embedded in the variance estimate for treatments. For purposes of this example, treatments and replications are considered fixed effects (this is a mixed model, or Type III in some statistical programs like SAS). The sample data set is shown in Table C.10. Once again there are a small number of panelists so that the calculations will be a little simpler. Of course, in most real sensory studies the panel size would be considerably larger, e.g., 10–12 for descriptive data and 50–100 for consumer studies.

Table C.10 Data set for a two-factor ANOVA with partitioning of panelist variation

             Replicate 1           Replicate 2
Product      A     B     C         A     B     C
Panelist 1   6     8     9         4     5     10
Panelist 2   6     7     8         5     8     8
Panelist 3   7     10    12        6     7     9

The underlying model says that the total variance is a function of the product effect, replicate effect, panelist effect, the three two-way interactions, the three-way interaction, and random error. We have no estimate of the smallest within-cell error term other than the three-way interaction. Another way to think about this is that each score deviates from the grand mean as a function of that particular product mean, that particular panelist mean, that particular replication mean, plus (or minus) any other influences from the interactions.

Here are the calculations, step by step. This is a little more involved than our examples so far. We will call the effect of each factor a "main effect," as opposed to the interaction effects and error.

Step 1. First, we calculate sums of squares and main effects.

As in the one-way ANOVA with repeated measures, there are certain values we need to accumulate:

Grand total = 135
(Grand total)²/N = T²/N = 18,225/18 = 1012.5 (O'Mahony's "correction factor," C)
Sum of squared data = 1083

There are three "marginal sums" we need to calculate in order to estimate the main effects.

The product marginal sums (across panelists and reps):

ΣA = 34, (ΣA)² = 1156
ΣB = 45, (ΣB)² = 2025
ΣC = 56, (ΣC)² = 3136

The sum of squares for products then becomes

SS_products = [(1156 + 2025 + 3136)/6] − correction factor, C = 1052.83 − 1012.5 = 40.33

(We will need the value 1052.83 later. Let us call it PSS1, for "partial sum of squares" #1.)

Similarly, we calculate the replicate and panelist sums of squares.

The replicate marginal sums (across panelists and products):

Σrep1 = 73, (Σrep1)² = 5,329
Σrep2 = 62, (Σrep2)² = 3,844

The sum of squares for replicates then becomes

SS_reps = [(5,329 + 3,844)/9] − correction factor = 1019.2 − 1012.5 = 6.72

Note: the divisor, 9, is not the number of reps (2), but the number of panelists times the number of products (3 × 3 = 9). Think of this as the number of observations contributing to each marginal total. (We will need the value 1019.2 later in the calculations. Let us call it PSS2.)

As in other repeated measures designs, we need the panelist sums (across products and reps):

Σpan1 = 42, (Σpan1)² = 1764
Σpan2 = 42, (Σpan2)² = 1764
Σpan3 = 51, (Σpan3)² = 2601

The sum of squares for panelists then becomes

SS_pan = [(1764 + 1764 + 2601)/6] − correction factor, C = 1021.5 − 1012.5 = 9.00

(We will need the value 1021.5 later in the calculations. Let us call it PSS3, for partial sum of squares #3.)


Step 2. Next, we need to construct summary tables of interaction sums.

Here are the rep-by-product interaction calculations. We obtain a sum for each replicate by product combination and then square them. The three interaction tables are shown in Table C.11.

Table C.11 Interaction calculations

Product    Rep 1   Rep 2   Squared values
A          19      15      361   225
B          25      20      625   400
C          29      27      841   729

Product    Panelist 1   Panelist 2   Panelist 3   Squared values
A          10           11           13           100   121   169
B          13           15           17           169   225   289
C          19           16           21           361   256   441

Panelist   Rep 1   Rep 2   Squared values
1          23      19      529   361
2          21      21      441   441
3          29      22      841   484

First, for the product by replicate table, we obtain the following information:

Sum of squared values = 3181; 3181/3 = 1060.3 (= PSS4, needed later)

To calculate the sum of squares, we need to subtract the PSS values for each main effect and then add back the correction term (this is dictated by the underlying variance model):

SS_rep×prod = (3181/3) − PSS1 − PSS2 + C = 1060.3 − 1052.83 − 1019.2 + 1012.5 = 0.77

Next we look at the panelist by product interaction information. The panelist by product interaction calculations are based on the center of Table C.11. Once again, we accumulate the sums for each combination and then square them, giving these values:

Sum of squared values = 2131; 2131/2 = 1065.5 (= PSS5, needed later)

SS_pan×prod = (2131/2) − PSS1 − PSS3 + C = 1065.5 − 1052.83 − 1021.5 + 1012.5 = 3.67

Here are the replicate by panelist interaction calculations, based on the lower part of Table C.11:

Sum of squared values = 3097; 3097/3 = 1032.3 (= PSS6, needed later)

SS_rep×pan = (3097/3) − PSS2 − PSS3 + C = 1032.3 − 1021.5 − 1019.2 + 1012.5 = 4.13

The final estimate is the sum of squares for the three-way interaction. This is all we have left in this design, since we are running out of degrees of freedom. It is found as the sum of the squared data values, minus each PSS from the interactions, plus each PSS from the main effects, minus the correction factor. Do not worry too much about where this comes from; you would need to dissect the variance component model to fully understand it:

SS(3-way) = Σx² − PSS4 − PSS5 − PSS6 + PSS1 + PSS2 + PSS3 − C
 = 1083 − 1060.3 − 1065.5 − 1032.3 + 1052.83 + 1019.2 + 1021.5 − 1012.5 = 5.93

Step 3. Using the above values, we can calculate the final results as shown in the source table, Table C.12.

Table C.12 Source table for panelist by product by replicate ANOVA

Effect                  Sum of squares   df   Mean square   F
Products                40.33            2    20.17         21.97
Prod × panelist         3.67             4    0.92
Replicates              6.72             1    6.72          3.27
Rep × panelist          4.13             2    2.06
Product × rep           0.77             2    0.39          0.26
Prod × rep × panelist   5.93             4    1.47

Note that the error terms for each effect are the interactions with panelists. This is dictated by the fact that panelists are a random effect and this is a "mixed model" analysis.

So only the product effect was significant. The critical F-ratios were 6.94 for the product effect (2, 4 df), 19.00 for the replicate effect (1, 2 df), and 6.94 for the interaction (2, 4 df).

Degrees of freedom are calculated as follows:

For the main effects, df = levels − 1; e.g., three products gives 2 df.
For interactions, df = the product of the df for the individual factors (e.g., prod × pan df = (3 − 1) × (3 − 1) = 4).
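The whole partition can be reproduced in a few lines of code. The sketch below (not part of the original text) applies the same sums-of-squares formulas to the Table C.10 data with numpy; the printed F-ratios should agree with Table C.12 to rounding.

```python
# Minimal sketch (not from the text): panelist x product x replicate partition
# for the Table C.10 data.
import numpy as np

# scores[panelist, product, replicate]
scores = np.array([
    [[6, 4], [8, 5], [9, 10]],
    [[6, 5], [7, 8], [8, 8]],
    [[7, 6], [10, 7], [12, 9]],
], dtype=float)

n_pan, n_prod, n_rep = scores.shape
C = scores.sum() ** 2 / scores.size          # correction factor, 1012.5

def marginal_ss(axes, divisor):
    # Collapse the given axes to marginal totals, then apply
    # (sum of squared totals) / divisor - C.
    totals = scores.sum(axis=axes)
    return (totals ** 2).sum() / divisor - C

ss_prod = marginal_ss((0, 2), n_pan * n_rep)                     # 40.33
ss_rep = marginal_ss((0, 1), n_pan * n_prod)                     # 6.72
ss_pan = marginal_ss((1, 2), n_prod * n_rep)                     # 9.00
ss_prod_rep = marginal_ss((0,), n_pan) - ss_prod - ss_rep        # about 0.78
ss_pan_prod = marginal_ss((2,), n_rep) - ss_pan - ss_prod        # 3.67
ss_pan_rep = marginal_ss((1,), n_prod) - ss_pan - ss_rep         # about 4.1
ss_total = (scores ** 2).sum() - C
ss_3way = ss_total - (ss_prod + ss_rep + ss_pan
                      + ss_prod_rep + ss_pan_prod + ss_pan_rep)  # about 5.9

# Mixed model: the panelist interactions serve as the error terms
f_products = (ss_prod / 2) / (ss_pan_prod / 4)       # about 22
f_replicates = (ss_rep / 1) / (ss_pan_rep / 2)       # about 3.3
f_prod_by_rep = (ss_prod_rep / 2) / (ss_3way / 4)    # about 0.26
print(f_products, f_replicates, f_prod_by_rep)
```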


C.6 Issues and Concerns

C.6.1 Sensory Panelists: Fixed or Random Effects?

In a fixed effects model, specific levels are chosen for a treatment variable, levels that may be replicated in other experiments. Common examples of fixed effect variables might be ingredient concentrations, processing temperatures, or times of evaluation in a shelf-life study. In a random effects model, the values of a variable are chosen by being randomly selected from the population of all possible levels. Future replications of the experiment might or might not select this exact same level, person, or item. The implication is that future similar experiments would also seek another random sampling rather than targeting specific levels or spacing of a variable. In this ANOVA model, the particular level chosen is thought to exert a systematic influence on scores for other variables in the experiment. In other words, interaction is assumed.

Examples of random effects in experimental design are common in the behavioral sciences. Words chosen for a memory study or odors sampled from all available odor materials for a recognition screening test are random, not fixed, stimulus effects. Such words or odors represent random choices from among the entire set of such possible words or odors and do not represent specific levels of a variable that we have chosen for study. Furthermore, we wish to generalize to all such possible stimuli and make conclusions about the parent set as a whole and not just the words or odors that we happened to pick. A persistent issue is whether sensory panelists are ever fixed effects. The fixed effects model is simpler and is the one most people learn in a beginning statistics course, so it has unfortunately persisted in the literature even though behavioral science dictates that human subjects or panelists are a random effect, even as they are used in sensory work.

Although they are never truly randomly sampled,panelists meet the criteria of being a sample of alarger population of potential panelists and of not beingavailable for subsequent replications (for example, inanother lab). Each panel has variance associated withits composition, that is, it is a sample of a larger pop-ulation. Also, each product effect includes not onlythe differences among products and random error, but

also the interaction of each panelist with the productvariable. For example, panelists might have steeper orshallower slopes for responding to increasing levelsof the ingredient that forms the product variable. Thiscommon type of panelist interaction necessitates theconstruction of F-ratios with interaction terms in thedenominator. Using the wrong error term (i.e., fromsimple fixed effects ANOVAs) can lead to erroneousrejection of the null hypothesis.

Fixed effects are specific levels of a variablethat experimenters are interested in, whereas randomeffects are samples of a larger population to which theywish to generalize the other results of the experiment.Sokal and Rohlf (1981) make the following useful dis-tinction: [Fixed versus random effects models depend]“on whether the different levels of that factor can beconsidered a random sample of more such levels, orare fixed treatments whose differences the investigatorwishes to contrast.” (p. 206).

This view has not been universally applied to panelist classification within the sensory evaluation community. Here are some common rejoinders to this position.

When panelists get trained, are they no longer a random sample and therefore a fixed effect? This is irrelevant. We wish to generalize these results to any such panel of different people, similarly screened and trained, from the population of qualifying individuals. Hays (1973) puts this in perspective by stating that even though the sample has certain characteristics, that does not invalidate its status as a sample of a larger group: "Granted that only subjects about whom an inference is to be made are those of a certain age, sex, ability to understand instructions, and so forth, the experimenter would, nevertheless, like to extend his inference to all possible such subjects" (p. 552).

A second problem arises about the use of the interaction term in mixed model ANOVAs. Can we not simply assume no interaction in the model, or even test for the existence of a significant interaction? The answer is that you can, but why choose a riskier model, inflating your chance of Type I error? If you test for no significant interaction, you depend upon a failure to reject the null, which is an ambiguous result, since it can happen from a sloppy experiment with high error variance just as well as from a situation where there is truly no effect. So it is safer to use a mixed model where panelists are considered random. Most statistical packages will select the interaction effect as the error term when you specify panelists as a random effect, and some even assume it as the default.
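As a concrete illustration of this point, the sketch below (not taken from the text; the data file and the column names "rating", "product", and "panelist" are hypothetical) tests the product effect against the product-by-panelist interaction rather than against the residual, using the sums of squares from a replicated two-way layout.

    # Hedged sketch: product tested against the product x panelist interaction,
    # as is appropriate when panelists are treated as a random effect.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm
    from scipy import stats

    df = pd.read_csv("descriptive_panel.csv")  # hypothetical replicated panel data

    fit = smf.ols("rating ~ C(product) * C(panelist)", data=df).fit()
    aov = anova_lm(fit)  # sums of squares and degrees of freedom for each term

    ms_product = aov.loc["C(product)", "sum_sq"] / aov.loc["C(product)", "df"]
    ms_interaction = (aov.loc["C(product):C(panelist)", "sum_sq"]
                      / aov.loc["C(product):C(panelist)", "df"])

    f_mixed = ms_product / ms_interaction  # interaction term is the error term
    p_mixed = stats.f.sf(f_mixed,
                         aov.loc["C(product)", "df"],
                         aov.loc["C(product):C(panelist)", "df"])
    print(f_mixed, p_mixed)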

Further discussion of this issue can be found in the book by Lea et al. (1998), in Lundahl and McDaniel (1988), and in Lawless (1998) and the articles in the same issue of that journal.

C.6.2 A Note on Blocking

In factorial designs with two or more variables, the sensory specialist will often have to make some decisions about how to group the products that will be viewed in a single session. Sessions or days of testing often form one of the blocks of an experimental design. The examples considered previously are fairly common in that the two variables in a simple factorial design are often products and replicates. Of course, judges are a third factor, but a special one. Let us put judges aside for the moment and look at a complete block design in which each judge participates in the evaluation of all products and all of the replicates. This is a common design in descriptive analysis using trained panels.

Consider the following scenario: Two sensory technicians (students in a food company on summer internships) are given a sensory test to design. The test will involve comparisons of four different processed soft cheese spreads, and a trained descriptive panel is available to evaluate key attributes such as cheese flavor intensity, smoothness, and mouthcoating. Due to the tendency of this product to coat the mouth, only four products can be presented in each session. The panel is available for testing in four separate sessions on different days. There are then two factors to be assigned to blocks of sessions in this experiment, the products and the replicates.

Technician "A" decides to present one version of the cheese spread on each day, but replicate it four times within a session. Technician "B", on the other hand, presents all four products on each day, so that the sessions (days) become blocked as replicates. Both technicians use counterbalanced orders of presentation, random codes, and other reasonable good practices of sensory testing. The blocking schemes are illustrated in Fig. C.5.

Which design seems better? A virtually unanimous opinion among sensory specialists we asked is that assigning all four products within the same session is better than presenting four replicates of the same product within a session. The panelists will have a more stable frame of reference within a session than across sessions, and this will improve the sensitivity of the product comparison. There may be day-to-day variations in uncontrolled factors that may confound the product comparisons across days (changes in conditions, changes in the products while aging, or in the panelists themselves) and add to random error. Having the four products present in the same session lends a certain directness to the comparison without any burden of memory load. There is less likelihood of drift in scale usage within a session as opposed to testing across days.

Why then assign products within a block and replicates across sessions? Simply stated, the product comparison is most often the more critical comparison of the two. Product differences are likely to be the critical question in a study. A general principle for the assignment of variables to blocks in sensory studies where the experimental blocks are test sessions is this: Assign the variable of greatest interest within a block so that all levels of that factor are evaluated together. Conversely, assign the variable of secondary interest across the blocks if there are limitations in the number of products that can be presented.

C.6.3 Split-Plot or Between-Groups (Nested) Designs

It is not always possible to have all panelists or consumers rate all products. A common design uses different groups of people to evaluate different levels of a variable. In some cases, we might simply want to compare two panels, having presented them with the same levels of a test variable. For example, we might have a set of products evaluated at two sites in order to see if panelists are in agreement in two manufacturing plants or between a QC panel and an R and D panel. In this case, there will be repeated measures on one variable (the products) since all panelists see all products. But we also have a between-groups variable that we wish to compare. We call this a "split-plot" design, in keeping with the nomenclature of Stone and Sidel (1993). It originates from agricultural field experiments in which plots were divided to accommodate different treatments. Bear in mind that we have one group variable and one repeated measures variable. In behavioral research, these are sometimes called "between-subjects" and "within-subjects" effects. Examples of these designs can be found in Stone and Sidel (1993) and Gacula et al. (2009); one way such a layout might be analyzed is sketched below.
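The following sketch shows one plausible way to handle this kind of between-groups plus repeated-measures layout with a linear mixed model; it is not taken from the text, and the file and column names ("rating", "site", "product", "panelist") are assumptions made only for illustration.

    # Hedged sketch: 'site' is the between-groups factor, 'product' the repeated
    # (within-panelist) factor, and each panelist receives a random intercept.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("two_site_panel.csv")  # hypothetical data set

    model = smf.mixedlm("rating ~ C(site) * C(product)",
                        data=df,
                        groups=df["panelist"])
    result = model.fit()
    print(result.summary())  # fixed effects for site, product, and their interaction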

Fig. C.5 Examples of blocking strategy for the hypothetical example of the processed cheese spread. The blocking problem: four products are available on four days and four replicates are desired; the product causes carry-over and fatigue and tends to coat the mouth, so only four products can be tested in any one day. How are the two factors to be assigned to the blocks of days (sessions)? Approach "A" assigns replicates within a block and products across days (sessions): Day 1 = Product 1, Reps 1–4; Day 2 = Product 2, Reps 1–4; Day 3 = Product 3, Reps 1–4; Day 4 = Product 4, Reps 1–4. Approach "B" tests all four products once in each day, so each day is one replicate: Day 1 = Rep 1, Products 1–4; Day 2 = Rep 2, Products 1–4; Day 3 = Rep 3, Products 1–4; Day 4 = Rep 4, Products 1–4. Which blocking scheme is better and why?

C.6.4 Statistical Assumptions and the Repeated Measures ANOVA

The model underlying the repeated measures analysis of variance from complete block designs has more assumptions than the simple one-way ANOVA. One of these is an assumption that the covariance (or degree of relationship) among all pairs of treatment levels is the same. Unfortunately, this is rarely the case in observations of human judgment.

Consider the following experiment: We have sensory judges examine an ice cream batch for shelf life in a heat shock experiment on successive days. We are lucky enough to get the entire panel to sit for every experimental session, so we can use a complete block or repeated measures analysis. However, their frame of reference is changing slightly, a trend that seems to be affecting the data as time progresses. Their data from adjacent days are more highly correlated than their data from the first and last test. Such time-dependencies violate the assumption of "homogeneity of covariance."

But all is not lost. A few statisticians have suggested some solutions, if the violations are not too bad (Greenhouse and Geisser, 1959; Huynh and Feldt, 1976). Both of these techniques adjust your degrees of freedom in a conservative manner to try to account for a violation of the assumptions and still protect you from that terrible deed of making a Type I error. The corrections are via an "epsilon" value that you will sometimes see in ANOVA package printouts, and adjusted p-values, often abbreviated G-G or H-F. Another solution is to use a multivariate analysis of variance approach, or MANOVA, which does not labor under the covariance assumptions of repeated measures. Since most packaged printouts give you MANOVA statistics nowadays anyway, it does not hurt to give them a look and see if your conclusions about significance would be any different.
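To make the epsilon idea concrete, here is a rough numpy sketch (not from the text) of the Greenhouse-Geisser epsilon computed from a panelist-by-treatment score matrix; the function and variable names are assumptions, and a validated statistics package should be preferred for real analyses.

    import numpy as np

    def greenhouse_geisser_epsilon(scores):
        """scores: n_panelists x k_treatments array from a complete block design."""
        k = scores.shape[1]
        S = np.cov(scores, rowvar=False)      # k x k covariance among treatments
        C = np.eye(k) - np.ones((k, k)) / k   # centering matrix
        Sc = C @ S @ C                        # double-centered covariance matrix
        eps = np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))
        return max(eps, 1.0 / (k - 1))        # epsilon cannot fall below 1/(k-1)

    # The treatment F-test is then evaluated with df1 = eps*(k-1) and
    # df2 = eps*(k-1)*(n-1) instead of the uncorrected degrees of freedom.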

C.6.5 Other Options

Analysis of variance remains the bread-and-butter everyday statistical tool for the vast majority of sensory experiments and multi-product comparisons. However, it is not without its shortcomings. One concern is that we end up examining each scale individually, even when our descriptive profile may contain many scales for flavor, texture, and appearance. Furthermore, many of these scales are intercorrelated. They may be providing redundant information or they may be driven by the same underlying or latent causes. A more comprehensive approach would include multivariate techniques, such as principal components analysis, to assess these patterns of correlation. Analysis of variance can also be done following principal components analysis, using the factor scores rather than the raw data. This has the advantage of simplifying the analysis and reporting of results, although the loss of detail about individual scales may not always be desired. See Chapter 18 for descriptions of multivariate techniques and their applications.

A second concern is the restrictive assumptions of ANOVA that are often violated. Normal distributions, equal variance, and, in the case of repeated measures, homogeneity of covariance are not always the case in human judgments. Such violations lead to unknown changes in the risk levels. Risk may be underestimated, as the statistical probabilities of our analysis are based on distributions and assumptions that are not always descriptive of our experimental data. For such reasons, it has recently become popular to use MANOVA. Many current statistical analysis software packages offer both types of analysis and some even give them automatically or as defaults. The sensory scientist can then compare the outcomes of the two types of analyses. If they are the same, the conclusions are straightforward. If they differ, some caution is warranted in drawing conclusions about statistical significance.

The analyses shown in this section are relatively simple ones. It is obvious that more complex experimental designs are likely to come the way of a sensory testing group. In particular, incomplete designs in which people evaluate only some of the products in the design are common. Product developers often have many variables at several different levels each to screen. We have stressed the complete block designs here because of the efficient and powerful partitioning of panelist variance that is possible. Discussions of incomplete designs and their efficiency can be found in various statistics texts.

We recognize that it is unlikely at this point in the history of computing that many sensory professionals or statistical analysis services will spend much time doing ANOVAs by hand. If the experimental designs are complex or if many dependent measures have been collected, software packages are likely to be used. In these cases the authors of the program have taken over the burden of computing and partitioning variance. However, the sensory scientist can still make decisions on a theoretical level. For example, we have included all interaction terms in the above analyses, but the linear models upon which ANOVAs are based need not include such terms if there are theoretical or practical reasons to omit them. Many current statistical analysis packages allow the specification of the linear model, giving discretionary modeling power to the scientist. In some cases it may be advantageous to pool effects or omit interactions from the model if their variance contribution is small. This will increase the degrees of freedom for the remaining factors and increase the chances of finding a significant effect.

References

Gacula, M. C., Jr. and Singh, J. 1984. Statistical Methods in Food and Consumer Research. Academic, Orlando, FL.

Gacula, M., Singh, J., Bi, J. and Altan, S. 2009. Statistical Methods in Food and Consumer Research. Elsevier/Academic, Amsterdam.

Greenhouse, S. W. and Geisser, S. 1959. On methods in the analysis of profile data. Psychometrika, 24, 95–112.

Hays, W. L. 1973. Statistics for the Social Sciences, Second Edition. Holt, Rinehart and Winston, New York.

Huynh, H. and Feldt, L. S. 1976. Estimation of the Box correction for degrees of freedom in the randomized block and split plot designs. Journal of Educational Statistics, 1, 69–82.

Lawless, H. 1998. Commentary on random vs. fixed effects for panelists. Food Quality and Preference, 9, 163–164.

Lea, P., Naes, T. and Rodbotten, M. 1998. Analysis of Variance for Sensory Data. Wiley, Chichester, UK.

Lundahl, D. S. and McDaniel, M. R. 1988. The panelist effect—fixed or random? Journal of Sensory Studies, 3, 113–121.

O'Mahony, M. 1986. Sensory Evaluation of Food. Statistical Methods and Procedures. Marcel Dekker, New York.

Sokal, R. R. and Rohlf, F. J. 1981. Biometry, Second Edition. W. H. Freeman, New York.

Stone, H. and Sidel, J. L. 1993. Sensory Evaluation Practices, Second Edition. Academic, San Diego.

Winer, B. J. 1971. Statistical Principles in Experimental Design, Second Edition. McGraw-Hill, New York.

Appendix D

Correlation, Regression, and Measures of Association

Contents

D.1 Introduction .......... 525
D.2 Correlation .......... 527
  D.2.1 Pearson's Correlation Coefficient Example .......... 528
  D.2.2 Coefficient of Determination .......... 529
D.3 Linear Regression .......... 529
  D.3.1 Analysis of Variance .......... 530
  D.3.2 Analysis of Variance for Linear Regression .......... 530
  D.3.3 Prediction of the Regression Line .......... 530
  D.3.4 Linear Regression Example .......... 531
D.4 Multiple Linear Regression .......... 531
D.5 Other Measures of Association .......... 531
  D.5.1 Spearman Rank Correlation .......... 531
  D.5.2 Spearman Correlation Coefficient Example .......... 532
  D.5.3 Cramér's V Measure .......... 532
  D.5.4 Cramér Coefficient Example .......... 533
References .......... 533

The correlation coefficient is frequently abused. First, correlation is often improperly interpreted as evidence of causation. . . . Second, correlation is often improperly used as a substitute for agreement.

—Diamond (1989)

This chapter is a short introduction to correlation and regression. Pearson's correlation coefficient and the coefficient of determination for interval data are discussed, followed by a section on linear regression. There is an example of how to calculate a linear regression. An extremely brief discussion of multiple linear regression is followed by a discussion of other measures of association. These are Spearman's rank correlation coefficient for ordinal data and Cramér's measure for nominal data.

D.1 Introduction

Sensory scientists are frequently confronted with the situation where they would like to know if there is a significant association between two sets of data. For example, the sensory specialist may want to know if the perceived brown color intensity (dependent variable) of a series of cocoa powder–icing sugar mixtures increases as the amount of cocoa (independent variable) in the mixture increases. As another example, the sensory scientist may want to know if the perceived sweetness of grape juice (dependent variable) is related to the total concentration of fructose and glucose (independent variable) in the juice, as determined by high-pressure liquid chromatography.

In these cases we need to determine whether there is evidence for an association between independent and dependent variables. In some cases, we may also be able to infer a cause and effect relationship between independent and dependent variables. The measures of association between two sets of data are called correlation coefficients, and if the size of the calculated correlation coefficient leads us to reject the null hypothesis of no association, then we know that the change in the independent variables—our treatments, ingredients, or processes—was associated with a change in the dependent variables—the variables we measured, such as sensory-based responses or consumer acceptance. However, from one point of view, this hypothesis testing approach is scientifically impoverished. It is not a very informative method of scientific research, because the statistical hypothesis decision is a binary or yes/no relationship. We conclude that either there is evidence for an association between our treatments and our observations or there is not. Having rejected the null hypothesis, we still do not know the degree of relationship or the tightness of the association between our variables. Even worse, we have not yet specified any type of mathematical model or equation that might characterize the association.

Associational measures, also known as correlation coefficients, can address the first of these two questions. In other words, the correlation coefficient will allow us to decide whether there is an association between the two data series. Modeling, in which we attempt to fit a mathematical function to the relationship, addresses the second question. The most widespread measure of association is the simple correlation coefficient (Pearson's correlation coefficient). The most common approach to model fitting is the simple linear regression. These two approaches have similar underlying calculations and are related to one another. The first sections of this chapter will show how correlations and regressions are computed and give some simple examples. The later sections will deal with related topics: how to build some more complex models between several variables, how measures of association are derived from other statistical methods, such as analysis of variance, and finally how to compute measures of association when the data do not have interval scale properties (the Spearman rank correlation coefficient and Cramér's φ′2).

Correlation is of great importance in sensory science because it functions as a building block for other statistical procedures. Thus, methods like principal components analysis draw part of their calculations from measures of correlation among variables. In modeling relationships among data sets, regression and multiple regression are standard tools. A common application is the predictive modeling of consumer acceptability of a product based upon other variables. Those variables may be descriptive attributes as characterized by a trained (non-consumer) panel, or they may be ingredient or processing variables or even instrumental measures. The procedure of multiple regression allows one to build a predictive model for consumer likes based on a number of such other variables. A common application of regression and correlation is in sensory–instrumental relationships. Finally, measures of correlation are valuable tools in specifying how reliable our measurements are, by comparing panelist scores over multiple observations and by assessing agreement among panelists and panels.

When the sensory scientist is investigating possible relationships between data series, the first step is to plot the data in a scatter diagram with the X-series on the horizontal axis and the Y-series on the vertical axis. Blind application of correlation and regression analyses may lead to wrong conclusions about the relationship between the two variables. As we will see, the most common correlation and regression methods estimate the parameters of the "best" straight line through the data (regression line) and the closeness of the points to the line (simple correlation coefficient). However, the relationship may not necessarily be described well by a straight line; in other words, the relationship may not necessarily be linear. Plotting the data in scatter diagrams will alert the specialist to problems in fitting linear models to data that are not linearly related (Anscombe, 1973). For example, the four data sets listed in Table D.1 and plotted in Fig. D.1 clearly are not all accurately described by a linear model.

Table D.1 The Anscombe quartet—four data sets illustrating principles associated with linear correlation (Anscombe, 1973)

          a               b               c               d
     x      y        x      y        x      y        x      y
     4    4.26       4    3.10       4    5.39       8    6.58
     5    5.68       5    4.74       5    5.73       8    5.76
     6    7.24       6    6.13       6    6.08       8    7.71
     7    4.82       7    7.26       7    6.42       8    8.84
     8    6.95       8    8.14       8    6.77       8    8.47
     9    8.81       9    8.77       9    7.11       8    7.04
    10    8.04      10    9.14      10    7.46       8    5.25
    11    8.33      11    9.26      11    7.81       8    5.56
    12   10.84      12    9.13      12    8.15       8    7.91
    13    7.58      13    8.74      13   12.74       8    6.89
    14    9.96      14    8.10      14    8.84      19   12.50

Fig. D.1 Scatter plots of the Anscombe quartet data (Table D.1) (redrawn from Anscombe, 1973).

In all four cases the 11 observation pairs have mean x-values equal to 9.0, mean y-values equal to 7.5, a correlation coefficient equal to 0.82, and a regression line equation of y = 3 + 0.5x.

However, as Fig. D.1 clearly indicates, the data sets are very different. These data sets are the so-called Anscombe quartet, created by their author to highlight the perils of blindly calculating linear regression models and Pearson correlation coefficients without first determining through scatter plots whether a simple linear model is appropriate (Anscombe, 1973).

D.2 Correlation

When two variables are related, a change in one is usually accompanied by a change in the other. However, the changes in the second variable may not be linearly related to changes in the first. In these cases, the correlation coefficient is not a good measure of association between the variables (Fig. D.2).

The concomitant change between the variables may occur because the two variables are causally related. In other words, the change in the one variable causes the change in the other variable. In the cocoa–icing sugar example the increased concentration of cocoa powder in the mixture caused the increase in perceived brown color. However, the two variables may not necessarily be causally related, because a third factor may drive the changes in both variables or there may be several intervening variables in the causal chain between the two variables (Freund and Simon, 1992). An anecdotal example often used in statistics texts is the following: Some years after World War II, statisticians found a correlation between the number of storks and the number of babies born in England. This did not mean that the storks "caused" the babies. Both variables were related to the re-building of England (increase in families and in the roofs which storks use as nesting places) after the war. One immediate cautionary note in associational statistics is that the causal inference is not often clear. The usual warning to students of statistics is that "correlation does not imply causation."

The rate of change or dependence of one variable on another can be measured by the slope of a line relating the two variables. However, that slope will numerically depend upon the units of measurement. For example, the relationship between perceived sweetness and molarity of a sugar will have a different slope if concentration is measured in percent-by-weight or if units should change to milli-molar instead of molar concentration. In order to have a measure of association that is free from the particular units that are chosen, we must standardize both variables so that they have a common system of measurement.

Fig. D.2 The correlation coefficient could be used as a summary measure for plots A and B, but not for plots C and D (in analogy to Meilgaard et al., 1991).

The statistical approach to this problem is to replace each score with its standard score or difference from the mean in standard deviation units. Once this is done, we can measure the association by a measure known as the Pearson correlation coefficient (Blalock, 1979). When we have not standardized the variables, we can still measure the association, but now it is simply called covariance (the two measures "vary together") rather than correlation. The measure of correlation is extremely useful and very widespread in its applications.

The simple or Pearson correlation coefficient is calculated using the following computational equation (Snedecor and Cochran, 1980):

$$ r = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}}{\sqrt{\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]\left[\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right]}} \qquad \text{(D.1)} $$

where the series of x data points is the independent variable and the series of y data points is the dependent variable.

Each data set has n data points, and the degrees of freedom associated with the simple correlation coefficient are n−2. If the calculated r-value is larger than the r-value listed in the correlation table (Table F2) for the appropriate alpha, then the correlation between the variables is significant.

The value of Pearson's correlation coefficient always lies between −1 and +1. Values of r close to absolute 1 indicate that a very strong linear relationship exists between the two variables. When r is equal to zero, there is no linear relationship between the two variables. Positive values of r indicate a tendency for the variables to increase together. Negative values of r indicate a tendency of large values of one variable to be associated with small values of the other variable.

D.2.1 Pearson's Correlation Coefficient Example

In this study a series of 14 cocoa–icing sugar mixtures was rated by 20 panelists for brown color intensity on a nine-point category scale. Was there a significant correlation between the percentage of cocoa powder added to the icing sugar mixture and the perceived brown color intensity?

Cocoa added, %   Mean brown color intensity        X²          Y²          XY
30               1.73                              900         2.98a       51.82
35               2.09                             1225         4.37        73.18
40               3.18                             1600        10.12       127.27
45               3.23                             2025        10.42       145.23
50               4.36                             2500        19.04       218.18
55               4.09                             3025        16.74       225.00
60               4.68                             3600        21.92       280.91
65               5.77                             4225        33.32       375.23
70               6.91                             4900        47.74       483.64
75               6.73                             5625        45.26       504.54
80               7.05                             6400        49.64       563.64
85               7.77                             7225        60.42       660.68
90               7.18                             8100        51.58       646.36
95               8.54                             9025        73.02       811.82
ΣX = 875         ΣY = 73.32                       ΣX² = 60375  ΣY² = 446.56  ΣXY = 5167.50a

aValues rounded to decimals after calculation of squares and products

$$ r = \frac{5167.50 - \dfrac{875 \times 73.32}{14}}{\sqrt{\left[60375 - \dfrac{(875)^2}{14}\right]\left[446.56 - \dfrac{(73.32)^2}{14}\right]}} = 0.9806 $$

At an alpha level of 5%, the tabular value of the correlation coefficient for 12 degrees of freedom is 0.4575 (Table F2). The calculated value of 0.9806 exceeds the table value, and we can conclude that there is a significant association between the perceived brown color intensity of the cocoa–icing sugar mixture and the percentage of cocoa added to the mixture. Since the only ingredient changing in the mixture is the amount of cocoa, we can also conclude that the increased amount of cocoa causes the increased perception of brown color.
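For readers who prefer to check the arithmetic by machine, the short sketch below (an illustration only, assuming the cocoa data above have been typed into two Python lists) reproduces the correlation with scipy.

    from scipy import stats

    cocoa = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
    brown = [1.73, 2.09, 3.18, 3.23, 4.36, 4.09, 4.68, 5.77, 6.91, 6.73,
             7.05, 7.77, 7.18, 8.54]

    r, p_value = stats.pearsonr(cocoa, brown)
    print(round(r, 4), p_value)  # r should agree with the hand calculation (about 0.98)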

D.2.2 Coefficient of Determination

The coefficient of determination is the square of Pearson's correlation coefficient (r²) and it is the estimated proportion of the variance of the data set Y that can be attributed to its linear correlation with X, while 1 − r² (the coefficient of non-determination) is the proportion of the variance of Y that is free from the effect of X (Freund and Simon, 1992). The coefficient of determination can range between 0 and 1, and the closer the r² value is to 1, the better the straight line fits.

D.3 Linear Regression

Regression is a general term for fitting a function, usually a linear one, to describe the relationship among variables. Various methods are available for fitting lines to data, some based on mathematical solutions and others based on iterative or step-by-step trials to minimize some residual error or badness-of-fit measure. In all of these methods there must be some measurement of how good the fit of the model or equation is to the data. The least-squares criterion is the most common measure of fit for linear relationships (Snedecor and Cochran, 1980; Afifi and Clark, 1984). In this approach, the best fitting straight line is found by minimizing the squared deviations of every data point from the line in the y-direction (Fig. D.3).

The simple linear regression equation is

y = a + bx (D.2)

where a is the value of the estimated intercept and b is the value of the estimated slope.

The estimated least-squares regression is calculated using the following equations:

$$ b = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} \qquad \text{(D.3)} $$

$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \qquad \text{(D.4)} $$

Fig. D.3 Least-squares regression line. The least-squares criterion minimizes the squared residuals for each point. The arrow indicates the residual for an example point. The dashed line indicates the mean X and Y values. The equation for the line is Y = bx + a.

It is possible to assess the goodness of fit of the equation in different ways. The usual methods used are the coefficient of determination and the analysis of variance (Piggott, 1986).

D.3.1 Analysis of Variance

In a linear regression it is possible to partition the total variation into the variation explained by the regression analysis and the residual or unexplained variation (Neter and Wasserman, 1974). The F-test is calculated as the ratio of the regression mean square to the residual mean square with 1 and (n−2) degrees of freedom. This tests whether the fitted regression line has a non-zero slope. The equations used to calculate the total sums of squares and the sums of squares associated with the regression are as follows:

$$ SS_{\text{total}} = \sum \left(Y_i - \bar{Y}\right)^2 \qquad \text{(D.5)} $$

$$ SS_{\text{regression}} = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 \qquad \text{(D.6)} $$

$$ SS_{\text{residual}} = \sum \left(Y_i - \hat{Y}_i\right)^2 = SS_{\text{total}} - SS_{\text{regression}} \qquad \text{(D.7)} $$

where $Y_i$ is the value of a specific observation, $\bar{Y}$ is the mean of all the observations, and $\hat{Y}_i$ is the predicted value for the specific observation.

D.3.2 Analysis of Variance for Linear Regression

Source of variation   Degrees of freedom   Sum of squares    Mean squares          F-value
Regression            1                    SS_regression     SS_regression/1       MS_regression/MS_residual
Residual              n−2                  SS_residual       SS_residual/(n−2)
Total                 n−1                  SS_total

D.3.3 Prediction of the Regression Line

It is possible to calculate confidence intervals for the slope of the fitted regression line (Neter and Wasserman, 1974). These confidence intervals lie on smooth curves (the branches of a hyperbola) on either side of the regression line (Fig. D.4).

Fig. D.4 The confidence region for a regression line is two curved lines on either side of the linear regression line. These curved lines are closest to the regression line at the mean values for X and Y.

The confidence intervals are at their smallest at the mean value of the X-series, and the intervals become progressively larger as one moves away from the mean value of the X-series. All predictions using the regression line should fall within the range used to calculate the fitted regression line. No predictions should be made outside this range. The equations to calculate the confidence intervals are as follows:

$$ Y_0 \pm t \left\{ 1 + \frac{1}{n} + \frac{\left(x_0 - \bar{x}\right)^2}{\sum \left(x_i - \bar{x}\right)^2} \right\}^{1/2} S \qquad \text{(D.8)} $$

where $x_0$ is the point you are estimating around; $x_i$ are the measured points; $\bar{x}$ is the mean value for the x-series; n is the number of observations; and S is the residual mean square of the regression (see below). The t-value is determined by the degrees of freedom (n−1) since this is a two-tailed test at an α/2 level (Rawlings et al., 1998).

D.3.4 Linear Regression Example

We will return to the cocoa powder–icing sugar mixture data used as an example in the Pearson's correlation coefficient section. A linear regression line was fitted to the data using the equations given above:

$$ b = \frac{5167.50 - \dfrac{875 \times 73.32}{14}}{60375 - \dfrac{(875)^2}{14}} = 0.102857 $$

$$ a = 5.237 - (0.102857)(62.5) = -1.19156 $$

The fitted linear regression equation is y = 0.1028X − 1.1916 and the coefficient of determination is 0.9616 with 12 degrees of freedom. However, we should round this equation to y = 0.10X − 1.19 since our level of certainty does not justify four decimal places!
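As a convenience check (again assuming the cocoa and brown lists defined in the correlation example above), scipy's linregress should reproduce essentially the same slope, intercept, and r²:

    from scipy import stats

    fit = stats.linregress(cocoa, brown)
    print(round(fit.slope, 4), round(fit.intercept, 4), round(fit.rvalue ** 2, 4))
    # expected: slope near 0.10, intercept near -1.19, r-squared near 0.96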

D.4 Multiple Linear Regression

Multiple linear regression (MLR) calculates the linear combination of independent variables (more than one X) that is maximally correlated with the dependent variable (a single Y). The regression is performed in a least-squares fashion by minimizing the sum of squares of the residual (Afifi and Clark, 1984; Stevens, 1986). The regression equation must be cross-validated by applying the equation to an independent sample from the same population; if predictive power drops sharply then the equation has only limited utility. In general, one needs about 15 subjects (or observations) per independent variable to have some hope of successful cross-validation, although some scientists will do multiple linear regression with as few as five times as many observations as independent variables. When the independent variables are intercorrelated or multicollinear, it spells real trouble for the potential use of MLR. Multicollinearity limits the possible magnitude of the multiple correlation coefficient (R) and makes determining the importance of a given independent variable in the regression equation difficult, as the effects of the independent variables are confounded due to high inter-correlations. Also, when variables are multicollinear, the order in which independent variables enter the regression equation makes a difference with respect to the amount of variance in Y that each variable accounts for. It is only with totally uncorrelated independent variables that the order has no effect. As seen in Chapter 18, principal component analysis (PCA) creates orthogonal (non-correlated) principal components (PCs). It is possible to run MLR using these new variables when one started with highly multicollinear data (as is often found in sensory descriptive data). The biggest problem with doing MLR on PCs is that it may be very difficult to interpret the resultant output. Multiple linear regression analyses can be performed using any reputable statistical analysis software package (Piggott, 1986).
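One hedged sketch of this PCA-then-regression idea (principal component regression) is shown below; the data file, the column named "liking", and the choice of three components are hypothetical and made only for illustration.

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    data = pd.read_csv("descriptive_and_liking.csv")  # hypothetical data set
    X = data.drop(columns=["liking"])                 # intercorrelated descriptors
    y = data["liking"]                                # consumer acceptance scores

    pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
    pcr.fit(X, y)
    print(pcr.score(X, y))  # R-squared on the fitting data; cross-validate before trusting it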

D.5 Other Measures of Association

D.5.1 Spearman Rank Correlation

When the data are not derived from an interval scale but from an ordinal scale, the simple correlation coefficient is not appropriate. However, the Spearman rank correlation coefficient is appropriate. This correlation coefficient is a measure of the association between the ranks of independent and dependent variables. This measure is also useful when the data are not normally distributed (Blalock, 1979).

The Spearman coefficient is often indicated by the symbol ρ (rho); however, sometimes rs is also used. Similar to the simple correlation coefficient, the Spearman coefficient ranges between −1 and 1. Values of rs close to absolute 1 indicate that a very strong relationship exists between the ranks of the two variables. When ρ is equal to zero, there is no relationship between the ranks of the two variables. Positive values of ρ indicate a tendency for the ranks of the variables to increase together. Negative values of ρ indicate a tendency of large rank values of one variable to be associated with small rank values of the other variable. The Spearman correlation coefficient for data with few ties is calculated using the following equation; for the equation to use for data with many ties, see Appendix B:

$$ \rho = 1 - \frac{6 \sum d^2}{n\left(n^2 - 1\right)} \qquad \text{(D.9)} $$

where n is the number of ranked products and d is the difference in ranks for each product between the two data series.

Critical values of ρ are found in Spearman rank correlation tables (Table F1). When the value of n is more than 60, the Pearson tabular values can be used to determine the tabular values for the Spearman correlation coefficient.

D.5.2 Spearman Correlation Coefficient Example

We return to the cocoa powder example used for Pearson's correlation coefficient. In this case two panelists were asked to rank the perceived brown color intensities of the 14 cocoa powder–icing sugar mixtures, with the ranks shown below. Was there a significant correlation between the ranks assigned by the two panelists to the cocoa mixtures?

Cocoa in mixture, %   Panelist A (X)   Panelist B (Y)    d     d²
30                     1                1                0     0
35                     2                2                0     0
40                     4                3                1     1
45                     6                4                2     4
50                     7                7                0     0
55                     5                8               −3     9
60                     9                6                3     9
65                     3                5               −2     4
70                    10                9                1     1
75                     8               10               −2     4
80                    11               12               −1     1
85                    14               13                1     1
90                    13               11                2     4
95                    12               14               −2     4
                                                              Σd² = 42

$$ r_s = 1 - \frac{6 \times 42}{14\left(196 - 1\right)} = 0.9077 $$

The tabular value for the Spearman correlation coefficient with 14 ranks at an alpha value of 5% is equal to 0.464 (see Table F1). We can conclude that the two panelists' rank orderings of the brown color intensities of the cocoa powder–icing sugar mixtures were significantly similar. However, there is no direct causal relationship between the rank order of panelist A and that of panelist B.
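The same result can be checked with scipy, assuming the two rank columns from the table above are typed in as lists (an illustration only):

    from scipy import stats

    panelist_a = [1, 2, 4, 6, 7, 5, 9, 3, 10, 8, 11, 14, 13, 12]
    panelist_b = [1, 2, 3, 4, 7, 8, 6, 5, 9, 10, 12, 13, 11, 14]

    rho, p_value = stats.spearmanr(panelist_a, panelist_b)
    print(round(rho, 4))  # should agree with the hand calculation above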

D.5.3 Cramér’s V Measure

When the data are not derived from an interval or an ordinal scale but from a nominal scale, then the appropriate measure of association is the Cramér measure, φ′2 (phi-hat prime squared) (Herzberg, 1989). This association coefficient is a squared measure and can range from 0 to 1. The closer the value of Cramér's V measure is to 1, the greater the association between the two nominal variables; the closer Cramér's V measure is to zero, the smaller the association between the two nominal variables. In practice, you may find that a Cramér's V of 0.10 provides a good minimum threshold for suggesting that there is a substantive relationship between two variables. The Cramér coefficient of association is calculated using the following equation:

$$ V = \sqrt{\frac{\chi^2}{n\left(q - 1\right)}} \qquad \text{(D.10)} $$

where n is equal to the sample size; q is the smaller of the number of categories represented by the rows (r) and the columns (c); and χ² is the chi-square value observed for the data (see Table C).

D.5.4 Cramér Coefficient Example

The following data set is hypothetical. One hundred and ninety consumers of chewing gum indicated which flavor of chewing gum they usually used. The sensory scientist was interested in determining whether there was an association between gender and gum flavors.

Observed values

         Fruit flavor gum   Mint flavor gum   Bubble gum   Total
Men      35                 15                50           100
Women    12                 60                18            90
Total    47                 75                68           190

The expected value for women using fruit flavored gum is 47 × (90/190) = 22.263, and for men using bubble gum the expected value is 68 × (100/190) = 35.789.

Expected values

         Fruit flavor gum   Mint flavor gum   Bubble gum   Total
Men      24.737             39.474            35.789       100.000
Women    22.263             35.526            32.210        89.999
Total    47.000             75.000            67.999       189.998

$$ \chi^2 = \sum \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}} $$

$$ \chi^2 = \frac{(35 - 24.737)^2}{24.737} + \frac{(15 - 39.474)^2}{39.474} + \frac{(50 - 35.789)^2}{35.789} + \frac{(12 - 22.263)^2}{22.263} + \frac{(60 - 35.526)^2}{35.526} + \frac{(18 - 32.210)^2}{32.210} = 52.935 $$

Thus the calculated χ² = 52.935 and the degrees of freedom are equal to (r−1) × (c−1).

In this case, with two rows and three columns, df = (2−1) × (3−1) = 2. The tabular value for χ² at α = 0.05 and df = 2 is 5.991 (see Table C). The χ² value is significant. However, to determine whether there is an association between the genders and their use of gum flavors, we use the following equation:

$$ V = \sqrt{\frac{\chi^2}{n\left(q - 1\right)}} = \sqrt{\frac{52.935}{190\left(2 - 1\right)}} = 0.5278 $$

The Cramér value of association is 0.5278. There is some association between gender and the use of gum flavors.
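The whole example can be reproduced in a few lines of Python (a hedged sketch; the contingency table is the one given above):

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[35, 15, 50],   # men: fruit, mint, bubble gum
                         [12, 60, 18]])  # women
    chi2, p, dof, expected = chi2_contingency(observed, correction=False)

    n = observed.sum()
    q = min(observed.shape)              # smaller of the number of rows and columns
    cramers_v = np.sqrt(chi2 / (n * (q - 1)))
    print(round(chi2, 3), dof, round(cramers_v, 4))  # about 52.9, 2, and 0.53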

References

Afifi, A. A. and Clark, V. 1984. Computer-Aided Multivariate Analysis. Lifetime Learning, Belmont, CA, pp. 80–119.

Anscombe, F. J. 1973. Graphs in statistical analysis. American Statistician, 27, 17–21.

Blalock, H. M. 1979. Social Statistics, Second Edition. McGraw-Hill Book, New York.

Diamond, G. A. 1989. Correlation, causation and agreement. The American Journal of Cardiology, 63, 392.

Freund, J. E. and Simon, G. A. 1992. Modern Elementary Statistics, Eighth Edition. Prentice Hall, Englewood Cliffs, NJ, p. 474.

Herzberg, P. A. 1989. Principles of Statistics. Robert E. Krieger, Malabar, FL, pp. 378–380.

Neter, J. and Wasserman, W. 1974. Applied Linear Models. Richard D. Irwin, Homewood, IL, pp. 53–96.

Piggott, J. R. 1986. Statistical Procedures in Food Research. Elsevier Applied Science, New York, NY, pp. 61–100.

Rawlings, J. O., Pantula, S. G. and Dickey, D. A. 1998. Applied Regression Analysis: A Research Tool. Springer, New York.

Snedecor, G. W. and Cochran, W. G. 1980. Statistical Methods, Seventh Edition. Iowa State University, Ames, IA, pp. 175–193.

Stevens, J. 1986. Applied Multivariate Statistics for the Social Sciences. Lawrence Erlbaum Associates, Hillsdale, NJ.

Appendix E

Statistical Power and Test Sensitivity

Contents

E.1 Introduction .......... 535
E.2 Factors Affecting the Power of Statistical Tests .......... 537
  E.2.1 Sample Size and Alpha Level .......... 537
  E.2.2 Effect Size .......... 538
  E.2.3 How Alpha, Beta, Effect Size, and N Interact .......... 539
E.3 Worked Examples .......... 541
  E.3.1 The t-Test .......... 541
  E.3.2 An Equivalence Issue with Scaled Data .......... 542
  E.3.3 Sample Size for a Difference Test .......... 544
E.4 Power in Simple Difference and Preference Tests .......... 545
E.5 Summary and Conclusions .......... 548
References .......... 549

Research reports in the literature are frequently flawed by conclusions that state or imply that the null hypothesis is true. For example, following the finding that the difference between two sample means is not statistically significant, instead of properly concluding from this failure to reject the null hypothesis that the data do not warrant the conclusion that the population means differ, the writer concludes, at least implicitly, that there is no difference. The latter conclusion is always strictly invalid, and it is functionally invalid unless power is high.

—J. Cohen (1988)

The power of a statistical test is the probability that if a true difference or effect exists, the difference or effect will be detected. The power of a test becomes important, especially in sensory evaluation, when a no-difference decision has important implications, such as the sensory equivalence of two formulas or products. Concluding that two products are sensorially similar or equivalent is meaningless unless the test has sufficient power. Factors that affect test power include the sample size, alpha level, variability, and the chosen size of a difference that must be detected. These factors are discussed and worked examples given.

E.1 Introduction

Sensory evaluation requires experimental designs and statistical procedures that are sensitive enough to find differences. We need to know when treatments of interest are having an effect. In food product development, these treatments usually involve changes in food constituents, the methods of processing, or types of packaging. A purchasing department may change suppliers of an ingredient. Product development may test for the stability of a product during its shelf life. In each of these cases, it is desirable to know when a product has become perceivably different from some comparison or control product, and sensory tests are conducted.

In normal science, most statistical tests are done to insure that a true null hypothesis is not rejected without cause. When enough evidence is gathered to show that our data would be rare occurrences given the null assumption, we conclude that a difference did occur. This process keeps us from making the Type I error discussed in Appendix A. In practical terms, this keeps a research program focused on real effects and insures that business decisions about changes are made with some confidence.

However, another kind of error in statistical decision making is also important. This is the error associated with accepting the null when a difference did occur. Missing a true difference can be as dangerous as finding a spurious one, especially in product research. In order to provide tests of good sensitivity, then, the sensory evaluation specialist conducts tests using good design principles and sufficient numbers of judges and replicates. The principles of good practice are discussed in Chapter 3. Most of these practices are aimed at reducing unwanted error variance. Panel screening, orientation, and training are some of the tools at the disposal of the sensory specialist that can help minimize unwanted variability. Another example is in the use of reference standards, both for sensory terms and for intensity levels in descriptive judgments.

Considering the general form of the t-test, we discover that two of the three variables in the statistical formula are under some control of the sensory scientist. Remember that the t-test takes this form:

t = difference between means/standard error

and the standard error is the sample standard deviation divided by the square root of the sample size (N). The denominator items can be controlled or at least influenced by the sensory specialist. The standard deviation or error variance can be minimized by good experimental controls, panel training, and so on. Another tool for reducing error is partitioning, for example in the removal of panelist effects in the complete block ANOVA ("repeated measures") designs or in the paired t-test. As the denominator of a test statistic (like an F-ratio or a t-value) becomes smaller, the value of the test statistic becomes larger and it is easier to reject the null. The probability of observing the results (under the assumption of a true null) shrinks. The second factor under the control of the sensory professional is the sample size. The sample size usually refers to the number of judges or observations. In some ANOVA models additional degrees of freedom can also be gained by replication.

It is sometimes necessary to base business decisions on acceptance of the null hypothesis. Sometimes we conclude that two products are sensorially similar, or that they are a good enough match that no systematic difference is likely to be observed by regular users of the product. In this scenario, it is critically important that a sensitive and powerful test be conducted so that a true difference is not missed; otherwise the conclusion of "no difference" could be spurious. Such decisions are common in statistical quality control, ingredient substitution, cost reductions, other reformulations, supplier changes, shelf life and packaging studies, and a range of associated research questions. The goal of such tests is to match an existing product or provide a new process or cost reduction that does not change or harm the sensory quality of the item. In some cases, the goal may be to match a competitor's successful product. An equivalence conclusion may also be important in advertising claims, as discussed in Chapter 5.

In these practical scenarios, it is necessary to estimate the power of the test, which is the probability that a true difference would be detected. In statistical terms, this is usually described in an inverse way, first by defining the quantity beta as the long-term probability of missing a true difference, or the probability that a Type II error is committed. Then one minus beta is defined as the power of the test. Power depends upon several interacting factors, namely the amount of error variation, the sample size, and the size of the difference one wants to be sure to detect in the test. This last item must be defined and set using the professional judgment of the sensory specialist or by management. In much applied research with existing food products, there is a knowledge base to help decide how much of a change is important or meaningful.

This chapter will discuss the factors contributing to test power and give some worked examples and practical scenarios where power is important in sensory testing. Discussions of statistical power and worked examples can also be found in Amerine et al. (1965), Gacula and Singh (1984), and Gacula (1991, 1993). Gacula's writings include considerations of test power in substantiating claims for sensory equivalence of products. Examples specific to discrimination tests can be found in Schlich (1993) and Ennis (1993). General references on statistical power include the classic text by Cohen (1988), his overview article written for behavioral scientists (Cohen, 1992), and the introductory statistics text by Welkowitz et al. (1982). Equivalency testing is also discussed at length by Wellek (2003), Bi (2006), and ASTM (2008). Let the reader note that many scientific bodies have rejected the idea of using test power as justification for accepting the null, and prefer an approach that proves that any difference lies within a specified or acceptable interval. This idea is most applicable to proving the equivalence of measured variables (like the bioequivalence of drug delivery into the bloodstream). However, this equivalence interval approach has also been taken for simple sensory discrimination testing (see Ennis, 2008; Ennis and Ennis, 2009).

E.2 Factors Affecting the Power of Statistical Tests

E.2.1 Sample Size and Alpha Level

Mathematically, the power of a statistical test is a function of four interacting variables. Each of these entails choices on the part of the experimenter. They may seem arbitrary, but in the words of Cohen, "all conventions are arbitrary. One can only demand of them that they not be unreasonable" (1988, p. 12). Two choices are made in the routine process of experimental design, namely the sample size and the alpha level. The sample size is usually the number of judges in the sensory test. This is commonly represented by the letter "N" in statistical equations. In more complex designs like multi-factor ANOVA, "N" can reflect both the number of judges and replications, or the total number of degrees of freedom contributing to the error terms for treatments that are being compared. Often this value is strongly influenced by company traditions or lab "folklore" about panel size. It may also be influenced by cost considerations or the time needed to recruit, screen, and/or train and test a sufficiently large number of participants. However, this variable is the one most often considered in determinations of test power, as it can easily be modified in the experimental planning phase.

Many experimenters will choose the number of panelists using considerations of desired test power. Gacula (1993) gives the following example. For a moderate to large consumer test, we might want to know whether the products differ by at most one-half point in their mean values on the 9-point scale. Suppose we had prior knowledge that for this product the standard deviation is about 1 scale point (S = 1); we can then find the required number of people for an experiment with 5% alpha and 10% beta (or 90% power). This is given by the following relationship:

$$ N = \frac{\left(Z_\alpha + Z_\beta\right)^2 S^2}{\left(M_1 - M_2\right)^2} = \frac{(1.96 + 1.65)^2 (1)^2}{(0.5)^2} \cong 52 \qquad \text{(E.1)} $$

where M1 − M2 is the minimal difference we must be sure to detect and Zα and Zβ are the Z-scores associated with the desired Type I and Type II error limits. In other words, 52 observers are required to insure that a one-half point difference in means can be ruled out at 90% power when a non-significant result is obtained. Note that for any fractional N, you must round up to the next whole person.
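A tiny helper for Eq. (E.1) is sketched below (the function name is made up); plugging in the z-values used in the example reproduces a panel size of about 52 once fractional results are rounded up.

    import math

    def sample_size(min_diff, sd, z_alpha, z_beta):
        """Observers needed to detect min_diff, per Eq. (E.1)."""
        n = ((z_alpha + z_beta) ** 2) * sd ** 2 / min_diff ** 2
        return math.ceil(n)  # round any fractional N up to the next whole person

    # The worked example: half-point difference, sd of 1 scale point,
    # z_alpha = 1.96 and z_beta = 1.645 (written as 1.65 in the text).
    print(sample_size(0.5, 1.0, 1.96, 1.645))  # 52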

The second variable affecting power is the alpha level, or the choice of an upper limit on the probability of rejecting a true null hypothesis (making a Type I error). Usually we set this value at the traditional level of 0.05, but there are no hard and fast rules about this magical number. In many cases in exploratory testing or industrial practice, the concern over Type II error (missing a true difference) is of sufficient weight that the alpha level for reporting statistical significance will float up to 0.10 or even higher. This strategy shows us intuitively that there is a direct relationship between the size of the alpha level and power, or in other words, an inverse relationship between alpha-risk and beta-risk. Consider the following outcome: we allow alpha to float up to 0.10 or 0.20 (or even higher) and still fail to find a significant p-value for our statistical test. Now we have an inflated risk of finding a spurious difference, but an enhanced ability to reject the null. If we still fail to reject the null, even at such relaxed levels, then there probably is no true difference among our products. This assumes no sloppy experiment, good laboratory practices, and sufficient sample size, i.e., meeting all the usual concerns about reasonable methodology. The inverse relationship between alpha and beta will be illustrated in a simple example below.

Because of the fact that power increases as alpha isallowed to rise, some researchers would be tempted toraise alpha as a general way of guarding against TypeII error. However, there is a risk involved in this, andthat is the chance of finding false positives or spurious

Page 66: AppendixA BasicStatisticalConceptsforSensoryEvaluation978-1-4419-6488-5/1.pdf · AppendixA BasicStatisticalConceptsforSensoryEvaluation Contents A.1 Introduction ... analysis and


In any program of repeated testing, the strategy of letting alpha float up as a cheap way to increase test power should not be used. We have seen cases in which suppliers of food ingredients were asked to investigate quality control failures of their ingredient submissions, only to find that the client company had been doing discrimination tests with a lax alpha level. This resulted in spurious rejections of many batches that were probably within acceptable limits.

E.2.2 Effect Size

The third factor in the determination of power concerns the effect size one is testing against as an alternative hypothesis. This is usually a stumbling block for scientists who do not realize that they have already made two important decisions in setting up the test (the sample size and the alpha level). However, this third decision seems much more subjective to most people. One can think of this as the distance between the mean of a control product and the mean of a test product under an alternative hypothesis, in standard deviation units. For example, let us assume that our control product has a mean of 6.0 on some scale and the sample has a standard deviation of 2.0 scale units. We could test whether the comparison product had a value of less than 4.0 or greater than 8.0, or one standard deviation from the mean in a two-tailed test. In plain language, this is the size of a difference that one wants to be sure to detect in the experiment.

If the means of the treatments were two standard deviations apart, most scientists would call this a relatively strong effect, one that a good experiment would not want to miss after the statistical test is conducted. If the means were one standard deviation apart, this is an effect size that is common in many experiments. If the means were less than one half of one standard deviation apart, that would be a smaller effect, but one that still might have important business implications. Various authors have reviewed the effect sizes seen in behavioral research and have come up with some guidelines for small, medium, and large effect sizes based on what is seen during the course of experimentation with humans (Cohen, 1988; Welkowitz et al., 1982).

Several problems arise. First, this idea of effect size seems arbitrary, and an experimenter may not have any knowledge to aid in this decision. The sensory professional may simply not know how much of a consumer impact a given difference in the data is likely to produce. It is much easier to "let the statistics make the decision" by setting an alpha level according to tradition and concluding that no significant difference means that two products are sensorially equal. As shown above, this is bad logic and poor experimental testing. Experienced sensory scientists may have information at their disposal that makes this decision less arbitrary. They may know the levels of variability or the levels important to consumer rejection or complaints. Trained panels will show standard deviations around 10% of scale range (Lawless, 1988). The value will be slightly higher for difficult sensory attributes like aroma or odor intensity, and lower for "easier" attributes like visual and some textural attributes. Consumers, on the other hand, will show variation in intensity attributes in the realm of 25% of scale range, and sometimes even higher values for hedonics (acceptability). Another problem with effect size is that clients or managers are often unaware of it and do not understand why some apparently arbitrary decision has to enter into scientific experimentation.
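To make these variability guidelines concrete, here is a small illustration with hypothetical numbers (the 15-point scale and the 1-point product difference are our assumptions, not values from the text): the same raw difference translates into very different standardized effect sizes for a trained panel versus consumers.

```python
scale_range = 15.0      # hypothetical intensity scale from 0 to 15
raw_difference = 1.0    # hypothetical true mean difference between two products

for panel, sd_fraction in [("trained panel", 0.10), ("consumers", 0.25)]:
    sd = sd_fraction * scale_range               # SD as a fraction of scale range, per the guideline
    print(panel, round(raw_difference / sd, 2))  # effect size d = difference / SD
# trained panel ~0.67 (a fairly large effect), consumers ~0.27 (a small effect)
```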

The "sensitivity" of a test to differences involves both power and the overall quality of the test. Sensitivity entails low error, high power, sufficient sample size, good testing conditions, good design, and so on. The term "power" refers to the formal statistical concept describing the probability of accepting a true alternative hypothesis (e.g., finding a true difference). In a parallel fashion, Cohen (1988) drew an important distinction between effect size and "operative effect size" and showed how a good design can increase the effective sensitivity of an experiment. He used the example of a paired t-test as opposed to an independent groups t-test. In the paired design, subjects function as their own controls since they evaluate both products. The between-person variation is "partitioned" out of the picture by the computation of difference scores. This effectively removes judge variation from the comparison.
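Cohen's "operative effect size" point can be illustrated numerically. The relation used below (the standard deviation of within-judge difference scores is σ√(2(1 − ρ)) when a judge's two ratings correlate ρ) is a standard result, not something derived in this appendix, and the numbers are hypothetical.

```python
from math import sqrt

sigma = 2.0        # hypothetical SD of a single rating across judges
mean_diff = 1.0    # hypothetical true difference between the two products
rho = 0.7          # hypothetical correlation between a judge's ratings of the two products

d_independent = mean_diff / sigma                      # effect size if different judges rate each product
d_paired = mean_diff / (sigma * sqrt(2 * (1 - rho)))   # operative effect size using difference scores
print(round(d_independent, 2), round(d_paired, 2))     # 0.5 versus about 0.65
```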

In mathematical terms, this effect size can be stated for the t-test as the number of standard deviations separating the means, usually signified by the letter "d". In the case of choice data, the common estimate is our old friend d′ (d-prime) from signal detection theory, sometimes signified as a population estimate by the Greek letter delta (Ennis, 1993).



For analyses based on correlation, the simple Pearson's r is a common and direct measure of association. Various measures of effect size (such as variance accounted for by a factor) in ANOVAs have been used. Further discussion of effect sizes and how to measure them can be found in Cohen (1988) and Welkowitz et al. (1982).

E.2.3 How Alpha, Beta, Effect Size, and N Interact

Diagrams below illustrate how effect size, alpha, and beta interact. As an example, we perform a test with a rating scale, e.g., a just-about-right scale, and we want to test whether the mean rating for the product is higher than the midpoint of the scale. This is the simple t-test against a fixed value, and our hypothesis is one tailed. For the simple one-tailed t-test, alpha represents the area under the t-distribution to the right of the cutoff determined by the limiting p-value (usually 5%). It also represents the upper tail of the sampling distribution of the mean, as shown in Fig. E.1. The value of beta is shown by the area underneath the alternative hypothesis curve to the left of the cutoff, as shaded in Fig. E.1. We have shown the sampling distribution for the mean value under the null as the bell-shaped curve on the left. The dashed line indicates the cutoff value for the upper 5% of the tail of this distribution. This would be the common value set for statistical significance, so that for a given sample size (N), the t-value at the cutoff would keep us from making a Type I error more than 5% of the time (when the null is true). The right-hand curve represents the sampling distribution for the mean under a chosen alternative hypothesis. We know the mean from our choice of effect size (or how much of a difference we have decided is important) and we can base the variance on our estimate from the sample standard error. When we choose the value for the mean score for our test product, the d-value becomes determined by the difference of this mean from the control, divided by the standard deviation. Useful examples are drawn in Gacula's (1991, 1993) discussion and in the section on hypothesis testing in Sokal and Rohlf (1981).
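The construction in Fig. E.1 can be computed directly. The sketch below uses a normal approximation to the sampling distributions (a noncentral t would be slightly more exact for small panels), and all of the numbers are hypothetical, chosen only to show the cutoff-and-area logic.

```python
from scipy.stats import norm

alpha = 0.05
n = 40                      # hypothetical panel size
sd = 2.0                    # hypothetical standard deviation of the ratings
se = sd / n ** 0.5          # standard error of the mean
mu_null = 5.0               # value tested against, e.g. the scale midpoint
d = 0.5                     # chosen effect size, in standard deviation units
mu_alt = mu_null + d * sd   # mean under the alternative hypothesis

cutoff = mu_null + norm.ppf(1 - alpha) * se   # one-tailed rejection cutoff (dashed line in Fig. E.1)
beta = norm.cdf((cutoff - mu_alt) / se)       # shaded area of the alternative curve below the cutoff
print(round(cutoff, 2), round(beta, 3), round(1 - beta, 3))   # cutoff, beta, power
```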

In this diagram, we can see how the three interacting variables work to determine the size of the shaded area for beta-risk. As the cutoff is changed by changing the alpha level, the shaded area would become larger or smaller (see Fig. E.2).

[Figure E.1 appears here: the null and alternative sampling distributions of the mean, the cutoff determined by the chosen alpha level, the alpha area under the null curve above the cutoff, and the shaded beta area under the alternative curve below it; d = (μa − μo)/σ.]

Fig. E.1 Power shown as the tail of the alternative hypothesis, relative to the cutoff determined by the null hypothesis distribution. The diagram is most easily interpreted as a one-tailed t-test. A test against a fixed value of a mean would be done against a population value or a chosen scale point such as the midpoint of a just-right scale. The value of the mean for the alternative hypothesis can be based on research, prior knowledge, or the effect size, d, the difference between the means under the null and alternative hypotheses, expressed in standard deviation units. Beta is given by the shaded area underneath the sampling distribution for the alternative hypothesis, below the cutoff determined by alpha. Power is one minus beta.



[Figure E.2 appears here: the same construction, showing how raising the alpha level moves the cutoff, increases the alpha risk, and reduces the beta risk.]

Fig. E.2 Increasing the alpha level decreases the area associated with beta, improving power (all other variables held equal).

As the alpha-risk is increased, the beta-risk is decreased, all other factors being held constant. This is shown by shifting the critical value for a significant t-statistic to the left, increasing the alpha "area," and decreasing the area associated with beta.

A second influence comes from changing the effect size or alternative hypothesis. If we test against a larger d-value, the distributions would be separated, and the area of overlap is decreased. Beta-risk decreases when we choose a bigger effect size for the alternative hypothesis (see Fig. E.3). Conversely, testing for a small difference in the alternative hypothesis would pull the two distributions closer together, and if alpha is maintained at 5%, the beta-risk associated with the shaded area would have to get larger. The chances of missing a true difference are very high if the alternative hypothesis states that the difference is very small. It is easier to detect a bigger difference than a smaller one, all other things in the experiment being equal.

The third effect comes from changing the sample size or the number of observations. The effect of increasing "N" is to shrink the effective standard deviation of the sampling distributions, decreasing the standard error of the mean. This makes the distributions taller and thinner, so there is less overlap and less area associated with beta. The t-value for the cutoff moves to the left in absolute terms.

In summary, we have four interacting variables, and knowing any three, we can determine the fourth. These are alpha, beta, "N," and effect size. If we wish to specify the power of the test up front, we have to make at least two other decisions and then the remaining parameter will be determined for us. For example, if we want 80% test power (beta = 0.20), and alpha equal to 0.05, and we can test only 50 subjects, then the effect size we are able to detect at this level of power is fixed. If we desire 80% test power, want to detect 0.5 standard deviations of difference, and set alpha at 0.05, then we can calculate the number of panelists that must be tested (i.e., "N" has been determined by the specification of the other three variables). In many cases, experiments are conducted only with initial concern for alpha and sample size. In that case there is a monotonic relationship between the other two variables that can be viewed after the experiment to tell us what power can be expected for different effect sizes. These relationships are illustrated below. Various freeware programs are available for estimating power and sample size (e.g., Erdfelder et al., 1996).



[Figure E.3 appears here: the same construction with a larger alternative-hypothesis effect size "d"; alpha is maintained at 5% while the beta risk is reduced.]

Fig. E.3 Increasing the effect size that must be detected increases the power, reducing beta. Larger effects (larger d, difference between the means of the alternative and null hypotheses) are easier to detect.

Tables for the power of various statistical tests can also be found in Cohen (1988). The R package "pwr" specifically implements the power analyses outlined in Cohen (1988).

E.3 Worked Examples

E.3.1 The t-Test

For a specific illustration, let us examine the independent groups t-test to look at the relationship between alpha, beta, effect size, and "N." In this situation, we want to compare two means generated from independent groups, and the alternative hypothesis predicts that the means are not equal (i.e., no direction is predicted). Figure E.4 shows the power of the two-tailed independent groups t-test as a function of different sample sizes (N) and different alternative hypothesis effect sizes (d). (Note that N here refers to the total sample, not N for each group. For very different sample sizes per group, further calculations must be done.) If we set the lower limit of acceptable power at 50%, we can see from these curves that using 200 panelists would allow us to detect a small difference of about 0.3 standard deviations. With 100 subjects this difference must be about 0.4 standard deviations, and for small sensory tests of 50 or 20 panelists (25 or 10 per group, respectively) we can only detect differences of about 0.6 or 0.95 standard deviations, respectively, with a 50/50 chance of missing a true difference. This indicates the liabilities in using a small sensory test to justify a "parity" decision about products.
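The values read from these curves can be approximated without the GPOWER tables. The sketch below uses the normal approximation for a two-tailed, independent-groups comparison and solves for the effect size detectable at 50% power; exact t-based answers for small groups run slightly higher (closer to the 0.95 quoted for 20 panelists).

```python
from math import sqrt
from scipy.stats import norm

def detectable_d(total_n, alpha=0.05, power=0.5):
    """Effect size detectable at the stated power, two-tailed independent groups,
    equal group sizes, normal approximation."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)              # zero when power is 50%
    return (z_alpha + z_beta) * sqrt(2 / (total_n / 2))

for n in (200, 100, 50, 20):
    print(n, round(detectable_d(n), 2))   # about 0.28, 0.39, 0.55, 0.88
```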

Often, a sensory scientist wants to know the required sample size for a test, so that the appropriate number of consumers or panelists can be recruited for a study. Figure E.5 shows the sample size required for different experiments for a between-groups t-test and a decision that is two tailed. An example of such a design would be a consumer test for product acceptability, with scaled data and each of the products placed with a different consumer group (a so-called monadic design). Note that the scale is log transformed, since the group size becomes very large if we are looking for small effects. For a very small effect of only 0.2 standard deviations, we need 388 consumers to have a minimal power level of 0.5. If we want to increase power to 90%, the number exceeds 1,000.



Fig. E.4 Power of the two-tailed independent groups t-test as a function of different sample sizes (N) and different alternative hypothesis effect sizes (d); the decision is two-tailed at alpha = 0.05. The effect size "d" represents the difference between the means in standard deviations. Computed from the GPOWER program of Erdfelder et al. (1996).

Fig. E.5 Number of judges required for the independent groups t-test at different levels of power; the decision is two tailed at alpha = 0.05. Note that the sample size is plotted on a log scale. Computed from the GPOWER program of Erdfelder et al. (1996).

On the other hand, for a big difference of 0.8 standard deviations (about 1 scale point on the 9-point hedonic scale) we only need 28 consumers for 50% power and 68 consumers for 90% power. This illustrates why some sensory tests done for research and product development purposes are smaller than the corresponding marketing research tests. Market research tests may be aimed at finding small advantages in a mature or optimized product system, and this requires a test of high power to keep both alpha- and beta-risks low.
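Turning the approximation around gives the required sample sizes plotted in Fig. E.5 (again, exact t-based values such as GPOWER's are a few judges larger for small groups):

```python
from math import ceil
from scipy.stats import norm

def total_n(d, alpha=0.05, power=0.8):
    """Approximate total N (two equal groups combined), two-tailed test of two means."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    per_group = 2 * ((z_alpha + z_beta) / d) ** 2
    return 2 * ceil(per_group)

print(total_n(0.2, power=0.5))   # about 386 (text quotes 388 from GPOWER)
print(total_n(0.8, power=0.5))   # about 26  (text quotes 28)
print(total_n(0.8, power=0.9))   # about 66  (text quotes 68)
```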

E.3.2 An Equivalence Issue with Scaled Data

Gacula (1991, 1993) gives examples of calculations of test power using several scenarios devoted to substantiating claims of product equivalence. These are mostly based on larger scale consumer tests, where the sample size justifies the use of the normal distribution (Z) rather than the small sample t-test. In such an experiment, the calculation of power is straightforward, once the mean difference associated with the alternative hypothesis is stated.



The calculation for power follows this relationship:

\mathrm{Power} = 1 - \beta = 1 - \Phi\!\left[\frac{X_c - \mu_D}{SE}\right] \qquad (E.2)

where Xc represents the cutoff value for a significantly higher mean score, determined by the alpha level. For a one-tailed test, the cutoff is equal to the mean plus 1.645 times the standard error (or 1.96 standard errors for a two-tailed situation). The Greek letter Φ represents the value of the cumulative normal distribution; in other words, we are converting the Z-score to a proportion or probability value. Since many tables of the cumulative normal distribution are given in the larger proportion, rather than the tail (as is true in Gacula's tables), it is sometimes necessary to subtract the tabled value from 1 to get the other tail. The parameter μD represents the mean difference as determined by the alternative hypothesis. This equation simply finds the area underneath the alternative hypothesis Z-distribution, beyond the cutoff value Xc. A diagram of this is shown below.

Here is a scenario similar to one from Gacula (1991). A consumer group of 92 panelists evaluates two products and gives them mean scores of 5.9 and 6.1 on a 9-point hedonic scale. This is not a significant difference, and the sensory professional is tempted to conclude that the products are equivalent. Is this conclusion justified?

The standard deviation for this study was 1.1, giving a standard error of 0.11. The cutoff values for the 95% confidence interval are then 1.96 standard errors, or the mean plus or minus 0.22. We see that the two means lie within the 95% confidence interval, so the statistical conclusion of no difference seems to be justified. A two-tailed test is used to see whether the new product is higher than the standard product receiving a 5.9. The two-tailed test requires a cutoff that is 1.96 standard errors, or 0.22 units, above the mean. This sets our upper cutoff value for Xc at 5.9 + 0.22 or 6.12. Once this boundary has been determined, it can be used to split the distributions expected on the basis of the alternative hypotheses into two sections. This is shown in Fig. E.6. The section of the distribution that is higher than this cutoff represents the detection of a difference, or power (null rejected), while the section that is lower represents the chance of missing the difference, or beta (null accepted).

In this example, Gacula originally used the actual mean difference of 0.20 as the alternative hypothesis. This would place the alternative hypothesis mean at 5.9 + 0.2 or 6.1. To estimate beta, we need to know the area in the tail of the alternative hypothesis distribution to the left of the cutoff. This can be found once we know the distance of the cutoff from our alternative mean of 6.1. In this example, there is a small difference from the cutoff of only 6.12 − 6.1, or 0.02 units on the original scale, or 0.02 divided by the standard error to give about 0.2 Z-score units from the mean of the alternative to the cutoff. Essentially, this mean lies very near to the cutoff and we have split the alternative sampling distribution about in half. The area in the tail associated with beta is large, about 0.57, so power is about 43% (1 minus beta). Thus the conclusion of no difference is not strongly supported by the power, under the assumption that the true mean lies so close to 5.9. However, we have tested against a small difference as the basis for our alternative hypothesis. There is still a good chance that such a small difference does exist.

Suppose we relax the alternative hypothesis. Let us presume that we determined before the experiment that a difference of one-half of one standard deviation on our scale is the lower limit for any practical importance. We could then set the mean for the alternative hypothesis at 5.9 plus one half of the standard deviation (1.1/2 or 0.55). The mean for the alternative now becomes 5.9 + 0.55 or 6.45. Our cutoff of 6.12 is now 6.45 − 6.12, or 0.33 units away, or 0.33 divided by the standard error of 0.11 to convert to Z-score units, giving a value of 3. This has effectively shifted the expected distribution to the right while our decision cutoff remains the same at 6.12. The area in the tail associated with beta would now be less than 1% and power would be about 99%. The choice of an alternative hypothesis can greatly affect the confidence of our decisions. If the business decision justifies a choice of one half of a scale unit as a practical cutoff (based on one-half of one standard deviation), then we can see that our difference of only 0.2 units between mean scores is fairly "safe" when concluding no difference. The power calculations tell us exactly how safe this would be. There is only a very small chance of seeing this result, or one more extreme, if our true mean score were 0.55 units higher. The observed events are fairly unlikely given this alternative, so we reject the alternative in favor of the null.
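Both power figures in this example follow directly from Eq. (E.2). A short sketch of the arithmetic, using the rounded values quoted above:

```python
from scipy.stats import norm

se = 0.11                            # standard error quoted in the example (1.1 / sqrt(92))
cutoff = round(5.9 + 1.96 * se, 2)   # two-tailed cutoff above the standard product: 6.12

for alt_mean in (6.1, 6.45):         # Gacula's original alternative, then the relaxed half-SD alternative
    beta = norm.cdf((cutoff - alt_mean) / se)
    print(alt_mean, round(beta, 2), round(1 - beta, 2))
# 6.1  -> beta about 0.57, power about 0.43
# 6.45 -> beta about 0.01, power about 0.99
```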



[Figure E.6 appears here: the null sampling distribution centered at 5.9 with the decision cutoff at 6.12 (alpha risk 5%), and two alternative-hypothesis distributions: one centered at 6.1 (power = 43%, beta = 0.57) and one centered at 6.45 (power = 99%, beta = 0.01).]

Fig. E.6 Power first depends upon setting a cutoff value based upon the sample mean, standard error, and the alpha level. In the example shown, this value is 6.12. The cutoff value can then be used to determine power and beta-risk for various expected distributions of means under alternative hypotheses. In Gacula's first example, the actual second product mean of 6.1 was used. The power calculation gives only 43%, which does not provide a great deal of confidence in a conclusion about product equivalence. The lower example shows the power for testing against an alternative hypothesis that states that the true mean is 6.45 or higher. Our sample and experiment would detect this larger difference with greater power.


E.3.3 Sample Size for a Difference Test

Amerine et al. (1965) gave a useful general formula for computing the necessary numbers of judges in a discrimination test, based on beta-risk, alpha-risk, and the critical difference that must be detected. This last item is conceived of as the difference between the chance probability, po, and the probability associated with an alternative hypothesis, pa. Different models for this are discussed in Chapter 5 [see also Schlich (1993) and Ennis (1993)]. For the sake of example, we will take the chance-adjusted probability for 50% correct, which is halfway between the chance probability and 100% detection (i.e., 66.7% for the triangle test).

N = \left[\frac{Z_\alpha\sqrt{p_o(1 - p_o)} + Z_\beta\sqrt{p_a(1 - p_a)}}{|p_o - p_a|}\right]^2 \qquad (E.3)

For a one-tailed test (at α = 0.05), Zα = 1.645, and if beta is kept to 10% (90% power), then Zβ = 1.28. The critical difference, po − pa, has a strong influence on the equation. In the case where it is set to 33.3% for the triangle test (a threshold of sorts), we then require 18 respondents, as shown in the following calculation:

N = \left[\frac{1.645\sqrt{0.333(0.667)} + 1.28\sqrt{0.667(0.333)}}{|0.333 - 0.667|}\right]^2 = 17.03



So you would need a panel of about 18 persons to protect against missing a difference this big and limit your risk to 10%. Note that this is a fairly gross test, as the difference we are trying to detect is large. If half of a consumer population notices a difference, the product could be in trouble.

Now, suppose we do not wish to be this lenient, but prefer a test that will be sure to catch a difference about half this big, at the 95% power level instead of 90%. Let us change one variable at a time to see which has a bigger effect. First the power goes from 90 to 95% (beta-risk from 10 to 5%), and if we remain one tailed, Zβ now equals 1.645. So the numbers become

N = \left[\frac{1.645\sqrt{0.333(0.667)} + 1.645\sqrt{0.667(0.333)}}{|0.333 - 0.667|}\right]^2 = 21.55

So with the increase only in power, we need four additional people for our panel. However, if we decrease the effect size we want to detect by half, the numbers become

N = \left[\frac{1.645\sqrt{0.333(0.667)} + 1.28\sqrt{0.667(0.333)}}{|0.167|}\right]^2 = 68.14

Now the required panel size has quadrupled. The combined effect of changing both beta and testing for a smaller effect is

N = \left[\frac{1.645\sqrt{0.333(0.667)} + 1.645\sqrt{0.667(0.333)}}{|0.167|}\right]^2 = 86.20

Note that in this example, the effect of halving the effect size (critical difference) was greater than the effect of halving the beta-risk. Choosing a reasonable alternative hypothesis is a decision that deserves some care. If the goal is to ensure that almost no person will see a difference, or only a small proportion of consumers (or only a small proportion of the time), a large test may be necessary to have confidence in a "no-difference" conclusion. A panel of 87 testers is probably larger than many people would consider for a triangle test. Yet it is not unreasonable to have this extra size in the panel when the research question and its important consequences concern a parity or equivalence decision. Similar "large" sample sizes can be found in the test for similarity as outlined by Meilgaard et al. (1991).
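The four panel sizes in this section can be verified by re-evaluating Eq. (E.3) with the same numbers; the loop below simply mirrors the printed arithmetic and is a check rather than a general sample-size routine.

```python
from math import sqrt

term = sqrt(0.333 * 0.667)   # sqrt(p0(1 - p0)) and sqrt(pa(1 - pa)) for this triangle example

cases = [
    (1.645, 1.280, 0.334),   # alpha 5%, 90% power, critical difference 0.334 -> 17.03
    (1.645, 1.645, 0.334),   # power raised to 95%                            -> 21.55
    (1.645, 1.280, 0.167),   # critical difference halved                     -> 68.14
    (1.645, 1.645, 0.167),   # both changes together                          -> 86.20
]
for z_alpha, z_beta, diff in cases:
    print(round(((z_alpha * term + z_beta * term) / diff) ** 2, 2))
```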

E.4 Power in Simple Difference and Preference Tests

The scenarios in which we test for the existence of a difference or the existence of a preference often involve important conclusions when no significant effect is found. These are testing situations where acceptance of the null, and therefore establishing the power of the test, are of great importance. Perhaps for this reason, power and beta-risk in these situations have been addressed by several theorists. The difference testing approaches of Schlich and Ennis are discussed below, and a general approach to statistical power is shown in the introductory text by Welkowitz et al. (1982).

Schlich (1993) published risk tables for discrimination tests and offered a SAS routine to calculate beta-risk based on exact binomial probabilities. His article also contains tables of alpha-risk and beta-risk for small discrimination tests and the minimum numbers of testers and correct responses associated with different levels of alpha and beta. Separate tables are computed for the triangle test and for the duo–trio test. The duo–trio table is also used for the directional paired comparison, as the tests are both one tailed with a chance probability of one-half. The tables showing minimum numbers of testers and correct responses for different levels of beta and alpha are abridged and shown for the triangle test and for the duo–trio test as Tables F.N1 and F.N2.

The effect size parameter is stated as the chance-adjusted percent correct. This is based on Abbott's formula, where the chance-adjusted proportion, pd, is based on the difference between the alternative hypothesis percent correct, pa, and the chance percent correct, po, by the following formula:

p_d = \frac{p_a - p_o}{1 - p_o} \qquad (E.4)

This is the so-called discriminator or guessing model discussed in Chapter 5. Schlich suggests the following guidelines for effect size: 50% above chance is a large effect (50% discriminators), 37.5% above chance is a medium effect, and 25% above chance is a small effect.
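Abbott's formula and its inverse are one-liners. The sketch below (function names are ours) converts Schlich's effect size guidelines back into the raw percent correct expected in a triangle test, where chance is one-third.

```python
def discriminators(pa, p0):
    """Chance-adjusted proportion correct, Eq. (E.4)."""
    return (pa - p0) / (1 - p0)

def percent_correct(pd, p0):
    """Inverse of Eq. (E.4): raw proportion correct for a given proportion of discriminators."""
    return p0 + pd * (1 - p0)

p0 = 1 / 3
for label, pd in [("large", 0.50), ("medium", 0.375), ("small", 0.25)]:
    print(label, round(percent_correct(pd, p0), 3))
# large 0.667, medium 0.583, small 0.5
```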

Schlich also gave some examples of useful scenarios in which the interplay of alpha and effect size is driven by competing business interests. For example, manufacturing might wish to make a cost reduction by changing an ingredient, but if a spurious difference is found they will not recommend the switch and will not save any money. Therefore the manufacturing decision is to keep alpha low. A marketing manager, on the other hand, might want to ensure that very few if any consumers can see a difference. Thus they wish to test against a small effect size, or be sure to detect even a small number of discriminators. Keeping the test power high (beta low) under both of these conditions will drive the required sample size (N) to a very high level, perhaps hundreds of subjects, as seen in the examples below.

Schlich's tables provide a crossover point for a situation in which both alpha and beta will be low, given a sufficient number of testers and a certain effect size. If fewer than the tabulated number (x) answer correctly, the chance of Type I error will increase should you decide that there is a difference, but the chance of Type II error will decrease should you decide that there is no difference. Conversely, if the number of correct judges exceeds that in the table, the chance of finding a spurious difference will decrease should you reject the null, but the chance of missing a true difference will increase if you accept the null. So it is possible to use these minimal values for a decision rule. Specifically, if the number correct is less than x, accept the null and there will be lower beta-risk than that listed in the column heading. If the number correct is greater than x, reject the null and alpha-risk will be lower than that listed. It is also possible to interpolate to find other values using various routines that can be found on the web.

Another set of tabulations for power in discrimination tests has been given by Ennis (1993). Instead of basing the alternative hypothesis on the proportions of discriminators, he computed a measure of sensory difference or effect size based on Thurstonian modeling. These models take into account the fact that different tests may have the same chance probability level, but some discrimination methods are more difficult than others. The concept of "more difficult" shows up in the signal detection models as higher variability in the perceptual comparisons. The more difficult test requires a bigger sensory difference to obtain the same number of correct judges. The triangle test is more difficult than the three-alternative forced-choice test (3-AFC). In the 3-AFC, the panelist's attention is usually directed to a specific attribute rather than to choosing the odd sample. However, the chance percent correct for both the triangle and 3-AFC tests is 1/3. The correction for guessing, being based on the chance level, does not take into account the difficulty of the triangle procedure. The "difficulty" arises due to the inherent variability in judging three pairs of differences, as opposed to judging simply how strong or weak a given attribute is.

Thurstonian or signal detection models (see Chapter 5) are an improvement over the "proportion of discriminators" model since they do account for the difference in inherent variability. Ennis's tables use the Thurstonian sensory differences symbolized by the lowercase Greek letter delta, δ. Delta represents the sensory difference in standard deviations. The standard deviations are theoretical variability estimates of the sensory impressions created by the different products. The delta values have the advantage that they are comparable across methods, unlike the percent correct or the chance-adjusted percent correct. Table E.1 shows the numbers of judges required for different levels of power (80, 90%) and different delta values in the duo–trio, triangle, 2-AFC (paired comparison), and 3-AFC tests. The lower numbers of judges required for the 2-AFC and 3-AFC tests arise from their higher sensitivity to differences, i.e., lower inherent variability under the Thurstonian models.

In terms of delta values, we can see that the usual discrimination tests done with 25 or 50 panelists will only detect gross differences (δ > 1.5) if the triangle or duo–trio procedures are used.

Table E.1 Numbers of judges required for different levels of power and sensory difference for paired comparison (2-AFC), duo–trio, 3-AFC, and triangle tests with alpha = 0.05

δ        2-AFC   Duo–trio   3-AFC   Triangle

80% power
0.50       78      3092       64      2742
0.75       35       652       27       576
1.00       20       225       15       197
1.25       13       102        9        88
1.50        9        55        6        47
1.75        7        34        5        28
2.00        6        23        3        19

90% power
0.50      108      4283       89      3810
0.75       48       902       39       802
1.00       27       310       21       276
1.25       18       141       13       124
1.50       12        76        9        66
1.75        9        46        6        40
2.00        7        31        5        26

Abstracted from Ennis (1993)



This fact offers some warning that the "non-specific" tests for overall difference (triangle, duo–trio) are really only gross tools that are better suited to giving confidence when a difference is detected. The AFC tests, on the other hand, like a paired comparison test where the attribute of difference is specified (e.g., "pick which one is sweeter"), are safer when a no-difference decision is the result.

Useful tables for the power of a triangle test can be found in Chambers and Wolf (1996). A more generally useful table for various simple tests was given by Welkowitz et al. (1982), where the effect size and sample size are considered jointly to produce a power table as a function of alpha. This produces a value we will tabulate as the capital Greek letter delta (Δ, to distinguish it from the lowercase delta in Ennis's tables), while the raw effect size is given by the letter "d." Δ can be thought of as the d-value corrected for sample size. The Δ and d-values take the forms shown in Table E.2 for simple statistical tests. Computing these delta values, which take into account the sample size, allows the referencing of power questions to one simple table (Table E.3). In other words, all of these simple tests have power calculations via the same table.

Here is a worked example, using a two-tailed test on proportions (Welkowitz et al., 1982). Suppose a marketing researcher thinks that a product improvement will produce a preference difference of about 8% against the standard product. In other words, he expects that in a preference test, the split among consumers of this product would be something like 46% preferring the standard product and 54% preferring the new product. He conducts a preference test with 400 people, considered by "intuition" to be a hefty sample size, and finds no difference. What is the power of the test, and what is the chance that he missed a true difference of that size?

Table E.2 Conversion of effect size (d) to delta (Δ), considering sample size

Test                   d-value                          Δ
One-sample t-test      d = (μ1 − μ2)/σ                  Δ = d√N
Dependent t-test       d = (μ1 − μ2)/σ                  Δ = d√N
Independent t-test     d = (μ1 − μ2)/σ                  Δ = d√(2N1N2/(N1 + N2))
Correlation            d = r                            Δ = d√(N − 1)
Proportions            d = (po − pa)/√(po(1 − po))      Δ = d√N

Table E.3 Effect size adjusted for sample size (Δ) to show power as a function of alpha

         Two-tailed alpha   0.05   0.025   0.01   0.005
         One-tailed alpha   0.10   0.05    0.02   0.01

Δ        Power
0.2      0.11   0.05   0.02   0.01
0.4      0.13   0.07   0.03   0.01
0.6      0.16   0.09   0.03   0.01
0.8      0.21   0.13   0.06   0.04
1.0      0.26   0.17   0.09   0.06
1.2      0.33   0.22   0.13   0.08
1.4      0.40   0.29   0.18   0.12
1.6      0.48   0.36   0.23   0.16
1.8      0.56   0.44   0.30   0.22
2.0      0.64   0.52   0.37   0.28
2.2      0.71   0.59   0.45   0.36
2.4      0.77   0.67   0.53   0.43
2.6      0.83   0.74   0.61   0.51
2.8      0.88   0.80   0.68   0.59
3.0      0.91   0.85   0.75   0.66
3.2      0.94   0.89   0.78   0.70
3.4      0.96   0.93   0.86   0.80
3.6      0.97   0.95   0.90   0.85
3.8      0.98   0.97   0.94   0.91
4.0      0.99   0.98   0.95   0.92

Reprinted with permission from Welkowitz et al. (1982), Table H

The d-value becomes 0.08 and the delta value is 1.60. Referring to Table E.3, we find that with alpha set at the traditional 5% level, the power is 48%, so there is still a 52% chance of Type II error (missing a difference) even with this "hefty" sample size. The problem in this example is that the alternative hypothesis predicts a close race. If the researcher wants to distinguish advantages that are this small, even larger tests must be conducted to be conclusive.
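The conversion from the expected preference split to Δ (the proportions row of Table E.2) is quick to script; the sketch below stops at the table lookup quoted in the text.

```python
from math import sqrt

p0, pa, n = 0.50, 0.54, 400
d = abs(pa - p0) / sqrt(p0 * (1 - p0))   # proportions row of Table E.2: d = 0.08
delta = d * sqrt(n)                      # delta = 1.60
print(round(d, 2), round(delta, 2))
# Table E.3 then gives a power of about 0.48 for delta = 1.6 at the 5% alpha level, as quoted above.
```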

We can then turn the situation around and ask how many consumers should be in the test, given the small win that is expected and the importance of a "no-preference" conclusion. We can use the following relationship for proportions:

N = 2\left(\frac{\Delta}{d}\right)^2 \qquad (E.5)

For a required power of 80% and keeping alpha at the traditional 5% level, we find that a delta value of 2.80 is required. Substituting in our example, we get

N = 2\left(\frac{2.80}{0.08}\right)^2 = 2450



This might not seem like a common consumer test for sensory scientists, who are more concerned with alpha-risk, but in marketing research or political polling of close races, these larger samples are sometimes justified, as our example shows.
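The required sample size quoted above is just Eq. (E.5) evaluated at the delta read from Table E.3:

```python
delta_required = 2.80   # from Table E.3: the delta giving power of 0.80 at the chosen alpha level
d = 0.08                # effect size for the expected 54/46 preference split
print(round(2 * (delta_required / d) ** 2))   # 2450 consumers
```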

E.5 Summary and Conclusions

Equations for the required sample sizes for scaled data and for discrimination tests were given by Eqs. (E.1) and (E.3), respectively. The equation for power for scaled data was given in Eq. (E.2). The corresponding equation for choice data from discrimination tests is

\mathrm{Power} = 1 - \beta = 1 - \Phi\!\left[\frac{Z_\alpha\sqrt{p_o(1 - p_o)/N} - (p_a - p_o)}{\sqrt{p_a(1 - p_a)/N}}\right] \qquad (E.6)

Table E.4 summarizes these formulae.
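As a consistency check, Eq. (E.6) applied to the 18-judge triangle test designed in Section E.3.3 (po = 1/3, pa = 2/3, one-tailed alpha of 0.05) should return a power close to the 90% that the design targeted. A brief sketch:

```python
from math import sqrt
from scipy.stats import norm

def power_discrimination(p0, pa, n, z_alpha=1.645):
    """Power of a one-tailed discrimination test, Eq. (E.6), normal approximation."""
    numerator = z_alpha * sqrt(p0 * (1 - p0) / n) - (pa - p0)
    return 1 - norm.cdf(numerator / sqrt(pa * (1 - pa) / n))

print(round(power_discrimination(1 / 3, 2 / 3, 18), 2))   # about 0.91
```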

A finding of "no difference" is often of importance in sensory evaluation and in support of product research. Many business decisions in foods and consumer products are made on the basis of small product changes: a cost reduction, a change of process variables in manufacturing, a change of ingredients or suppliers. Whether or not consumers will notice the change is the inference made from sensory research. In many cases, insurance is provided by performing a sensitive test under controlled conditions. This is the philosophy of the "safety net" approach, paraphrased as follows: "If we do not see a difference under controlled conditions using trained (or selected, screened, oriented, etc.) panelists, then consumers are unlikely to notice this change under the more casual and variable conditions of natural observation." This logic depends upon the assumption that the laboratory test is in fact more sensitive to sensory differences than the consumer's normal experience. Remember that the consumer has extended opportunities to observe the product under a variety of conditions, while the laboratory-based sensory analysis is often limited in time, scope, and the conditions of evaluation.

As stated above, a conclusion of "no difference" based only on a failure to reject the null hypothesis is not logically airtight. If we fail to reject the null, at least three possibilities arise. First, there may have been too much error or random noise in the experiment, so that statistical significance was lost or swamped by large standard deviations. It is a simple matter to do a sloppy experiment. Second, we may not have tested a sufficient number of panelists. If the sample size is too small, we may miss statistical significance because the confidence intervals around our observations are simply too wide to rule out potentially important sensory differences. Third, there may truly be no difference (or no practical difference) between our products. So a failure to reject the null hypothesis is ambiguous, and it is simply not proper to conclude that two products are sensorially equivalent based only on a failure to reject the null. More information is needed.

One approach to this is experimental. If the sensory test is sensitive enough to show a difference in some other condition or comparison, it is difficult to argue that the test was simply not sensitive enough to find any difference in a similar study. Consideration of a track record or demonstrated history of detecting differences with the test method is helpful. In a particular laboratory and with a known panel, it is reasonable to conclude that a tool which has often shown differences in the past is operating well and is sufficiently discriminative. Given the history of the sensory procedure under known conditions, it should be possible to use this sort of common sense approach to minimize risk in decision making. In an ongoing sensory testing program for discrimination, it would be reasonable to use a panel of good size (say 50 screened testers), perform a replicated test, and know whether the panel had shown reliable differences in the past.

Another approach is to "bracket" the test comparison with known levels of difference. In other words, include extra products in the test that one would expect to be different. Baseline or positive and negative control comparisons can be tested, and if the panel finds significant differences between those benchmark products, we have evidence that the tool is working.

Table E.4 Sample size and power formulas (see text for details)

Form of data: Proportion or frequency
  Sample size:  N = [ (Zα√(po(1 − po)) + Zβ√(pa(1 − pa))) / |po − pa| ]²
  Power:        1 − Φ[ (Zα√(po(1 − po)/N) − (pa − po)) / √(pa(1 − pa)/N) ]

Form of data: Scaled or continuous
  Sample size:  N = (Zα + Zβ)²S² / (M1 − M2)²
  Power:        1 − Φ[ (Xc − μD)/SE ] = 1 − Φ[ (Zα(SE) − μD)/SE ]



If the experimental comparison was not significant, then the difference is probably smaller than the sensory difference in the benchmark or bracketing comparisons that did reach significance. Using meta-analytic comparisons (Rosenthal, 1987), a conclusion about relative effect size may be mathematically tested. A related approach is to turn the significance test around, as in the test for significant similarity discussed in Chapter 5. In that approach, the performance in a discrimination test must be at or above chance, but significantly below some chosen cutoff for concluding that products are different, practically (not statistically) speaking.

The third approach is to do a formal analysis of the test power. When a failure to reject the null is accompanied by evidence that the test was of sufficient power, reasonable scientific conclusions may be stated and business decisions can be made with reduced risk. Sensory scientists would do well to make some estimates of test power before conducting any experiment where a null result will generate important actions. It is easy to overestimate the statistical power of a test. On the other hand, it is possible to design an overly sensitive test that finds small significant differences of no practical import. As in other statistical areas, considerations of test power and sensitivity must also be based on the larger framework of practical experience and consumer and/or marketplace validation of the sensory procedure.

Finally, it should be noted that many fields have rejected the approach that sufficient test power allows one to accept the null for purposes of equivalence. For example, in bioequivalence of drugs, one must demonstrate that the test drug falls within a certain range of the control or comparison (USFDA, 2001). This has led to an interval testing approach to equivalence, also discussed by Wellek (2003) and more specifically for sensory testing by Bi (2006), Ennis (2008), and Ennis and Ennis (2010).
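One common way to operationalize the interval approach is the two one-sided tests (TOST) scheme; the sketch below is a generic normal-approximation version with hypothetical numbers, not the specific procedures of the authors cited above.

```python
from scipy.stats import norm

def tost_equivalent(mean_diff, se_diff, bound, alpha=0.05):
    """Two one-sided tests: declare equivalence only if the difference is significantly
    above -bound and significantly below +bound (normal approximation)."""
    p_lower = 1 - norm.cdf((mean_diff + bound) / se_diff)   # test against diff <= -bound
    p_upper = norm.cdf((mean_diff - bound) / se_diff)       # test against diff >= +bound
    return max(p_lower, p_upper) < alpha

# Hypothetical numbers loosely echoing the Gacula example: an observed difference of 0.2
# scale units, an assumed standard error of the difference of 0.16, and an equivalence
# bound of 0.55 (half of one standard deviation).
print(tost_equivalent(0.2, 0.16, 0.55))   # True: the difference sits significantly inside the bounds
```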

References

Amerine, M. A., Pangborn, R. M. and Roessler, E. B. 1965. Principles of Sensory Evaluation of Food. Academic Press, New York.

ASTM. 2008. Standard guide for sensory claim substantiation. Designation E-1958-07. Annual Book of Standards, Vol. 15.08. ASTM International, West Conshohocken, PA, pp. 186–212.

Bi, J. 2006. Sensory Discrimination Tests and Measurements. Blackwell, Ames, IA.

Chambers, E. C. IV and Wolf, M. B. 1996. Sensory Testing Methods, Second Edition. ASTM Manual Series MNL 26. ASTM International, West Conshohocken, PA.

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, Second Edition. Lawrence Erlbaum Associates, Hillsdale, NJ.

Cohen, J. 1992. A power primer. Psychological Bulletin, 112, 155–159.

Ennis, D. M. 1993. The power of sensory discrimination methods. Journal of Sensory Studies, 8, 353–370.

Ennis, D. M. 2008. Tables for parity testing. Journal of Sensory Studies, 32, 80–91.

Ennis, D. M. and Ennis, J. M. 2010. Equivalence hypothesis testing. Food Quality and Preference, 21, 253–256.

Erdfelder, E., Faul, F. and Buchner, A. 1996. GPOWER: A general power analysis program. Behavior Research Methods, Instruments, and Computers, 28, 1–11.

Gacula, M. C., Jr. 1991. Claim substantiation for sensory equivalence and superiority. In: H. T. Lawless and B. P. Klein (eds.), Sensory Science Theory and Applications in Foods. Marcel Dekker, New York, pp. 413–436.

Gacula, M. C., Jr. 1993. Design and Analysis of Sensory Optimization. Food and Nutrition Press, Trumbull, CT.

Gacula, M. C., Jr. and Singh, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, Orlando, FL.

Lawless, H. T. 1988. Odour description and odour classification revisited. In: D. M. H. Thomson (ed.), Food Acceptability. Elsevier Applied Science, London, pp. 27–40.

Meilgaard, M., Civille, G. V. and Carr, B. T. 1991. Sensory Evaluation Techniques, Second Edition. CRC Press, Boca Raton, FL.

Rosenthal, R. 1987. Judgment Studies: Design, Analysis and Meta-Analysis. Cambridge University Press, Cambridge.

Schlich, P. 1993. Risk tables for discrimination tests. Food Quality and Preference, 4, 141–151.

Sokal, R. R. and Rohlf, F. J. 1981. Biometry, Second Edition. W. H. Freeman, New York.

USFDA. 2001. Guidance for Industry: Statistical Approaches to Bioequivalence. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER). http://www.fda.gov/cder/guidance/index.htm.

Welkowitz, J., Ewen, R. B. and Cohen, J. 1982. Introductory Statistics for the Behavioral Sciences. Academic Press, New York.

Wellek, S. 2003. Testing Statistical Hypotheses of Equivalence. Chapman and Hall/CRC, Boca Raton, FL.


Appendix F

Statistical Tables

Contents

Table F.A  Cumulative probabilities of the standard normal distribution. Entry is area 1–α under the standard normal curve from −∞ to z(1–α)  . . .  552
Table F.B  Table of critical values for the t-distribution  . . .  553
Table F.C  Table of critical values of the chi-square (χ2) distribution  . . .  554
Table F.D1  Critical values of the F-distribution at α = 0.05  . . .  555
Table F.D2  Critical values of the F-distribution at α = 0.01  . . .  556
Table F.E  Critical values of U for a one-tailed alpha at 0.025 or a two-tailed alpha at 0.05  . . .  556
Table F.F1  Table of critical values of ρ (Spearman rank correlation coefficient)  . . .  557
Table F.F2  Table of critical values of r (Pearson's correlation coefficient)  . . .  558
Table F.G  Critical values for Duncan's multiple range test (p, df, α = 0.05)  . . .  559
Table F.H1  Critical values of the triangle test for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)  . . .  560
Table F.H2  Critical values of the duo–trio and paired comparison tests for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)  . . .  561
Table F.I  Table of probabilities for values as small as observed values of x associated with the binomial test (p = 0.50)  . . .  562
Table F.J  Critical values for the differences between rank sums (α = 0.05)  . . .  563
Table F.K  Critical values of the beta binomial distribution  . . .  564
Table F.L  Minimum numbers of correct judgments to establish significance at probability levels of 5 and 1% for paired difference and duo–trio tests (one tailed, p = 1/2) and the triangle test (one tailed, p = 1/3)  . . .  565
Table F.M  Minimum numbers of correct judgments to establish significance at probability levels of 5 and 1% for the paired preference test (two tailed, p = 1/2)  . . .  566
Table F.N1  Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the triangle test. Pd is the chance-adjusted percent correct or proportion of discriminators  . . .  567
Table F.N2  Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the duo–trio test. Pc is the chance-adjusted percent correct or proportion of discriminators  . . .  567
Table F.O1  d′ and B (variance factor) values for the duo–trio and 2-AFC (paired comparison) difference tests  . . .  568
Table F.O2  d′ and B (variance factor) values for the triangle and 3-AFC difference tests  . . .  569
Table F.P  Random permutations of nine  . . .  571
Table F.Q  Random numbers  . . .  572




Table F.A Cumulative probabilities of the standard normal distribution. Entry is area 1–α under the standard normal curve from –∞ to z(1–α)

z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990



Table F.B Table of critical values for the t-distribution

Level of significance for one-tailed test
        0.10    0.05    0.025   0.01    0.005   0.001   0.0005
Level of significance for two-tailed test
df      0.20    0.10    0.05    0.02    0.01    0.002   0.001

1 3.078 6.314 12.706 31.821 63.656 318.289 636.578
2 1.886 2.92 4.303 6.965 9.925 22.328 31.600
3 1.638 2.353 3.182 4.541 5.841 10.214 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.894 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.485 3.768
24 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.421 3.689
28 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.396 3.660
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
60 1.296 1.671 2.000 2.390 2.660 3.232 3.460
120 1.289 1.658 1.980 2.358 2.617 3.160 3.373
∞ 1.282 1.645 1.960 2.326 2.576 3.091 3.291



Table F.C Table of critical values of the chi-square (χ2) distribution

Alpha 0.1 0.05 0.025 0.01 0.005

df
1 2.71 3.84 5.02 6.64 7.88
2 4.61 5.99 7.38 9.21 10.60
3 6.25 7.82 9.35 11.35 12.84
4 7.78 9.49 11.14 13.28 14.86
5 9.24 11.07 12.83 15.09 16.75
6 10.65 12.59 14.45 16.81 18.55
7 12.02 14.07 16.01 18.48 20.28
8 13.36 15.51 17.54 20.09 21.96
9 14.68 16.92 19.02 21.67 23.59
10 15.99 18.31 20.48 23.21 25.19
11 17.28 19.68 21.92 24.73 26.76
12 18.55 21.03 23.34 26.22 28.30
13 19.81 22.36 24.74 27.69 29.82
14 21.06 23.69 26.12 29.14 31.32
15 22.31 25.00 27.49 30.58 32.80
16 23.54 26.30 28.85 32.00 34.27
17 24.77 27.59 30.19 33.41 35.72
18 25.99 28.87 31.53 34.81 37.16
19 27.20 30.14 32.85 36.19 38.58
20 28.41 31.41 34.17 37.57 40.00
21 29.62 32.67 35.48 38.93 41.40
22 30.81 33.92 36.78 40.29 42.80
23 32.01 35.17 38.08 41.64 44.18
24 33.20 36.42 39.36 42.98 45.56
25 34.38 37.65 40.65 44.31 46.93
26 35.56 38.89 41.92 45.64 48.29
27 36.74 40.11 43.20 46.96 49.65
28 37.92 41.34 44.46 48.28 50.99
29 39.09 42.56 45.72 49.59 52.34
30 40.26 43.77 46.98 50.89 53.67
40 51.81 55.76 59.34 63.69 66.77
50 63.17 67.51 71.42 76.15 79.49
60 74.40 79.08 83.30 88.38 91.95
70 85.53 90.53 95.02 100.43 104.22
80 96.58 101.88 106.63 112.33 116.32
90 107.57 113.15 118.14 124.12 128.30
100 118.50 124.34 129.56 135.81 140.17



Table F.D1 Critical values of the F-distribution at α = 0.05

df1    1    2    3    4    5    10    20    30    40    50    60    70    80    100    ∞
df2

5 6.61 5.79 5.41 5.19 5.05 4.74 4.56 4.50 4.46 4.44 4.43 4.42 4.42 4.41 4.37
6 5.99 5.14 4.76 4.53 4.39 4.06 3.87 3.81 3.77 3.75 3.74 3.73 3.72 3.71 3.68
7 5.59 4.74 4.35 4.12 3.97 3.64 3.44 3.38 3.34 3.32 3.30 3.29 3.29 3.27 3.24
8 5.32 4.46 4.07 3.84 3.69 3.35 3.15 3.08 3.04 3.02 3.01 2.99 2.99 2.97 2.94
9 5.12 4.26 3.86 3.63 3.48 3.14 2.94 2.86 2.83 2.80 2.79 2.78 2.77 2.76 2.72

10 4.96 4.10 3.71 3.48 3.33 2.98 2.77 2.70 2.66 2.64 2.62 2.61 2.60 2.59 2.5511 4.84 3.98 3.59 3.36 3.20 2.85 2.65 2.57 2.53 2.51 2.49 2.48 2.47 2.46 2.4212 4.75 3.89 3.49 3.26 3.11 2.75 2.54 2.47 2.43 2.40 2.38 2.37 2.36 2.35 2.3113 4.67 3.81 3.41 3.18 3.03 2.67 2.46 2.38 2.34 2.31 2.30 2.28 2.27 2.26 2.2214 4.60 3.74 3.34 3.11 2.96 2.60 2.39 2.31 2.27 2.24 2.22 2.21 2.20 2.19 2.1415 4.54 3.68 3.29 3.06 2.90 2.54 2.33 2.25 2.20 2.18 2.16 2.15 2.14 2.12 2.0816 4.49 3.63 3.24 3.01 2.85 2.49 2.28 2.19 2.15 2.12 2.11 2.09 2.08 2.07 2.0217 4.45 3.59 3.20 2.96 2.81 2.45 2.23 2.15 2.10 2.08 2.06 2.05 2.03 2.02 1.9718 4.41 3.55 3.16 2.93 2.77 2.41 2.19 2.11 2.06 2.04 2.02 2.00 1.99 1.98 1.9319 4.38 3.52 3.13 2.90 2.74 2.38 2.16 2.07 2.03 2.00 1.98 1.97 1.96 1.94 1.8920 4.35 3.49 3.10 2.87 2.71 2.35 2.12 2.04 1.99 1.97 1.95 1.93 1.92 1.91 1.8622 4.30 3.44 3.05 2.82 2.66 2.30 2.07 1.98 1.94 1.91 1.89 1.88 1.86 1.85 1.8023 4.26 3.40 3.01 2.78 2.62 2.25 2.03 1.94 1.89 1.86 1.84 1.83 1.82 1.80 1.7526 4.23 3.37 2.98 2.74 2.59 2.22 1.99 1.90 1.85 1.82 1.80 1.79 1.78 1.76 1.7128 4.20 3.34 2.95 2.71 2.56 2.19 1.96 1.87 1.82 1.79 1.77 1.75 1.74 1.73 1.6730 4.17 3.32 2.92 2.69 2.53 2.16 1.93 1.84 1.79 1.76 1.74 1.72 1.71 1.70 1.6435 4.12 3.27 2.87 2.64 2.49 2.11 1.88 1.79 1.74 1.70 1.68 1.66 1.65 1.63 1.5740 4.08 3.23 2.84 2.61 2.45 2.08 1.84 1.74 1.69 1.66 1.64 1.62 1.61 1.59 1.5345 4.06 3.20 2.81 2.58 2.42 2.05 1.81 1.71 1.66 1.63 1.60 1.59 1.57 1.55 1.4950 4.03 3.18 2.79 2.56 2.40 2.03 1.78 1.69 1.63 1.60 1.58 1.56 1.54 1.52 1.4660 4.00 3.15 2.76 2.53 2.37 1.99 1.75 1.65 1.59 1.56 1.53 1.52 1.50 1.48 1.4170 3.98 3.13 2.74 2.50 2.35 1.97 1.72 1.62 1.57 1.53 1.50 1.49 1.47 1.45 1.3780 3.96 3.11 2.72 2.49 2.33 1.95 1.70 1.60 1.54 1.51 1.48 1.46 1.45 1.43 1.35

100 3.94 3.09 2.70 2.46 2.31 1.93 1.68 1.57 1.52 1.48 1.45 1.43 1.41 1.39 1.31∞ 3.86 3.01 2.62 2.39 2.23 1.85 1.59 1.48 1.42 1.38 1.35 1.32 1.30 1.28 1.16


Table F.D2 Critical values of the F-distribution at α = 0.01

df1   1   2   3   4   5   10   20   30   40   50   60   70   80   100   ∞
df2

3 34.12 30.82 29.46 28.71 28.24 27.23 26.69 26.50 26.41 26.35 26.32 26.29 26.27 26.24 26.154 21.20 18.00 16.69 15.98 15.52 14.55 14.02 13.84 13.75 13.69 13.65 13.63 13.61 13.58 13.495 16.26 13.27 12.06 11.39 10.97 10.05 9.55 9.38 9.29 9.24 9.20 9.18 9.16 9.13 9.046 13.75 10.92 9.78 9.15 8.75 7.87 7.40 7.23 7.14 7.09 7.06 7.03 7.01 6.99 6.907 12.25 9.55 8.45 7.85 7.46 6.62 6.16 5.99 5.91 5.86 5.82 5.80 5.78 5.75 5.678 11.26 8.65 7.59 7.01 6.63 5.81 5.36 5.20 5.12 5.07 5.03 5.01 4.99 4.96 4.889 10.56 8.02 6.99 6.42 6.06 5.26 4.81 4.65 4.57 4.52 4.48 4.46 4.44 4.42 4.33

10 10.04 7.56 6.55 5.99 5.64 4.85 4.41 4.25 4.17 4.12 4.08 4.06 4.04 4.01 3.9311 9.65 7.21 6.22 5.67 5.32 4.54 4.10 3.94 3.86 3.81 3.78 3.75 3.73 3.71 3.6212 9.33 6.93 5.95 5.41 5.06 4.30 3.86 3.70 3.62 3.57 3.54 3.51 3.49 3.47 3.3813 9.07 6.70 5.74 5.21 4.86 4.10 3.66 3.51 3.43 3.38 3.34 3.32 3.30 3.27 3.1914 8.86 6.51 5.56 5.04 4.70 3.94 3.51 3.35 3.27 3.22 3.18 3.16 3.14 3.11 3.0315 8.68 6.36 5.42 4.89 4.56 3.80 3.37 3.21 3.13 3.08 3.05 3.02 3.00 2.98 2.8916 8.53 6.23 5.29 4.77 4.44 3.69 3.26 3.10 3.02 2.97 2.93 2.91 2.89 2.86 2.7817 8.40 6.11 5.19 4.67 4.34 3.59 3.16 3.00 2.92 2.87 2.83 2.81 2.79 2.76 2.6818 8.29 6.01 5.09 4.58 4.25 3.51 3.08 2.92 2.84 2.78 2.75 2.72 2.71 2.68 2.5919 8.19 5.93 5.01 4.50 4.17 3.43 3.00 2.84 2.76 2.71 2.67 2.65 2.63 2.60 2.5120 8.10 5.85 4.94 4.43 4.10 3.37 2.94 2.78 2.69 2.64 2.61 2.58 2.56 2.54 2.4430 7.56 5.39 4.51 4.02 3.70 2.98 2.55 2.39 2.30 2.25 2.21 2.18 2.16 2.13 2.0340 7.31 5.18 4.31 3.83 3.51 2.80 2.37 2.20 2.11 2.06 2.02 1.99 1.97 1.94 1.8350 7.17 5.06 4.20 3.72 3.41 2.70 2.27 2.10 2.01 1.95 1.91 1.88 1.86 1.82 1.7160 7.08 4.98 4.13 3.65 3.34 2.63 2.20 2.03 1.94 1.88 1.84 1.81 1.78 1.75 1.6370 7.01 4.92 4.07 3.60 3.29 2.59 2.15 1.98 1.89 1.83 1.78 1.75 1.73 1.70 1.5780 6.96 4.88 4.04 3.56 3.26 2.55 2.12 1.94 1.85 1.79 1.75 1.71 1.69 1.65 1.53

100 6.90 4.82 3.98 3.51 3.21 2.50 2.07 1.89 1.80 1.74 1.69 1.66 1.63 1.60 1.47∞ 6.69 4.65 3.82 3.36 3.05 2.36 1.92 1.74 1.63 1.57 1.52 1.48 1.45 1.41 1.23
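Critical F values for numerator and denominator degrees of freedom not shown in Tables F.D1 and F.D2 can be obtained the same way; a minimal scipy sketch (an added illustration):

from scipy import stats

# Upper-tail critical values of F for numerator df1 and denominator df2.
print(round(stats.f.ppf(1 - 0.05, dfn=1, dfd=10), 2))   # 4.96, as in Table F.D1
print(round(stats.f.ppf(1 - 0.01, dfn=5, dfd=20), 2))   # 4.10, as in Table F.D2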

Table F.E Critical values of U for a one-tailed alpha at 0.025 or a two-tailed alpha at 0.05

n1 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

n2

5    2   3   5   6   7   8   9   11   12   13   14   15   17   18   19   20
6    3   5   6   8   10  11  13  14   16   17   19   21   22   24   25   27
7    5   6   8   10  12  14  16  18   20   22   24   26   28   30   32   34
8    6   8   10  13  15  17  19  22   24   26   29   31   34   36   38   41
9    7   10  12  15  17  21  23  26   28   31   34   37   39   42   45   48
10   8   11  14  17  20  23  26  29   33   36   39   42   45   48   52   55
11   9   13  16  19  23  26  30  33   37   40   44   47   51   55   58   62
12   11  14  18  22  26  29  33  37   41   45   49   53   57   61   65   69
13   12  16  20  24  28  33  37  41   45   50   54   59   63   67   72   76
14   13  17  22  26  31  36  40  45   50   55   59   64   67   74   78   83
15   14  19  24  29  34  39  44  49   54   59   64   70   75   80   85   90
16   15  21  26  31  37  42  47  53   59   64   70   75   81   86   92   98
17   17  22  28  34  39  45  51  57   63   67   75   81   87   93   99   105
18   18  24  30  36  42  48  55  61   67   74   80   86   93   99   106  112
19   19  25  32  38  45  52  58  65   72   78   85   92   99   106  113  119
20   20  27  34  41  48  55  62  69   76   83   90   98   105  112  119  127

Reworked from Auble, D. 1953. Extended tables for the Mann–Whitney U-statistic. Bulletin of the Institute of Educational Research, 1(2). Indiana University
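When the Mann–Whitney U statistic is computed by software, the smaller of the two possible U values is compared against Table F.E; significance is declared when it is at or below the tabled entry. A short illustrative sketch in Python (the panel data are invented for the example):

from scipy.stats import mannwhitneyu

a = [6, 7, 8, 5, 9, 7, 8, 6, 7, 9]   # ratings from group 1 (illustrative)
b = [4, 5, 6, 5, 3, 6, 4, 5, 6, 5]   # ratings from group 2 (illustrative)

res = mannwhitneyu(a, b, alternative="two-sided")
u1 = res.statistic                   # U computed for the first sample
u2 = len(a) * len(b) - u1            # U for the second sample
print(min(u1, u2), res.pvalue)
# For n1 = n2 = 10, the tabled two-tailed critical U at alpha = 0.05 is 23.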


Table F.F1 Table of critical values of ρ (Spearman rank correlation coefficient)

One-tailed alpha values
0.05   0.025   0.01   0.005

Two-tailed alpha values
0.10   0.05   0.02   0.01

n
4    1.000
5    0.900  1.000  1.000
6    0.829  0.886  0.943  1.000
7    0.714  0.786  0.893  0.929
8    0.643  0.738  0.833  0.881
9    0.600  0.700  0.783  0.833
10   0.564  0.648  0.745  0.794
11   0.536  0.618  0.709  0.755
12   0.503  0.587  0.678  0.727
13   0.484  0.560  0.648  0.703
14   0.464  0.538  0.626  0.679
15   0.446  0.521  0.604  0.654
16   0.429  0.503  0.582  0.635
17   0.414  0.488  0.566  0.618
18   0.401  0.472  0.550  0.600
19   0.391  0.460  0.535  0.584
20   0.380  0.447  0.522  0.570
21   0.370  0.436  0.509  0.556
22   0.361  0.425  0.497  0.544
23   0.353  0.416  0.486  0.532
24   0.344  0.407  0.476  0.521
25   0.337  0.398  0.466  0.511
26   0.331  0.390  0.457  0.501
27   0.324  0.383  0.449  0.492
28   0.318  0.375  0.441  0.483
29   0.312  0.368  0.433  0.475
30   0.306  0.362  0.425  0.467
35   0.283  0.335  0.394  0.433
40   0.264  0.313  0.368  0.405
45   0.248  0.294  0.347  0.382
50   0.235  0.279  0.329  0.363

Reworked from Ramsey, P. H. 1989. Critical values for Spearman's rank order correlation. Journal of Educational and Behavioral Statistics, 14, 245–253
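The coefficient itself is easily computed in software and then compared with the tabled critical value for the appropriate n; a brief sketch with invented ranks:

from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # rank order from panelist/instrument 1
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]    # rank order from panelist/instrument 2

rho, p = spearmanr(x, y)
print(round(rho, 3), round(p, 4))
# For n = 10, reject at a two-tailed alpha of 0.05 if |rho| >= 0.648 (Table F.F1).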


Table F.F2 Table of critical values of r (Pearson's correlation coefficient)

One-tailed alpha values
0.05   0.025   0.01   0.005

Two-tailed alpha values
0.1   0.05   0.02   0.01

df (n–2)
1    0.988  0.997  0.999  0.999
2    0.900  0.950  0.980  0.990
3    0.805  0.878  0.934  0.959
4    0.729  0.811  0.882  0.917
5    0.669  0.754  0.833  0.875
6    0.622  0.707  0.789  0.834
7    0.582  0.666  0.750  0.798
8    0.549  0.632  0.716  0.765
9    0.521  0.602  0.685  0.735
10   0.497  0.576  0.658  0.708
11   0.476  0.553  0.634  0.684
12   0.458  0.532  0.612  0.661
13   0.441  0.514  0.592  0.641
14   0.426  0.497  0.574  0.623
15   0.412  0.482  0.558  0.606
16   0.400  0.468  0.542  0.590
17   0.389  0.456  0.528  0.575
18   0.378  0.444  0.516  0.561
19   0.369  0.433  0.503  0.549
20   0.360  0.423  0.492  0.537
21   0.352  0.413  0.482  0.526
22   0.344  0.404  0.472  0.515
23   0.337  0.396  0.462  0.505
24   0.330  0.388  0.453  0.496
25   0.323  0.381  0.445  0.487
26   0.317  0.374  0.437  0.479
27   0.311  0.367  0.43   0.471
28   0.306  0.361  0.423  0.463
29   0.301  0.355  0.416  0.456
30   0.296  0.349  0.409  0.449
35   0.275  0.325  0.381  0.418
40   0.257  0.304  0.358  0.393
45   0.243  0.288  0.338  0.372
50   0.231  0.273  0.322  0.354
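The tabled critical r values are tied to the t-distribution through r = t/√(t² + df) with df = n − 2, which is convenient when a value outside the table is needed; a brief sketch (an added illustration):

from math import sqrt
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of Pearson's r for a sample of size n."""
    df = n - 2
    t = stats.t.ppf(1 - alpha / 2, df)
    return t / sqrt(t ** 2 + df)

print(round(critical_r(12), 3))   # 0.576 for df = 10, matching Table F.F2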


Table F.G Critical values for Duncan’s multiple range test (p, df, α = 0.05)

Number of means bracketing comparison (p)a

2 3 4 5 10 15 20

df
1    17.969  17.969  17.969  17.969  17.969  17.969  17.969
2    6.085   6.085   6.085   6.085   6.085   6.085   6.085
3    4.501   4.516   4.516   4.516   4.516   4.516   4.516
4    3.926   4.013   4.033   4.033   4.033   4.033   4.033
5    3.635   3.749   3.796   3.814   3.814   3.814   3.814
6    3.461   3.586   3.649   3.68    3.697   3.697   3.697
7    3.344   3.477   3.548   3.588   3.625   3.625   3.625
8    3.261   3.398   3.475   3.521   3.579   3.579   3.579
9    3.199   3.339   3.42    3.47    3.547   3.547   3.547
10   3.151   3.293   3.376   3.43    3.522   3.525   3.525
11   3.113   3.256   3.341   3.397   3.501   3.510   3.510
12   3.081   3.225   3.312   3.37    3.484   3.498   3.498
13   3.055   3.200   3.288   3.348   3.470   3.49    3.490
14   3.033   3.178   3.268   3.328   3.457   3.484   3.484
15   3.014   3.16    3.25    3.312   3.446   3.478   3.480
16   2.998   3.144   3.235   3.297   3.437   3.473   3.477
17   2.984   3.130   3.222   3.285   3.429   3.469   3.475
18   2.971   3.117   3.21    3.274   3.421   3.465   3.474
19   2.96    3.106   3.199   3.264   3.415   3.462   3.474
20   2.95    3.097   3.190   3.255   3.409   3.459   3.473
22   2.933   3.080   3.173   3.239   3.398   3.453   3.472
24   2.919   3.066   3.160   3.226   3.390   3.449   3.472
26   2.907   3.054   3.149   3.216   3.382   3.445   3.471
28   2.897   3.044   3.139   3.206   3.376   3.442   3.470
30   2.888   3.035   3.131   3.199   3.371   3.439   3.470
35   2.871   3.018   3.114   3.183   3.360   3.433   3.469
40   2.858   3.005   3.102   3.171   3.352   3.429   3.469
60   2.829   2.976   3.073   3.143   3.333   3.419   3.468
80   2.814   2.961   3.059   3.130   3.323   3.414   3.467
120  2.800   2.947   3.045   3.116   3.313   3.409   3.466
∞    2.772   2.918   3.017   3.089   3.294   3.399   3.466

Reworked from Harter, H. L. 1960. Critical values for Duncan's new multiple range test. Biometrics, 16, 671–685.
a Number of means, when rank ordered, between the pair being compared and including the pair itself


Table F.H1 Critical valuesa of the triangle test for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)

Proportion discriminating

Beta 10% 20% 30%

N
30   0.05   –    –    11
     0.1    –    10   11
36   0.05   –    11   13
     0.1    10   12   14
42   0.05   11   13   16
     0.1    12   14   17
48   0.05   13   16   19
     0.1    14   17   20
54   0.05   15   18   22
     0.1    16   20   23
60   0.05   17   21   25
     0.1    18   22   26
66   0.05   19   23   28
     0.1    20   25   29
72   0.05   21   26   30
     0.1    22   27   32
78   0.05   23   28   33
     0.1    25   30   34
84   0.05   25   31   35
     0.1    27   32   38
90   0.05   27   33   38
     0.1    29   35   38
96   0.05   30   36   42
     0.1    31   38   44

Created in analogy to Meilgaard, M., Civille, G. V., Carr, B. T. 1991. Sensory Evaluation Techniques. CRC, Boca Raton, FL, using B. T. Carr's Discrimination Test Analysis Tool EXCEL program
a Accept the null hypothesis with 100(1–beta) confidence if the number of correct choices does not exceed the tabled value for the allowable proportion of discriminators
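The footnote gives the decision rule; the sketch below shows one way such critical values can be computed from the binomial distribution. It is an assumption about the underlying calculation (Carr's tool may use a slightly different algorithm), not a statement of it:

from scipy.stats import binom

def max_correct_for_similarity(n, pd, beta, p0=1/3):
    """Largest number correct that keeps the type II risk at or below beta
    when a proportion pd of assessors can truly discriminate (chance = p0)."""
    pc = p0 + pd * (1 - p0)             # expected proportion correct
    for c in range(n, -1, -1):
        if binom.cdf(c, n, pc) <= beta:
            return c
    return None

# e.g. a triangle similarity test with N = 30, 30% discriminators, beta = 0.05
print(max_correct_for_similarity(30, 0.30, 0.05))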


Table F.H2 Critical valuesa of the duo–trio and paired comparison tests for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)

Proportion discriminating

Beta 10% 20% 30%

N
32   0.05   12   14   15
     0.1    13   15   16
36   0.05   14   16   18
     0.1    15   17   19
40   0.05   16   18   20
     0.1    17   19   21
44   0.05   18   20   22
     0.1    19   21   24
48   0.05   20   22   25
     0.1    21   23   26
52   0.05   22   24   27
     0.1    23   26   28
56   0.05   24   27   29
     0.1    25   28   31
60   0.05   26   29   32
     0.1    27   30   33
64   0.05   28   31   34
     0.1    29   32   36
68   0.05   30   33   37
     0.1    31   35   38
72   0.05   32   35   39
     0.1    33   37   41
76   0.05   34   38   41
     0.1    35   39   43
80   0.05   36   40   44
     0.1    37   41   46
84   0.05   38   42   46
     0.1    39   44   48

Created in analogy to Meilgaard, M., Civille, G. V., Carr, B. T. 1991. Sensory Evaluation Techniques. CRC, Boca Raton, FL, using B. T. Carr's Discrimination Test Analysis Tool EXCEL program
a Accept the null hypothesis with 100(1–beta) confidence if the number of correct choices does not exceed the tabled value for the allowable proportion of discriminators


Table F.I Table of probabilities for values as small as observed values of x associated with the binomial test (p=0.50)a,b

x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

N5 0.031 0.188 0.500 0.813 0.9696 0.016 0.109 0.344 0.656 0.891 0.9847 0.008 0.063 0.227 0.500 0.773 0.938 0.9928 0.004 0.035 0.145 0.363 0.637 0.855 0.965 0.9969 0.002 0.020 0.090 0.254 0.500 0.746 0.910 0.980 0.998

10 0.001 0.011 0.055 0.172 0.377 0.623 0.828 0.945 0.989 0.99911 0.000 0.006 0.033 0.113 0.274 0.500 0.726 0.887 0.967 0.99412 0.000 0.003 0.019 0.073 0.194 0.387 0.613 0.806 0.927 0.981 0.99713 0.000 0.002 0.011 0.046 0.133 0.291 0.500 0.709 0.867 0.954 0.989 0.99814 0.001 0.006 0.029 0.090 0.212 0.395 0.605 0.788 0.910 0.971 0.994 0.99915 0.000 0.004 0.018 0.059 0.151 0.304 0.500 0.696 0.849 0.941 0.982 0.99616 0.000 0.002 0.011 0.038 0.105 0.227 0.402 0.598 0.773 0.895 0.962 0.989 0.99817 0.000 0.001 0.006 0.025 0.072 0.166 0.315 0.500 0.685 0.834 0.928 0.975 0.994 0.99918 0.001 0.004 0.015 0.048 0.119 0.240 0.407 0.593 0.760 0.881 0.952 0.985 0.996 0.99919 0.000 0.002 0.010 0.032 0.084 0.180 0.324 0.500 0.676 0.820 0.916 0.968 0.990 0.99820 0.000 0.001 0.006 0.021 0.058 0.132 0.252 0.412 0.588 0.748 0.868 0.942 0.979 0.99421 0.000 0.001 0.004 0.013 0.039 0.095 0.192 0.332 0.500 0.668 0.808 0.905 0.961 0.98722 0.000 0.002 0.008 0.026 0.067 0.143 0.262 0.416 0.584 0.738 0.857 0.933 0.97423 0.000 0.001 0.005 0.017 0.047 0.105 0.202 0.339 0.500 0.661 0.798 0.895 0.95324 0.000 0.001 0.003 0.011 0.032 0.076 0.154 0.271 0.419 0.581 0.729 0.846 0.92425 0.000 0.002 0.007 0.022 0.054 0.115 0.212 0.345 0.500 0.655 0.788 0.88526 0.000 0.001 0.005 0.014 0.038 0.084 0.163 0.279 0.423 0.577 0.721 0.83727 0.000 0.001 0.003 0.010 0.026 0.061 0.124 0.221 0.351 0.500 0.649 0.77928 0.000 0.002 0.006 0.018 0.044 0.092 0.172 0.286 0.425 0.575 0.71429 0.000 0.001 0.004 0.012 0.031 0.068 0.132 0.229 0.356 0.500 0.64430 0.000 0.001 0.003 0.008 0.021 0.049 0.100 0.181 0.292 0.428 0.57235 0.000 0.001 0.003 0.008 0.020 0.045 0.088 0.155 0.25040 0.000 0.001 0.003 0.008 0.019 0.040 0.077

a These values are one tailed. For a two-tailed test, double the value.
b The alpha level is equal to (1–probability)
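The tabled probabilities are cumulative binomial probabilities with p = 0.50 and can be regenerated directly; a minimal sketch:

from scipy.stats import binom

n, x = 10, 2
p_one_tailed = binom.cdf(x, n, 0.5)          # P(X <= x), as tabled
print(round(p_one_tailed, 3), round(2 * p_one_tailed, 3))
# 0.055 one tailed and about 0.109 two tailed, matching the N = 10 row.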


Table F.J Critical values for the differences between rank sums (α = 0.05)

Number of samples

3 4 5 6 7 8 9 10 11 12

Number of panelists3 6 8 11 13 15 18 20 23 25 284 7 10 13 15 18 21 24 27 30 335 8 11 14 17 21 24 27 30 34 376 9 12 15 19 22 26 30 34 37 427 10 13 17 20 24 28 32 36 40 448 10 14 18 22 26 30 34 39 43 479 10 15 19 23 27 32 36 41 46 50

10 1 15 20 24 29 34 38 43 48 5311 11 16 21 26 30 35 40 45 51 5612 12 17 22 27 32 37 42 48 53 5813 12 18 23 28 33 39 44 50 55 6114 13 18 24 29 34 40 46 52 57 6315 13 19 24 30 36 42 47 53 59 6616 14 19 25 31 37 42 49 55 61 6717 14 20 26 32 38 44 50 56 63 6918 15 20 26 32 39 45 51 58 65 7119 15 21 27 33 40 46 53 60 66 7320 15 21 28 34 41 47 54 61 68 7521 16 22 28 35 42 49 56 63 70 7722 16 22 29 36 43 50 57 64 71 7923 16 23 30 37 44 51 58 65 73 8024 17 23 30 37 45 52 59 67 74 8225 17 24 31 38 46 53 61 68 76 8426 17 24 32 39 46 54 62 70 77 8527 18 25 32 40 47 55 63 71 79 8728 18 25 33 40 48 56 64 72 80 8929 18 26 33 41 49 57 65 73 82 9030 19 26 34 42 50 58 66 75 83 9235 20 28 37 45 54 63 72 81 90 9940 21 30 39 48 57 67 76 86 96 10645 23 32 41 51 61 71 81 91 102 11250 24 34 44 54 64 75 85 96 107 11855 25 34 46 56 67 78 90 101 112 12460 26 37 48 59 70 82 94 105 117 13065 27 38 50 61 73 85 97 110 122 13570 28 40 52 64 76 88 101 114 127 14075 29 41 53 66 79 91 105 118 131 14580 30 42 55 68 81 94 108 122 136 15085 31 44 57 70 84 97 111 125 140 15490 32 45 58 72 86 100 114 129 144 15995 33 46 60 74 88 103 118 133 148 163

100 34 47 61 76 91 105 121 136 151 167

Reworked from Newell, G. and MacFarlane, J. 1988. Expanded tables for multiple comparison procedures in the analysis of ranked data. Journal of Food Science, 52, 1721–1725
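Table F.J is entered with the number of panelists and the number of samples; the rank sums themselves come from ranking the samples within each panelist. A short sketch of that bookkeeping (the scores are invented; the critical difference is still read from the table):

import numpy as np
from scipy.stats import rankdata

scores = np.array([            # rows = panelists, columns = samples
    [6, 4, 8],
    [5, 3, 7],
    [7, 5, 9],
    [4, 6, 8],
])

ranks = np.apply_along_axis(rankdata, 1, scores)   # rank within each panelist
rank_sums = ranks.sum(axis=0)
print(rank_sums)
# With 4 panelists and 3 samples, two samples differ at alpha = 0.05 when their
# rank sums differ by at least 7 (Table F.J).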


Table F.K Critical valuesa of the beta binomial distribution

Gamma

0 0.1 0.2 0.3 0.4 0.5 0.6 0.8

p = 1/3, one sidedb

N20 19 19 19 19 19 19 19 2025 22 23 23 23 23 24 24 2430 26 27 27 27 27 28 28 2835 30 30 31 31 31 32 32 3240 34 34 34 34 35 35 36 3645 38 38 38 39 39 39 39 4050 41 42 42 42 43 43 43 4455 45 45 46 46 46 47 47 4860 49 49 49 50 50 50 51 5170 56 56 57 57 58 58 58 5980 63 64 64 64 65 65 66 6690 70 71 71 72 72 72 73 74

100 77 78 79 79 79 80 80 81125 95 96 96 97 97 98 98 99150 113 114 114 115 115 116 116 117200 148 149 149 150 151 151 152 153p = 1/2, one sidedb

20 26 26 26 26 27 27 27 2725 31 32 32 32 32 33 33 3330 37 37 37 38 38 38 39 3935 42 43 43 43 44 44 44 4540 48 48 49 49 49 50 50 5045 53 54 54 54 55 55 55 5650 59 59 60 60 60 61 61 6155 64 65 65 65 66 66 66 6760 70 70 70 71 71 72 72 7370 80 81 81 82 82 82 83 8480 91 91 92 92 93 93 94 9490 101 102 103 103 104 104 104 105

100 112 113 113 114 114 115 115 116125 138 139 140 140 141 141 142 143150 165 165 166 167 167 168 169 170200 217 218 218 219 220 221 221 223p = 1/2, two sidedc

20 27 27 27 28 28 28 28 2925 32 33 33 33 34 34 34 3530 38 38 39 39 39 40 40 4135 44 44 44 45 45 46 46 4640 49 50 50 50 51 51 52 5245 55 55 56 56 56 57 57 5850 60 61 61 62 62 62 63 6455 66 66 67 67 68 68 68 6960 71 72 72 73 73 74 74 7570 82 83 83 84 84 85 85 8680 93 93 94 95 95 96 96 9790 104 104 105 105 106 107 107 108

100 114 115 116 116 117 118 118 119125 141 142 142 143 144 144 145 146150 167 168 169 170 171 171 172 173200 220 221 222 223 224 224 225 227

a Values are rounded up to 1 except where the exact value was less than 0.05 higher than the integer
b When used for discrimination tests, the total number of correct choices must equal or exceed the tabled value
c For this test, when used for preference tests, the total number of preference choices for the larger proportion (more preferred item) must equal or exceed the tabled value


Table F.L Minimum numbers of correct judgmentsa to establish significance at probability levels of 5 and 1% for paired difference and duo–trio tests (one tailed, p = 1/2) and the triangle test (one tailed, p = 1/3)

Paired difference and duo–trio tests Triangle test

Probability levels Probability levels

Number of trials (n) 0.05 0.01 Number of trials (n) 0.05 0.01

7     7   7       5    4   5
8     7   8       6    5   6
9     8   9       7    5   6
10    9   10      8    6   7
11    9   10      9    6   7
12    10  11      10   7   8
13    10  12      11   7   8
14    11  12      12   8   9
15    12  13      13   8   9
16    12  14      14   9   10
17    13  14      15   9   10
18    13  15      16   9   11
19    14  15      17   10  11
20    15  16      18   10  12
21    15  17      19   11  12
22    16  17      20   11  13
23    16  18      21   12  13
24    17  19      22   12  14
25    18  19      23   12  14
26    18  20      24   13  15
27    19  20      25   13  15
28    19  21      26   14  15
29    20  22      27   14  16
30    20  22      28   15  16
31    21  23      29   15  17
32    22  24      30   15  17
33    22  24      31   16  18
34    23  25      32   16  18
35    23  25      33   17  18
36    24  26      34   17  19
37    24  26      35   17  19
38    25  27      36   18  20
39    26  28      37   18  20
40    26  28      38   19  21
41    27  29      39   19  21
42    27  29      40   19  21
43    28  30      41   20  22
44    28  31      42   20  22
45    29  31      43   20  23
46    30  32      44   21  23
47    30  32      45   21  24
48    31  33      46   22  24
49    31  34      47   22  24
50    32  34      48   22  25
60    37  40      49   23  25
70    43  46      50   23  26
80    48  51      60   27  30
90    54  57      70   31  34
100   59  63      80   35  38
                  90   38  42
                  100  42  45
a Created in EXCEL 2007 using B. T. Carr's Discrimination Test Analysis Tool EXCEL program (used with permission)
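Entries of this kind are the smallest number correct whose one-tailed binomial probability does not exceed the stated level; a minimal sketch of that search (an added illustration, not the program used to build the table):

from scipy.stats import binom

def min_correct(n, p0, alpha):
    """Smallest x with P(X >= x | n, p0) <= alpha."""
    for x in range(n + 1):
        if binom.sf(x - 1, n, p0) <= alpha:    # sf(x - 1) = P(X >= x)
            return x
    return None

print(min_correct(30, 1/2, 0.05))   # 20: paired difference / duo-trio, n = 30
print(min_correct(30, 1/3, 0.05))   # 15: triangle test, n = 30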


Table F.M Minimum numbers of correct judgmentsa to establish significance at probability levels of 5 and 1% for the paired preference test (two tailed, p = 1/2)

Trials (n) 0.05 0.01 Trials (n) 0.05 0.01

7     7   7       45     30   32
8     8   8       46     31   33
9     8   9       47     31   33
10    9   10      48     32   34
11    10  11      49     32   34
12    10  11      50     33   35
13    11  12      60     39   41
14    12  13      70     44   47
15    12  13      80     50   52
16    13  14      90     55   58
17    13  15      100    61   64
18    14  15      110    66   69
19    15  16      120    72   75
20    15  17      130    77   81
21    16  17      140    83   86
22    17  18      150    88   92
23    17  19      160    93   97
24    18  19      170    99   103
25    18  20      180    104  108
26    19  20      190    109  114
27    20  21      200    115  119
28    22  22      250    141  146
29    21  22      300    168  173
30    21  23      350    194  200
31    22  24      400    221  227
32    23  24      450    247  253
33    23  25      500    273  280
34    24  25      550    299  306
35    24  26      600    325  332
36    25  27      650    351  359
37    25  27      700    377  385
38    26  28      750    403  411
39    27  28      800    429  437
40    27  29      850    455  463
41    28  30      900    480  490
42    28  30      950    506  516
43    29  31      1,000  532  542
44    29  31
a Created in EXCEL 2007 using B. T. Carr's Discrimination Test Analysis Tool EXCEL program (used with permission)


Table F.N1 Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the triangle test. Pd is the chance-adjusted percent correct or proportion of discriminators

                     Type II risk
              0.20         0.10         0.05
Type I risk   N     X      N     X      N     X

Pd = 0.50
0.10          12    7      15    8      20    10
0.05          16    9      20    11     23    12
0.01          25    15     30    17     35    19
Pd = 0.40
0.10          17    9      25    12     39    14
0.05          23    12     30    15     40    19
0.01          35    19     47    24     56    28
Pd = 0.30
0.10          30    14     43    19     54    23
0.05          40    19     53    24     66    29
0.01          62    30     82    38     97    44
Pd = 0.20
0.10          62    26     89    36     119   47
0.05          87    37     117   48     147   59
0.01          136   59     176   74     211   87

Abstracted from Schlich, P. 1993. Risk tables for discrimination tests. Food Quality and Preference, 4, 141–151.

Table F.N2 Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the duo–trio test. Pd is the chance-adjusted percent correct or proportion of discriminators

                     Type II risk
              0.20         0.10         0.05
Type I risk   N     X      N     X      N     X

Pd = 0.50
0.10          19    13     26    17     33    21
0.05          23    16     33    22     42    27
0.01          40    28     50    34     59    39
Pd = 0.40
0.10          28    18     39    24     53    32
0.05          37    24     53    33     67    41
0.01          64    42     80    51     96    60
Pd = 0.30
0.10          53    32     72    42     96    55
0.05          69    42     93    55     119   69
0.01          112   69     143   86     174   103
Pd = 0.20
0.10          115   65     168   93     214   117
0.05          158   90     213   119    268   148
0.01          252   145    325   184    391   219

Abstracted from Schlich, P. 1993. Risk tables for discrimination tests. Food Quality and Preference, 4, 141–151
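Values of this kind can be found by searching over sample sizes: for each n, take the alpha-level critical count under the chance probability, then check whether the type II risk at the alternative (chance plus the specified proportion of discriminators) is within beta. The sketch below follows that logic for the triangle test (p0 = 1/3); it is an illustration of the idea, and the published values may differ slightly where several (n, x) pairs satisfy the risks:

from scipy.stats import binom

def plan_discrimination_test(pd, alpha, beta, p0=1/3, n_max=500):
    """Smallest (n, x) meeting the stated Type I and Type II risks."""
    p1 = p0 + pd * (1 - p0)                  # proportion correct if pd discriminate
    for n in range(1, n_max + 1):
        x = next((k for k in range(n + 1)
                  if binom.sf(k - 1, n, p0) <= alpha), None)
        if x is not None and binom.cdf(x - 1, n, p1) <= beta:
            return n, x
    return None

print(plan_discrimination_test(pd=0.50, alpha=0.05, beta=0.10))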


Table F.O1 d′ and B (variance factor) values for the duo–trio and 2-AFC (paired comparison) difference tests

Duo–trio 2-AFC

PC d ′ B d ′ B

0.51 0.312 70.53 0.036 3.140.52 0.472 36.57 0.071 3.150.53 0.582 25.28 0.107 3.150.54 0.677 19.66 0.142 3.150.55 0.761 16.32 0.178 3.160.56 0.840 14.11 0.214 3.170.57 0.913 12.55 0.250 3.170.58 0.983 11.40 0.286 3.180.59 1.050 10.52 0.322 3.200.60 1.115 9.83 0.358 3.220.61 1.178 9.29 0.395 3.230.62 1.240 8.85 0.432 3.250.63 1.301 8.49 0.469 3.270.64 1.361 8.21 0.507 3.290.65 1.421 7.97 0.545 3.320.66 1.480 7.79 0.583 3.340.67 1.569 7.64 0.622 3.370.68 1.597 7.53 0.661 3.400.69 1.565 7.45 0.701 3.430.70 1.715 7.39 0.742 3.470.71 1.775 7.36 0.783 3.510.72 1.835 7.36 0.824 3.560.73 1.896 7.38 0.867 3.610.74 1.957 7.42 0.910 3.660.75 2.020 7.49 0.954 3.710.76 2.084 7.58 0.999 3.770.77 2.149 7.70 1.045 3.840.78 2.216 7.84 1.092 3.910.79 2.284 8.01 1.141 3.990.80 2.355 8.21 1.190 4.080.81 2.428 8.45 1.242 4.180.82 2.503 8.73 1.295 4.290.83 2.582 9.05 1.349 4.410.84 2.664 9.42 1.406 4.540.85 2.749 9.86 1.466 4.690.86 2.840 10.36 1.528 4.860.87 2.935 10.96 1.593 5.050.88 3.037 11.65 1.662 5.280.89 3.146 12.48 1.735 5.540.90 3.263 13.47 1.812 5.840.91 3.390 14.67 1.896 6.210.92 3.530 16.16 1.987 6.660.93 3.689 18.02 2.087 7.220.94 3.867 20.45 2.199 7.95


Table F.O1 (continued)

Duo–trio 2-AFC

PC d ′ B d ′ B

0.95 4.072 23.71 2.326 8.930.96 4.318 28.34 82.476 10.340.97 3.625 35.52 2.660 12.570.98 5.040 48.59 2.900 16.720.99 5.701 82.78 3.290 27.88

B-factors are used to compute the variance of the d′ values, where Var(d′) = B/N and N is the sample size
Reprinted with permission from "Tables for Sensory Methods, The Institute for Perception, February, 2002"
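For the 2-AFC task d′ has the simple closed form d′ = √2·z(PC); the duo–trio values have no such closed-form inverse, which is one reason they are tabled. The B factor is used as the footnote states, Var(d′) = B/N. A brief sketch (PC and B are taken from the table; the rest is an added illustration):

from math import sqrt
from scipy.stats import norm

def dprime_2afc(pc):
    return sqrt(2) * norm.ppf(pc)

pc, n, b = 0.75, 50, 3.71             # B for 2-AFC at PC = 0.75 (Table F.O1)
d = dprime_2afc(pc)                   # 0.954, matching the tabled value
se = sqrt(b / n)                      # standard error of d' from Var(d') = B/N
print(round(d, 3), round(d - 1.96 * se, 2), round(d + 1.96 * se, 2))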

Table F.O2 d′ and B (variance factor) values for the triangle and 3-AFC difference tests

Triangle 3-AFC

PC d′ B d′ B

0.34 0.270 93.24 0.024 2.780.35 0.429 38.88 0.059 2.760.36 0.545 25.31 0.093 2.740.37 0.643 19.17 0.128 2.720.38 0.728 15.67 0.162 2.710.39 0.807 13.42 0.195 2.690.40 0.879 11.86 0.229 2.680.41 0.948 10.71 0.262 2.670.42 1.013 9.85 0.295 2.660.43 1.075 9.17 0.328 2.650.44 1.135 8.62 0.361 2.650.45 1.193 8.18 0.394 2.640.46 1.250 7.82 0.427 2.640.47 1.306 7.52 0.459 2.640.48 1.360 7.27 0.492 2.630.49 1.414 7.06 0.524 2.630.50 1.466 6.88 0.557 2.640.51 1.518 6.73 0.589 2.640.52 1.570 6.60 0.622 2.640.53 1.621 6.50 0.654 2.650.54 1.672 6.41 0.687 2.650.55 1.723 6.34 0.719 2.660.56 1.774 6.28 0.752 2.670.57 1.824 6.24 0.785 2.680.58 1.874 6.21 0.818 2.690.59 1.925 6.19 0.852 2.700.60 1.976 6.18 0.885 2.710.61 2.027 6.18 0.919 2.730.62 2.078 6.19 0.953 2.750.63 2.129 6.21 0.987 2.770.64 2.181 6.28 1.022 2.790.65 2.233 6.29 1.057 2.810.66 2.286 6.32 1.092 2.83


Table F.O2 (continued)

Triangle 3-AFC

PC d ′ B d ′ B

0.67 2.339 6.38 0.128 2.860.68 2.393 6.44 1.164 2.890.69 2.448 6.52 1.201 2.920.70 2.504 6.60 1.238 2.950.71 2.560 6.69 1.276 2.990.72 2.618 6.80 1.314 3.030.73 2.676 6.91 1.353 3.070.74 2.736 7.04 1.393 3.120.75 2.780 7.18 1.434 3.170.76 2.860 7.34 1.475 3.220.77 2.924 7.51 1.518 3.280.78 2.990 7.70 1.562 3.350.79 3.058 7.91 1.606 3.420.80 3.129 8.14 1.652 3.500.81 3.201 8.40 1.700 3.590.82 3.276 8.68 1.749 3.680.83 3.355 8.99 1.800 3.790.84 3.436 9.34 1.853 3.910.85 3.522 9.74 1.908 4.040.86 3.611 10.19 1.965 4.190.87 3.706 10.70 2.026 4.370.88 3.806 11.29 2.090 4.570.89 3.913 11.97 2.158 4.800.90 4.028 12.78 2.230 5.070.91 4.152 13.75 2.308 5.400.92 4.288 14.92 2.393 5.810.93 4.438 16.40 2.487 6.300.94 4.607 18.31 2.591 6.950.95 4.801 20.88 2.710 7.830.96 5.031 24.58 2.850 9.100.97 5.316 30.45 3.023 11.100.98 5.698 41.39 3.253 14.850.99 6.310 71.03 3.618 25.00

B-factors are used to compute the variance of the d′ values, where Var(d′) = B/N and N is the sample size
Reprinted with permission from "Tables for Sensory Methods, The Institute for Perception, February, 2002"
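The 3-AFC column corresponds to the standard Thurstonian model PC = ∫φ(z − d′)Φ(z)² dz (the triangle entries require a somewhat more involved integral). The sketch below evaluates that integral numerically; it is offered as an assumption about the model behind the table rather than a statement of how the table was produced:

from scipy.integrate import quad
from scipy.stats import norm

def pc_3afc(dprime):
    """Proportion correct in a 3-AFC task under the Thurstonian model."""
    integrand = lambda z: norm.pdf(z - dprime) * norm.cdf(z) ** 2
    pc, _ = quad(integrand, -10, 10)
    return pc

print(round(pc_3afc(0.557), 3))   # about 0.50, consistent with Table F.O2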


Table F.P Random permutations of nine

6 4 9 3 8 7 2 5 1 2 1 6 7 5 8 4 3 94 2 1 9 3 8 7 6 5 9 8 3 7 6 4 5 2 13 5 4 1 6 8 7 9 2 3 6 2 4 9 7 1 8 55 3 4 2 1 6 8 9 7 4 9 5 7 1 3 8 6 28 7 1 9 2 5 6 4 3 1 7 2 6 9 3 5 4 83 6 9 7 2 8 5 1 4 6 7 5 9 8 3 1 4 23 1 7 6 5 2 4 9 8 4 8 7 3 5 6 9 1 23 1 2 9 4 5 6 8 7 8 3 9 6 7 1 4 5 21 3 5 7 2 6 8 9 4 4 3 5 9 8 2 1 7 66 3 8 9 7 4 2 5 1 6 8 7 9 5 2 1 4 31 7 5 3 6 8 4 2 9 8 5 1 7 9 3 6 4 26 3 9 7 5 8 1 4 2 8 2 1 4 6 9 5 3 77 5 1 2 8 4 9 3 6 3 5 1 4 2 7 9 8 61 2 4 8 9 3 6 5 7 2 6 3 9 7 5 8 4 14 6 3 9 5 7 2 8 1 9 6 8 5 2 4 7 1 37 6 1 5 4 8 2 9 3 8 3 2 5 9 6 4 1 73 9 7 5 4 6 8 1 2 7 3 4 2 1 9 5 8 61 3 5 7 6 8 2 4 9 6 5 4 3 2 1 7 9 82 9 4 7 1 3 5 8 6 1 5 4 2 6 7 9 3 85 2 8 3 4 7 1 9 6 6 5 1 4 9 7 2 3 82 1 8 7 3 5 9 4 6 7 8 1 2 3 4 5 9 65 7 2 8 6 3 4 9 1 3 9 1 4 6 5 8 2 74 1 6 2 5 3 7 9 8 8 6 5 7 4 3 9 2 11 6 7 9 4 8 2 5 3 8 9 2 5 4 3 7 1 69 8 5 1 6 2 3 7 4 5 4 3 6 9 8 1 7 25 3 1 6 7 8 2 9 4 1 9 7 2 3 8 4 5 61 3 2 7 8 5 4 6 9 4 1 2 6 3 5 7 8 93 4 9 7 5 8 1 6 2 5 2 3 7 4 6 8 9 15 4 6 8 2 1 7 9 3 4 6 8 9 2 3 1 7 51 3 7 9 4 8 6 2 5 4 2 9 3 1 7 6 8 56 2 5 1 9 8 4 7 3 2 5 6 9 4 7 3 1 85 2 9 8 3 1 4 6 7 4 9 2 6 1 5 7 3 88 5 1 3 6 2 9 7 4 6 3 2 4 9 1 5 8 71 7 4 3 2 9 5 6 8 2 3 6 4 5 8 7 1 99 3 4 5 6 7 1 8 2 6 1 4 5 8 7 2 3 91 6 4 3 5 9 7 8 2 7 8 9 4 2 5 3 6 14 5 9 8 1 2 3 6 7 7 3 8 1 9 2 6 5 49 8 5 4 2 7 3 1 6 7 2 1 9 5 4 6 3 89 8 2 6 4 5 7 1 3 9 6 3 8 7 2 5 4 19 3 1 5 6 2 4 8 7 7 1 8 2 3 9 5 4 64 7 6 9 3 2 1 8 5 7 3 4 9 1 5 2 6 87 1 8 5 6 9 4 2 3 2 3 7 9 4 8 5 6 1

Each row within a column has the numbers 1–9 in random order. Start with any row (do not always start with the first or last rows) and read either from right to left or from left to right
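Serving orders can also be generated in software rather than read from the table; a minimal sketch using numpy:

import numpy as np

rng = np.random.default_rng()
for panelist in range(5):
    # one random order of nine samples per panelist
    print(panelist + 1, rng.permutation(9) + 1)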


Table F.Q Random numbers

8 2 0 3 1 4 5 8 2 1 7 2 7 3 8 5 5 2 9 0 6 3 1 8 40 8 7 3 3 1 9 7 5 2 5 7 8 9 8 0 3 8 2 5 1 2 7 5 22 3 3 8 8 1 4 2 4 0 2 6 1 8 9 5 2 8 9 8 3 4 0 1 04 7 5 5 8 3 0 7 7 1 9 1 8 1 7 4 1 7 1 3 7 9 3 3 71 9 3 9 5 3 4 9 5 5 2 7 5 8 0 3 4 8 8 1 2 7 5 3 42 8 7 8 1 4 1 4 9 4 2 4 1 5 2 9 4 8 2 1 5 2 8 1 98 4 8 5 1 3 9 8 6 0 7 2 1 9 0 2 0 8 7 0 8 0 1 3 00 3 8 8 4 7 5 1 5 1 7 3 4 5 2 0 7 4 7 9 8 6 7 7 43 5 3 1 9 3 7 4 9 5 0 2 0 1 4 6 2 5 4 5 8 5 0 9 23 4 5 9 5 2 7 9 8 9 0 5 5 8 5 1 7 7 3 5 5 4 7 7 24 1 5 3 0 9 1 3 7 2 5 8 7 7 1 3 6 3 9 7 8 7 9 1 77 2 9 5 6 7 8 5 4 5 3 4 5 4 1 9 8 8 7 5 7 9 3 1 85 9 2 8 9 8 6 4 4 1 5 3 7 7 0 8 0 2 5 6 0 8 1 2 01 3 3 3 9 0 5 2 8 7 4 0 9 0 3 7 3 1 7 9 4 5 5 2 84 8 0 1 0 8 6 2 1 0 0 5 0 3 1 5 4 9 0 3 7 4 7 0 17 7 0 8 6 3 2 8 8 5 8 9 5 8 4 0 5 9 1 8 0 5 4 9 43 3 8 5 7 5 7 4 3 4 5 7 9 8 9 5 0 7 7 6 8 8 8 5 99 1 7 1 3 6 9 2 9 1 9 4 2 3 3 0 8 1 8 7 7 6 4 7 26 2 2 8 0 9 4 5 3 7 2 5 4 8 8 5 6 6 5 0 4 6 5 6 81 7 5 9 0 0 2 0 5 8 5 8 5 1 9 5 3 3 7 4 0 5 8 2 40 3 9 6 9 4 7 3 5 7 0 8 5 4 7 1 1 8 5 3 2 8 0 9 83 0 8 2 8 1 4 4 1 8 7 8 6 9 9 9 7 5 8 9 8 4 5 9 09 4 9 1 2 2 0 1 3 2 4 8 7 9 1 8 8 2 9 8 3 2 8 2 97 2 5 1 4 4 9 8 5 2 8 5 5 1 0 8 2 6 2 0 8 9 2 2 39 9 2 5 7 4 3 1 2 3 8 4 1 5 2 4 0 4 2 2 8 7 1 8 22 0 9 1 8 9 4 4 8 1 4 8 8 7 9 2 5 0 8 9 3 3 0 1 28 5 2 8 1 2 1 7 7 1 4 7 8 1 4 2 7 3 7 4 0 0 1 2 91 2 9 9 8 4 2 5 3 2 7 4 3 2 3 3 8 5 3 3 8 5 5 3 23 2 8 3 7 9 6 0 4 8 8 0 5 4 1 1 4 9 0 5 0 9 4 4 10 9 3 4 1 1 9 5 8 3 2 4 6 7 3 4 4 9 2 3 7 2 5 7 88 7 5 3 4 2 1 5 5 0 1 2 4 7 5 5 2 8 8 7 8 2 8 0 39 6 0 1 3 0 5 3 8 6 2 9 6 0 3 4 7 8 1 1 9 1 6 5 3

Start on any column or row and read from right to left or left to right or up and down to create random numbers of three digits to label your sample cups
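Three-digit blinding codes can likewise be drawn in software; a minimal sketch:

import random

codes = random.sample(range(100, 1000), k=12)   # 12 unique three-digit codes
print(codes)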


Author Index

AAbdi, H., 459Abrahams, H., 195Abrams, D., 118–119Acree, T. E., 141, 196Adhikari, K., 438–439Afifi, A. A., 529, 531Agrawal, K. R., 265Albin, K. C., 43Al-Chakra, W., 276Allison, R. L., 351Alves, L. R., 442Alves, M. R., 439Amerine, M. A., 58, 79, 90, 93, 104, 216, 218–219,

231, 310, 414, 536, 544Amerine, M. R., 412, 421–422Amoore, J., 135Amoore, J. E., 37–38, 87, 126, 135, 143Andersen, R. K., 43Anderson, 25, 121, 154–155, 170, 205, 214, 221, 223,

234, 434, 446Anderson, N., 25, 155, 170Anderson, N. H., 205, 214, 221Anderson, T. W., 434Angulo, O., 97, 312Anscombe, F. J., 526–527Antinone, M. A., 131Antinone, M. J., 106, 119Aparicio, J. P., 251, 439Arbaugh, J. E., 364Ares, G., 268, 381, 442Armstrong, G., 229Armstrong, J. S., 362Arnold, G., 250Arnold, G. M., 72, 251Arvidson, K., 29

Ashby, F. G., 152Aust, L. B., 153, 411–412, 415Axel, R., 34Ayya, N., 32, 128, 183

BBachmanov, A. A., 28–29Bahn, K. D., 333Baird, J. C., 112–113, 117, 166, 212Baker, G. A., 421Ball, C. O., 260Banks, W. P., 154, 161Barbary, O., 97Barcenas, P., 247, 461Baron, R. F., 194Barrett, A. H., 276Bartoshuk, J. B., 29, 33Bartoshuk, L. M., 29–31, 34, 49, 127, 142, 156, 162,

164, 327Barylko-Pikielna, N., 181, 193Basker, D., 80, 166, 316–317, 501Bate Smith, E. C., 45Baten, W. D., 155Bauman, H. E., 422Beauchamp, G. K., 28–29, 32, 37, 306, 333Beausire, R. L. W., 342Bech, A. C., 304Beck, J., 288Beckley, J. H., 422Beckley, J. P., 414Beckman, K. J., 306Beebe-Center, J. G., 208Behling, R. W., 416Beidler, L. M., 25Bellisle, F., 304Bendig, A. W., 159




Benner, M., 452, 454Berger, J. O., 486Berglund, B., 140, 304Berglund, U., 304Berna, A. Z., 439Berry, D. A., 486Bertaux, E., 269Bertino, M., 266, 459–460Best, D. J., 338Bett, K. L., 13, 66Bi, J., 85–87, 96–97, 103, 110–111, 118, 120–122,

166, 186, 195, 312–314, 320, 496–498, 504,536, 549

Bijmolt, T. H. A., 458, 460Bingham, A. F., 48, 216Birch, D., 154, 333Birch, G. G., 181, 184, 333Birch, L. L., 154Birnbaum, M. H., 25, 167, 205Bisogni, C. A., 398Blair, E., 358Blakeslee, A. F., 33, 134Blalock, H. M., 528, 531Blancher, G., 252–253Blazquez, C., 276Blissett, A., 265Bloom, K., 185Bodyfelt, F. W., 15, 408, 419–421Bogue, J., 380, 387Bonnans, S., 193, 217Booth, D. A., 166, 265, 334, 466Borg, G., 162–164, 331Boring, E. G., 20–22, 125, 127, 160, 167, 204Bourne, M., 264Bourne, M. C., 259, 271Bower, J. A., 335Boyd, H. W., 358, 360, 362–363, 365, 368–369Boyd, R., 335Boylston, T. D., 248Braddock, K. S., 22Brainard, D. H., 288Brandt, M. A., 6, 73, 160, 179, 237, 266, 270–271Braun, V., 312Bredie, W. L. P., 436Brennan, J. G., 276Brennand, C. P., 37, 137Breslin, P. A. S., 32, 45Bressan, L. P., 416Breuil, P., 271, 274

Brimelow, C. J. B., 289, 293Brockhoff, P. B., 87, 94, 439Brown, D. G. W., 128, 135–138, 143Brown, D. M., 265Brown, W. E., 188, 265Brownlee, K. A., 57Brud, W. S., 38Bruhn, M., 398Brunner, G. A., 359Brunner, J. A., 359Bruwer, M. -J., 262Buck, L., 34Buettner, A., 37Bufe, B., 29, 134Burgard, D. R., 41Burks, T. F., 43Burns, L. D., 269Busing, F. M. T. A., 442Butler, F., 262, 276Butler, G., 158, 195Byer, A. J., 118–119, 328Byram, J., 48, 216Bystedt, J., 382–383, 389–390, 395, 399–400

CCabanac, M., 196Cagan, R. H., 32Cain, W. S., 36, 38–39, 44, 49, 128, 136, 143,

194, 216Cairncross, S. E., 231–232Caivano, J. L., 287Calixto, J. B., 43Callier, P., 444–445Calvino, A., 66Campo, E., 247–248Cano-Lopez, M., 438Carbonell, I., 246Cardello, A., 163–164, 172–173, 213–214, 331–332Cardello, A. V., 14, 63, 163, 167, 213, 266, 271, 304,

318, 327, 340–342, 351, 408Carlton, D. K., 424Carpenter, P. M., 287Carrasco, A., 287–288Carson, K., 262Carver, R. P., 473, 485–486Case, P. B., 362Casey, M. A., 380, 383–386, 388–389, 392–398, 401Castro-Prada, E. M., 264Cattell, R. B., 435



Caul, J. F., 6, 25, 159, 231, 304Cavitt, L. C., 276Chambers, D., 243Chambers, D. H., 230, 233, 380, 382, 384–386, 389,

392–395Chambers, E. IV, 244Chambers, E. C., 150, 547Chambers, E. C. IV, 547Chandrashekar, J., 44Chapman, K., 313, 315Chapman, K. W., 311Chastrette, M., 38Chatfield, C., 439Chauvin, M. A., 271, 274Chen, A. W., 153–154, 332–333Chen, C. -C., 289Chen, J., 259, 262, 264, 276–277Chen, P. -L., 269Chollet, S., 253Chong, V. K., 289Christensen, C. M., 259, 263, 284Christensen, R. H. B., 87, 94Chung, S. -J., 163, 331, 439Civille, G. L., 39, 229–230, 235Civille, G. V., 46, 154, 214, 228, 231, 237, 240, 243,

268, 271Claassen, M. R., 8–9, 15, 421Clark, C. C., 48, 156, 182Clark, V., 198, 216–218, 529, 531Claude, Y., 265Cliff, M., 181, 190, 197, 243Cliff, M. A., 44, 180, 194Clifford, M. N., 45Clydesdale, F. J., 293–294Clydesdale, F. M., 298Cochran, W. G., 66, 71–72, 244, 338, 357, 528–529Cochrane, C. -Y. C., 313, 323Coetzee, H., 166, 305, 332Cohen, J., 536–539, 541Colas, B., 70Coleman, M. J., 154, 161Collings, V. B., 30, 129, 139Collins, A. A., 158Collins, A. J., 439Cometto-Muniz, J. E., 44Commetto-Muniz, J. E., 44Condelli, N., 46Conner, M. T., 166, 466Conover, W. J., 490

Cooper, P., 380–381Corbit, J. D., 208Cordinnier, S. M., 165Cornsweet, T. M., 139, 143Corrigan, C. J., 45, 230Costell, E., 14, 442Cowart, B. J., 44, 49Cox, D. N., 439Cox, G. M., 66, 71–72, 244, 357Coxon, A. P. M., 436, 444Creelman, C. D., 88, 112, 117Cristovam, E., 252Cui, G., 297Cultice, J., 228Cunningham, D. G., 141Curtis, D. W., 25, 161Cussler, E. L., 267

DDacanay, L., 181, 194Da Conceicao Neta, E. R., 29Dacremont, C., 97, 263–264Dairou, V., 252Dale, M. S., 134Dalton, P., 48, 136Dan, H., 266Darden, M. A., 268Da Silva, M. A. A. P., 156, 167–168, 330, 333Davidek, J., 336Davis, K. G., 421Davis, M. K., 181Day, E. A., 141De Belie, N., 276Deeb, S. S., 286De Graaf, C., 353Delahunty, C. M., 163, 241, 331Delarue, J., 252–253, 438–439Deliza, R., 14, 341Dellaglio, S., 246Delwiche, J., 47, 49, 119Delwiche, J. F., 65, 97, 165, 334Dematte, M. L., 284Derndorfer, E., 244De Roos, K. B., 196DeRouen, T. A., 189DeRovira, D., 184, 249DeSarbo, W. S., 442Desor, J. A., 38Dessirier, J. -M., 44–45



Dethmers, A. E., 408, 423, 425, 431De Wijk, R., 63De Wijk, R. A., 262, 265Diamant, H., 29Diamond, G. A., 525Diamond, J., 136, 221Diehl, R. L., 209Dietrich, A. M., 233Dijksterhuis, G., 179, 185–187, 192–193, 195, 197,

212, 245, 272, 458Dijksterhuis, G. B., 188, 250, 434, 439–440Dillman, D. A., 364Di Monaco, R., 276Dohrenwend, B. S., 363Dolese, M., 208Dooley, L. M., 243, 268Doty, R. L., 34, 38, 44, 49Drake, B., 261Drake, M., 15, 243, 246–247Drake, M. A., 276Dransfield, E., 395, 397Dravnieks, A., 39, 129, 133, 136Drexler, M., 39DuBois, G. E., 193DuBose, C. N., 284Dubose, C. N., 49, 195Duizer, L. M., 183, 195, 262, 276Dunkley, W., 88Dunkley, W. L., 15, 50, 160, 171, 420Dus, C. A., 268

EEbeler, S. E., 196Edgar, H., 74Edmister, J. A., 263Edwards, A., 166, 171Edwards, J. A., 341Eggert, J., 58, 290Eilers, P. H. C., 188Eimas, P. D., 208Einstein, M. A., 7, 156, 231, 234Ekberg, O., 272Ekman, G., 161El Dine, A. N., 163, 332Elejalde, C. C., 262El Gharby, A., 219, 351Ellmore, J. R., 386, 400Elmore, J., 251Elmore, J. R., 444–446

Eng, E. W., 212Engelen, L., 47–48, 265, 272Engen, T., 38–39, 167, 210, 306, 333Engen, T. E., 136–137, 143Ennis, J. M., 110, 121, 310, 537Ennis, D. M., 85–88, 93, 110–111, 116, 118–119,

121, 310, 414, 536–537, 539, 544–547, 549Enns, M. P., 47Epke, E., 46Epler, S., 335Erdfelder, E., 541–542Eriksson, P., 383–384, 395–397, 400Etaio, I., 438–439Eterradossi, O., 60, 289

FFaber, N. M., 445Falk, L. W., 383, 394Farbman, A. I., 42Farnsworth, D., 286Faye, P., 444Fechner, G. T., 21, 23, 25, 125–126, 150, 167, 208Feldt, L. S., 522Ferdinandus, A., 105Ferris, G. E., 320Fidell, B., 434Filipello, F., 318Findlay, C., 243Findlay, C. F., 243Findlay, C. J., 243Finn, A., 167, 318Finney, D. J., 105, 136Finney, E. E., 275Firestein, S., 34, 36Fischer, U., 188, 197Fisher, U., 46Fleiss, J. L., 338Forde, C. G., 163, 331Forrester, L. J., 37Foster, K. D., 265, 276Fox, A. L., 33Fox, R. J., 458Frank, R. A., 32, 48, 198, 216–218Frankel, M. R., 363Freund, J. E., 527, 529Friedman, H. H., 275Friedrich, J. E., 141Frijters, J. E. R., 34, 80, 85, 88, 118, 140–141, 166,

205, 207, 212–213, 223



Fritz, C., 337Frøst, M. B., 13–14, 230Furse, D. H., 362Furst, T., 382–383, 394

GGacula, M. C. Jr., 12, 66, 89, 93, 103, 110, 122, 313,

316, 344, 351, 354, 425, 427, 451, 463, 474, 496,498, 507, 510, 520, 536–537, 539, 542–544

Gacula, M., 313, 357, 463, 474, 496, 498, 507, 520Gaito, J., 152Galanter, E. H., 26, 168–169Galvez, F. C. F., 384Gambaro, A., 444Garber, L. L., 14, 344Gardner, R., 295Garnatz, G., 305, 352Garrido, D., 188Garvin, D. A., 421Gay, C., 154–155, 165, 221, 333–334Geel, L., 443Geelhoed, E., 10Geisser, S., 522Gelhard, B., 44Gent, J. F., 162, 195Gescheider, G. A., 158, 170Gharby, A., 219Gharby, A. E., 351Giboreau, A., 13, 231Gilbert, J. M., 252Gilbert, E. N., 88Gillette, M., 43, 142Gillette, M. H., 414, 422Gimenez, A., 14Giovanni, M. E., 154, 158, 167, 212Girardot, N. F., 152, 154, 326Gladwell, M., 468Glatter, S., 460Glenn, J. J., 298Goldman, A. E., 380, 452Goldstein, E. B., 27Goldstein, L. R., 97Goldwyn, C., 421Gonzalez, R., 276Gonzalez-Barron, U., 262, 276Gordin, H. H., 249Gou, P., 247, 442, 444Gould, W. A., 352Govindarajan, V. S., 142

Gower, J. C., 250, 434, 439Gowin, D. B., 398Graaf, C., 353Gracely, R. H., 163Grebitus, C., 398Green, B. G., 34, 39, 41, 44, 49, 112–113, 142, 159,

162–163, 194, 214, 331Green, D. M., 112–113Greenacre, M., 248Greene, J. L., 163, 331Greenhof, K., 464Greenhoff, K., 326Greenhouse, S. W., 522Greer, C. A., 34Gridgeman, N. T., 311Grosch, W., 141Groves, R. M., 362Grun, I. U., 267Grushka, M., 30Guest, S., 163–164, 327Guinard, J. -X., 46, 181, 194, 244, 259, 436, 444Gullett, E. A., 195Guth, H., 141Gwartney, E., 45, 187, 194

HHair, J. F., 434Hall, C., 420Hall, M. L., 34Halpern, B. P., 31, 36, 180, 184, 195, 197Hamann, D. D., 264Hammond, E., 420Hanson, H. L., 208Hanson, J. E., 232Hard, A., 297Haring, P. G. M., 196Harker, F. R., 313Harper, R., 167–168Harper, S. J., 44Harris, H., 87Harris, N., 49Harvey, L. O., 139–140Hashim, I. B., 381Hatcher, L., 435–437Haynes, J. G., 353Hays, W. L., 356, 507, 519Head, M. K., 333Hedderley, D., 442Hedderly, D. I., 462



Hegenbart, S., 230Heidenreich, S., 268Hein, K. A., 13, 163, 167–168, 244, 318, 328Heisserer, D. M., 230Helgensen, H., 326Helgesen, H., 444Hellekant, G., 42Helson, H. H., 206–207, 210Hernandez, B., 291Hernandez, S. M., 44Hernandez, S. V., 22, 336Herzberg, P. A., 532Hetherington, M. J., 289Hewson, L., 44Heymann, H., 44–45, 66, 71, 180, 187, 194, 230–231,

234, 246, 251–252, 400, 434, 439, 455,458–461

Hinterhuber, H. H., 463Hirst, D., 245Hodgson, R. T., 422Hogman, L., 140Holway, A. H., 180Homa, D., 228Horne, J., 45Hornung, D. E., 47Hottenstein, A. W., 328Hough, G., 426, 428Hough, G. H., 430Hough, J. S., 287, 329Huang, Y. -T., 88Huberty, A. J., 438Huberty, C. J., 437Hudson, R., 38Hughes, J. B., 159Hunter, E. A., 66, 72, 295Hurst, D. C., 322Hurvich, L. M., 180Huskisson, E. C., 156Husson, F., 234, 236, 244Hutchings, J. B., 267, 285–286, 293, 296Hutchings, S. C., 265Huynh, H., 522Hyde, R. J., 260, 267Hyldig, G., 276Hyvonen, L., 267

IIoannides, Y., 276Irwin, R. J., 119

Ishii, R., 30, 127, 228, 231Ishimaru, Y., 29

JJack, F. R., 342Jacobsen, M., 269Jaeger, S. R., 167, 318, 442, 444James, W., 128, 207, 222Jameson, K. A., 286Janas, B. G., 383–384, 395Janhøj, T., 230, 262Jansco, N., 43Janusz, J. M., 181Jaros, D., 252Jellinek, G., 58, 74, 180, 193, 195, 304Jellinek, J. S., 193, 342, 354Jeltema, M. A., 39Jesteadt, W., 139Johansen, S. B., 444–445Johnson, D. E., 71Johnson, D. K., 385Johnson, J., 215, 220, 336Johnson, J. R., 340Johnson, P. B., 13Johnson, R., 455, 458Johnson, R. A., 66, 69, 72, 434, 437Jones, F. N., 166–167, 212Jones, L. V., 7, 152, 171–172, 326Jones, P. N., 166–167, 251Joshi, P., 289, 293Jowitt, R., 266, 276

KKahkonen, P., 340Kahn, E. L., 260Kaiser, H. F., 435Kallikathraka, S., 45Kalmus, H., 87Kamen, J., 12, 330Kamenetzky, J., 208, 213Kane, A. M., 291Kang, S. P., 292Kapur, K., 264Kare, M. R., 30Karrer, T., 34, 49Katahira, H., 469Kauer, J. S., 36Kautz, M. A., 49Kawabata, S., 268



Kawamura, Y., 30Kelly, F. B., 66Kelly, G. A., 251Kennedy, F., 393Kennedy, J., 461Keskitalo, K., 163, 327Kiasseoglou, V. D., 262Kilcast, D., 259, 262, 425–426Killian, J. T., 298Kim, E. H. -J., 274, 276Kim, E., 269Kim, H. -J., 97Kim, H. S., 321Kim, J. -J., 269Kim, K. O., 165, 333Kim, M. P., 459Kim, U. -K., 29Kimmel, S. A., 59, 306, 333King, B. M., 162King, M. J., 459King, S. C., 341Kirkmeyer, S. V., 251Kittel, K. M., 141Klaperman, B. A., 452–453Kleef, E., 441–442Klein, B. P., 4Kleyn, D. H., 260–261Kobori, I., 30Koehl, L., 269Kofes, J., 219Kohli, C. S., 456, 458, 460Kokini, J. L., 262, 267Kollar-Hunek, K., 244Koo, T. -Y., 165Koster, E. P., 313–314, 328Kovalainen, A., 383–384, 395–397, 400Koyasako, A., 193, 195Kraemer, H. C., 93Krasner, S. W., 231Kravitz, D., 250, 439Krieger, B., 342, 453, 466–467Krinsky, B. F., 243Kroeze, J. H. A., 30–31, 195Kroll, B. J., 154, 332Kroll, B. R., 335Kroll, D. R., 414Krueger, R. A., 380, 383–386, 388–389, 392–398, 401Kruskal, J. B., 458, 460Krystallis, A., 381

Kubala, J. J., 427Kuhfield, W. F., 440Kuo, W. G., 291Kuo, Y. -L., 193, 196Kurtz, D. B., 163Kuznicki, J. T., 41

LLabbe, D., 48, 184–185Labuza, T. P., 248Lahteenmaki, L., 341Laing, D. G., 38, 41Laitin, J. A., 452–453Lallemand, M., 187–189, 191–192, 197Lana, M. M., 289Land, D. G., 165Lane, H. L., 157, 173Langron, S. P., 250, 439Larson-Powers, N., 165, 181, 193, 195, 221Larson-Powers, N. M., 248Laska, M., 38Lassoued, N., 252, 260, 262, 268, 276Lavanaka, N., 330Lavin, J., 49Lavine, B. K., 440, 442Lawless, H., 31, 38, 421, 520Lawless, H. T., 4, 8–9, 15, 22, 25, 30–34, 37–38,

40–43, 45–49, 87–88, 103, 106, 119, 127–128,134–135, 137, 140, 142–143, 153–156, 158, 160,163–165, 167–168, 181–183, 187, 189, 193–195,197–198, 207–213, 216–218, 221–222, 229–230,235, 239, 248, 266–267, 311, 313, 329–333, 336,343, 363–364, 408, 421, 455, 459–461, 538

Lea, P., 507, 520Leach, E. J., 193Ledauphin, S., 188Lee, C. B., 41, 45, 182–183, 187, 194Lee, C. M., 271Lee, H. -C., 293Lee, H. -J., 165Lee, H. -S., 212Lee, H. S., 89, 438Lee, J., 243Lee, J. F., 193Lee, O. -H., 438Lee, S. M., 438Lee, S. -Y., 14, 187–188, 197, 212Lee, W., 262Lee, W. E., 181, 183, 196, 198



Lee, W. E., III, 264Leedham, P. A., 287Leffingwell, J. C., 45Le Fur, Y., 247Lehninger, A. L., 25Le Moigne, M., 246Lenfant, F., 264Lengard, V., 244Le Reverend, F. M., 184–185Le, S., 434, 437Leuthesser, L., 456, 458, 460Levitt, H., 139Levy, N., 210Liggett, R. A., 97Light, A., 400Likert, R., 154Lillford, P. J., 267Lindvall, T., 162Linschoten, M. R., 140Liou, B. K., 267Liska, I. H., 237, 271Little, A., 291, 297Liu, J., 269Liu, Y. H., 189, 197Lotong, V., 246, 436Louviere, J. J., 167, 318Lovely, C., 442Luan, F., 438Lubran, M. B., 46Lucak, C. L., 65Lucas, F., 304Lucas, P. W., 264–265Lucca, P. A., 267Lund, C. M., 244, 439Lundahl, D. S., 137–138, 143, 187, 197, 453, 466Lunghi, M. W., 251Luo, M. R., 291, 297Luyten, H., 262Lynch, J., 193Lyon, B., 238, 243Lyon, B. G., 46

MMacDougall, D. B., 289, 293MacFarlane, J. D., 166, 316, 501MacFie, H. J. H., 14, 66, 72, 189, 191, 197, 250, 326,

328, 341, 440, 442, 464MacKay, D., 442MacKinney, G., 291, 297

Macmillan, N. A., 88, 112, 117MacRae, A. W., 10, 110, 459Macrae, A. W., 93Mahar, T. J., 269Mahoney, C. H., 165Malcolmson, L., 287Malhotra, N., 458Maller, O., 340–341Malnic, B., 36Malone, G. J., 25, 153–154, 156, 158, 163, 167–168,

210–211, 331, 364Malundo, T. M. M., 445Mangan, P. A. P., 75Mann–Whitney, 499–501, 503Mantei, M. M., 452Marchisano, C., 243, 311, 315Marcus, M. J., 212Marin, A. B., 26, 37, 128, 138, 141Marks, L. E., 162, 167, 170, 210, 239Marlowe, P., 381, 383, 385, 394–395Martens, H., 235, 434, 445Martens, H. J., 276Martens, M., 235, 434, 439, 445Martin, K., 244Martin, N., 48, 438–439Martın, Y. G., 438Martınez, C., 442Mastrian, L. K., 422–423, 428Masuoka, S., 137Mata-Garcia, M., 97Matsudaira, M., 268Mattes, R. D., 85, 168, 208–209, 336Matysiak, N. L., 194, 198Matzler, K., 463Mazzuccheli, R., 259McBride, R., 334McBride, R. L., 25, 167, 205, 215, 220, 340, 420McBurney, D. H., 30–31, 129, 139, 156, 157, 195, 206McCloskey, L. P., 247McClure, S. T., 47McDaniel, M. R., 45, 47–48, 330McDonald, S. S., 380, 452McDonell, E., 243McEwan, J., 326McEwan, J. A., 247, 251, 400McGowan, B. A., 187–188, 197, 212McGrath, P., 10McManus, J. P., 45McNulty, P. B., 181, 196



McNutt, K., 423Mead, R., 154–155, 165, 221, 333Mecredy, J. M., 153, 155Meilgaard, M., 1, 7, 11, 58, 108, 110, 136, 154, 157,

160–161, 218, 221, 231, 234, 238–239, 273, 332,528, 545

Meilgaard, M. C., 39, 82Meiselman, H. L., 3, 12–13, 23, 31, 181, 195, 341,

353, 462Mela, D., 140Mela, D. J., 267Melgosa, M., 293, 297Mellers, B. A., 205Merbs, S. L., 286Meudic, B., 439Meullenet, J. -F., 235, 244, 271, 274, 434, 442,

445, 465Meyners, M., 313Miaoulis, G., 457Miller, J., 452–453Miller, I. J., 29–30, 33Milliken, G. A., 66, 69, 71–72Milo, C., 141Mioche, L., 265Mitchell, J. W., 84Mizrahi, S., 428Moio, L., 41Mojet, J., 136Montag, E. D., 293Monteleone, E., 436, 442, 444Montell, C., 42–43Montouto-Grana, M., 438Moore, L. J., 156, 195, 198Morris, C., 73Morris, J. D., 437Morrison, D. G., 105–106, 132–133Morrot, G., 49Moskowitz, H. R., 1, 8, 23, 158, 161, 167–168, 181,

231–232, 304, 327, 332, 342, 380, 383, 385–387,401, 408, 462, 466, 469

Moskowitz, H., 259, 453, 466–467Muir, D. D., 72Mullan, L., 304Mullen, K., 88, 119Mullet, G. M., 454–455Munoz, A. M., 154, 214, 228, 231, 237, 239,

271–273, 408–414, 426, 454Munton, S. L., 181, 184Murphy, C., 32, 36, 44, 47, 49, 128, 137, 214, 216

Murphy, C. L., 36, 44, 47–48, 128Murray, J. M., 229, 231, 234, 238, 241Murray, N. J., 45Murtagh, F., 248

NNæs, T., 245Naes, T., 75Nagodawithana, T. W., 30Nakayama, M., 410, 414, 417, 424Narain, C., 251Nathans, J., 286Neilson, A. J., 181Nejad, M. S., 30Nelson, J., 409, 417–418Nestrud, M. A., 461Neter, J., 530Newell, G. J., 166, 316–317, 501Nielsen, D., 276Niklasson, G., 289Nindjin, C., 243Niwa, M., 268Nobbs, J. H., 289Noble, A. C., 39–40, 45, 182, 193–194, 197–198, 217,

230, 434, 439Nogueira-Terrones, H., 243Noma, E., 112–113, 117, 166, 212Norback, J. P., 342Novak, J. D., 398

OOaten, M., 284Obein, G., 288O’Connell, R. J., 137, 140Odbert, N., 119, 137Odesky, S. H., 312O’Keefe, S. F., 196Olabi, A., 160, 163, 222, 239, 332Oleari, C., 293Oliveira, M. B., 439Oliver, T., 14, 351, 354Olson, J. C., 400O’Mahony, M., 9–10, 15, 30, 88–89, 97, 99, 119, 127,

137, 165, 195, 228, 231, 235, 312, 333–334, 420,473–474, 478, 489, 491, 507–511, 517

O’Mahony, M. A., 119Oreskovich, D. C., 250, 252Orne, M. T., 363, 365Osterbauer, R. A., 284


582 Author Index

Otegbayo, B., 271Otremba, M. M., 238Ott, D. B., 182Ough, C. S., 421Overbosch, P., 189–190, 196, 198Owen, W. J., 189

PPage, E. B., 501Pages, J., 461Pagliarini, E., 333Pangborn, R. M., 1, 15, 22, 50, 58, 88, 140, 154,

158, 160, 165, 167, 171, 181, 183, 193,195–196, 198, 212, 221, 248, 287, 336,420, 466

Pardo, P. J., 286Parducci, A., 154, 208, 210–212, 214Park, H., 165Park, J. Y., 165Parker, M. J., 334, 337, 340Parker, S., 208Parr, W., 236Patapoutian, A., 42Patel, T., 45Paull, R. E., 289Pearce, J. H., 158, 167–168, 330Pearson, A. M., 417, 502–503, 527–528, 532Pecore, S., 414–415Pecore, S. D., 336Peleg, M., 266Pelosi, P., 37Penfield, M. P., 194Pereira, R. B., 268Perng, C. M., 47Perrett, L. F., 210Perrin, L., 461Peryam, D., 171, 325Peryam, D. R., 5, 79, 88, 97, 152, 154, 326–328,

353, 425Petersen, R. G., 66, 72, 244Peyron, M. A., 265Peyvieux, C., 185–186, 193, 195, 197Pfaffmann, C., 38Philippe, F., 269Philips, L. G., 50Pickering, G. J., 436Pieczonka, W. C. E., 287Pierce, J. R., 88Piggot, J. R., 167–168

Piggott, J. R., 140, 235, 251, 474, 530–531Pilgrim, F. J., 326–327Pillonel, L., 438Pineau, N., 184–185, 244Pionnier, E., 182, 196Pisanelli, A. M., 37Plaehn, D., 453, 466Plata-Salaman, C. R., 30Pointer, M. R., 292Pokorny, J., 336Pollard, P., 486Popper, R., 335, 439, 455, 458, 460Poulton, E. C., 169, 205–206, 210, 213, 215,

220–222, 340Powers, J. J., 231, 234, 238Prazdny, S., 288Prescott, J., 43–44, 47–48, 128, 194, 306Primo-Martin, C., 268Prinz, J. F., 265Prokop, W. H., 129, 133, 136Punter, P., 439–440Punter, P. H., 136–137, 143

QQannari, E. M., 453, 466Quesenberry, C. P., 322

RRaaber, S., 326, 329Rabin, M. D., 136, 143Raffensperger, E. L., 229Raheel, M., 269Rainey, B., 241Ramsay, J. O., 460Randall, J. H., 235Rankin, K. M., 214Raskin, J., 486Rason, J., 252Rawlings, J. O., 531Rayner, G. D., 490Rayner, J. C. W., 490Raz, C., 387Reece, R. N., 408Reed, D. R., 140, 143Regenstein, J. M., 412Reinbach, H. C., 192, 194Renneccius, G., 44–45Resano, H., 442–443Resurreccion, A. N. A., 384



Resurreccion, A. V., 351–355, 359, 364–365,368–369, 389, 391–392

Resurreccion, A. V. A., 271, 332Retiveau, A., 243Reynolds, F. D., 385Richardson, J. T. E., 486Richardson, L. F., 161Richardson, N. J., 265Riera, C. E., 43Rine, S. D., 181, 195Riskey, D. R., 208–209, 212Risvik, E., 246, 400, 461Riu-Aumatell, M., 243Riviere, P., 463, 466Roberts, D. D., 196Robertson, G. L., 425, 427–428Robichaud, J. L., 182, 193Rødbotten, M., 442Roessler, E. B., 1, 90, 94, 308, 310, 412, 421–422,

492, 499Roessler, E. R., 58Rohlf, F. J., 519, 539Rohm, H., 261, 326, 329Rojo, F. J., 276Roland, A. M., 267Rosenberg, S., 459Rosenthal, A. J., 259Rosenthal, R., 165, 549Rosin, S., 193Ross, C. F., 65, 276Ross, J. S., 161Rothman, D. J., 74Rothman, L., 334, 337, 340Roudaut, G., 261, 267Rousseau, B., 97Rozin, P., 36, 196Rubico, S. M., 45Rucker, M. H., 364Rugg, D., 366Runnebaum, R. C., 266Rutenbeck, S. K., 411, 414, 422–424, 428Rutledge, K. P., 243Ryle, A., 251

SSaint-Eve, A., 252Saletan, L. T., 328Saliba, A. J., 128, 306Salmenkallio-Marttila, M., 272

Salvador, A., 262Sandusky, A., 208Santa Cruz, M. J., 442Sarris, V., 210–211, 214Sato, M., 262Sawyer, F. M., 14, 213, 330, 351Schaefer, E. E., 351–352, 354, 356, 358, 360–362,

364, 369Scheffe, H., 319Schieberle, P., 141Schifferstein, H. J. N., 208, 217, 341Schifferstein, H. N. J., 34, 205, 207, 212–213,

223, 303Schiffman, S., 260Schiffman, S. S., 458, 460Schlegel, M. P., 48, 119Schlich, P., 72, 75, 93, 110, 250, 444–445, 536,

544–545Schmidl, M. K., 248Schmidt, H. J., 306, 333Schmidt, T. B., 442–444Schraidt, M., 339Schraidt, M. F., 306, 333Schutz, H. G., 163–164, 219, 304, 327, 331, 341–342,

353, 355–356Schwartz, C. J., 268Schwartz, N. O., 271Scott, T. R., 30Segars, R. A., 63, 266Seo, H. -S., 249Serrano, S., 438Sessle, B. J., 30Seymour, S. K., 264Shamdasani, P. N., 380–381, 384, 390–396, 400–401Shamil, S., 193Shand, P. J., 167–168Shankar, M. U., 284Sharma, A., 296–297Sharma, G., 297Shepard, R., 165Shepherd, R., 14, 334Sherman, P., 262, 272Shick, T. R., 31, 156, 195Shoemaker, C. F., 156, 195Shoemaker, C. R., 198Sidel, J., 304, 337, 343Sidel, J. L., 1–2, 5, 15, 58, 65, 85, 90, 158, 167–168,

231, 234, 236, 304, 309, 352, 356, 358, 367, 386,420, 495, 520



Sieber, J. E., 74
Siebert, K., 45, 287
Siebert, K. J., 287–288
Sieffermann, J. -M., 252–253, 438–439
Siegel, S., 151, 490, 501–503
Silver, W. L., 42–43
Simon, G. A., 527, 529
Sinesio, F., 75
Singer, E., 363
Singh, J., 66, 89, 93, 316, 357, 510, 536
Sivik, L., 297
Sizer, F., 49
Sjostrom, L. B., 180, 193, 231–232
Skibba, E. A., 252
Skinner, E. Z., 156, 181, 189, 193–195, 197, 271, 336
Slama, M., 445
Small, D. M., 47–48
Smith, E. A., 380, 382, 384–386, 389, 391–395
Smith, G. L., 95, 478, 490
Smith, S. L., 387, 398
Snedecor, G. W., 439, 484, 528–529
Sobal, J., 363
Sokal, R. R., 519, 539
Solheim, S., 75
Solomon, G. E. A., 421
Sorensen, H., 351, 353, 356–357, 359–360, 363, 368–369
Southwick, E. W., 39
Sowalski, R. A., 45
Spangler, K. M., 30
Spence, C., 263
Stahle, L., 437, 439
Stanton, D. T., 39
Steenkamp, J. -B. E. M., 252
Stepanski, P. J., 435–437
Stevens, D. A., 25, 37, 42–43, 47, 49, 103, 137, 140
Stevens, J., 434–437, 531, 533
Stevens, J. C., 135–136, 143
Stevens, S. S., 23–25, 37, 47–49, 126, 135–137, 151–152, 161–162, 168–169, 182, 194, 216, 239, 284, 423, 434–437, 531, 533
Stevenson, R. J., 43–44, 47–49, 194, 216, 284, 423
Stewart, D. W., 380–381, 384, 390–396, 400–401
Stewart, V., 251
Stillman, J. A., 49, 119
Stocking, A. J., 134
Stoer, N., 248
Stoer, N. L., 153, 155, 165, 221
Stone, H., 1–2, 5, 7, 58, 63, 65, 79, 85, 90, 155, 231, 234, 309, 337, 342, 349, 352, 356, 358, 367, 386, 495, 520
Stouffer, J. C., 423–424
Stryer, L., 25
Sugita, M., 28–30
Sulmont, C., 228, 241
Svensson, L. T., 162
Swain-Campbell, N., 44
Swartz, M., 193
Swartz, V. W., 5, 88, 987
Swets, J. A., 112–113
Syarief, H., 233
Szczesniak, A. S., 6, 230, 237–238, 259–261, 264, 266, 270–271, 273, 275, 336
Szolscanyi, J., 43
Sztandera, L. M., 269

T
Tabachnik, L., 434
Tai, C., 43
Takagi, S. F., 62
Talens, P., 289
Tang, C., 495
Tarea, S., 252
Taubert, C., 422
Taylor, D. E., 196
Taylor, J. R. N., 166, 305
Teghtsoonian, M., 162
Teghtsoonian, R., 210
Templeton, L., 340
Teorey, T. J., 452
Tepper, B., 251, 267
Tepper, B. J., 267, 327
Terpstra, M. E. J., 272
Thiemann, S., 93
Thieme, U., 97
Thomas, C. J. C., 45
Thompson, R. H., 417
Thomson, D. M. H., 400, 440
Thorndike, E. L., 216
Thurstone, L. L., 112, 116–119, 152, 166, 171–172
Thybo, A., 272, 276
Thybo, A. K., 272, 276
Todd, P. H., 142
Tomic, O., 244
Tournier, C., 272
Townsend, J. T., 152
Trant, A. S., 424
Trijp, H. C. M., 303, 335–336, 341, 462
Trout, G. M., 417–418
Tucker, D., 49
Tunaley, A., 449
Tuorila, H., 17, 50, 132, 193, 195, 304, 341
Tyle, P., 265

U
Uhl, K. P., 351
Ukponmwan, J. O., 269
Ura, S., 118

V
Vainio, L., 195
Valentin, D., 459
Valous, N. A., 292
Van Buuren, S., 189, 192–193
Van den Broek, E., 192–193
Van der Bilt, A., 264
Van der Klaauw, N. J., 4, 218
Van Gemert, L., 141
Van Oirschot, Q. E. A., 436
Van Trijp, H. C. M., 303, 335–336, 341, 462
Van Vliet, T., 259, 262, 264
Varela, P., 262, 272, 275
Vargas, W. E., 289
Verhagen, J. V., 47–48
Vickers, A., 331
Vickers, Z., 20, 158, 167–168, 215, 220, 263–264, 276, 295, 304, 336
Vickers, Z. M., 304, 340–341
Villanueva, N. D. M., 155–165, 167–168, 330, 333
Vincent, J. F. V., 267, 276
Viswanathan, S., 106, 162
Viti, R., 37
Vogelmann, S., 435
Vollmecke, T. A., 212
Von Arx, D. W., 452
von Hippel, E., 380
Von Sydow, E., 47

W
Wajrock, S., 453, 466
Wakeling, I., 328, 442
Wakeling, I. N., 72
Walker, D. B., 132, 135
Walker, J. C., 132, 135
Ward, L. M., 162, 205, 207, 213–214, 222, 239
Wasserman, S. S., 263
Wasserman, W., 530
Watson, M. P., 251
Weedall, P. J., 269
Weenen, H., 272
Weiffenbach, J. M., 29
Weiss, D. J., 155
Welkowitz, J., 486, 536, 538–539, 545, 547
Welleck, S., 103, 110
Wellek, S., 536, 549
Wells, W. D., 394
Wendin, K., 188
Wessman, C., 410, 414, 417, 424
Wetherill, G. B., 139
Whitehouse, W. G., 363, 365
Whiting, R., 87, 289, 291
Whorf, B. L., 228
Wichern, D. W., 434, 437
Wiechers, J. W., 436
Wienberg, L., 439
Wilkinson, C., 259
Winakor, G., 154
Winer, B. J., 507–508, 510, 513, 515
Wintergerst, A. M., 265
Wise, T., 452–453
Wiseman, J. J., 48
Witherly, S. A., 260, 267
Wold, S., 437, 439
Wolf, M. B., 547
Wolfe, D. A., 490
Wolfe, K. A., 408, 423, 426
Wong, S. -Y., 195
Woods, V., 242
Worch, T., 245
Wortel, V. A. L., 436
Woskow, M. J., 212
Wright, A. O., 330
Wright, K., 252, 459
Wrolstad, R. E., 296
Wysocki, C. J., 13, 37, 136

X
Xiao, B., 88
Xu, H., 297
Xu, W. L., 264

Y
Yackinous, C., 442
Yamaguchi, S., 30, 32, 167
Yao, E., 328
Yau, N. J. N., 45
Yokomukai, Y., 33
York, R. K., 4, 408, 410–411, 419
Yoshida, M., 181, 193
Yoshikawa, S., 261
Young, N. D., 444
Young, T. A., 414–415

Z
Zampini, M., 263
Zarzo, M., 39
Zellner, D. A., 14, 49, 206, 208
Zhang, Q., 293
Zhao, H., 36
Zheng, C., 276
Zheng, L., 276
Zimoch, J., 188–189, 192, 195, 197
Zook, K., 58, 76, 234, 290

Subject Index

A
Abbott’s formula, 106–108, 120, 132, 134, 545
Accelerated storage, 425, 428, 430
Accelerated testing, 360, 428
Acceptance, 7–8, 10, 12, 112, 166, 185, 208, 219, 227, 229–230, 260, 304–305, 318, 350, 352, 355, 357, 425–426, 441, 444, 446, 455, 457, 463–465, 483, 516, 526, 536, 539–541, 544–545
  test, 8, 116, 208, 219, 304–305, 318, 325–344
Acceptor set size, 342–343
Accuracy, 3, 13, 142, 154, 168, 183, 221, 238, 248, 385, 388, 418
Adaptation, 19, 21, 30–32, 39–40, 42, 46, 88, 95, 103–104, 113, 126, 129–130, 135–137, 150–151, 165, 181–182, 194–195, 197, 205–207, 209–210, 221, 236, 240, 336, 474
  sensory, 19, 30, 126, 129, 135–137, 205, 207, 209
Adaptive methods, 139–140
Adjustable scales, 333–334
ADSA, 420–421
Advertising, 4, 12–14, 16, 230, 305, 310, 312, 315, 320, 344, 350, 353–354, 367, 370, 381, 451–452, 454, 466, 468, 536
2-AFC test, 82, 97
3-AFC test, 89, 95, 106, 111, 118–121, 131–132, 546
Agreement, 13, 37–38, 72, 88, 134, 159, 172, 231, 240, 244–247, 270, 287, 351–352, 363–364, 367–368, 385, 396, 417–419, 458, 460, 501, 503, 520, 525–526
Alpha
  level, 90, 102, 122, 310, 468, 480–481, 484, 486–487, 501, 529, 535, 537–540, 543–544, 563
  risk, 102, 104, 482, 486, 490, 493, 513, 539–541, 544–546, 548
Alternative hypothesis, 82–84, 86, 88, 102, 104, 107, 308, 415, 479–482, 484–487, 495, 538–547
Analysis
  chemical, 37, 126
  data, 5, 38, 87, 94, 125, 131, 135–136, 156–158, 180, 187–193, 198, 213, 230, 235–236, 244, 251, 289, 306, 361, 383, 439–440
  descriptive, 6–9, 15–16, 20, 27, 46, 70–71, 103, 150, 154–155, 157, 159, 165, 170, 183–186, 192, 214, 221–222, 227–253, 267–269, 272, 289, 318, 330, 385–386, 413, 423, 425, 437, 442, 444–446, 456, 507, 516, 520
  flavor, 126
  frequency, 367
  multivariate, 235, 369, 434, 437–438, 445, 522, 533
  numerical, 382
  penalty, 335, 339–340, 462
  procrustes, 250–252, 434, 439–440, 446, 458
  sensory, 7, 34, 36, 38, 142, 230, 237, 244, 516, 548
  statistical, 7, 9, 68, 85, 89, 150–151, 157, 183, 185–186, 191, 234, 305, 361, 368, 382, 408, 413, 417, 419–420, 446, 453, 473–474, 513, 522, 531
  univariate, 437, 455
  of variance, 7, 10, 152, 155, 166, 213, 233–235, 243–245, 250, 252, 317–319, 328, 339, 356, 412, 434, 437–438, 455, 463, 475, 501, 503, 507–522, 526, 530
Anchor words, 58, 234, 516
Anon, 354
ANOVA, 170, 187, 235, 352, 434, 437, 439, 440, 461, 484, 508–519, 521–522, 536, 537
Appearance, 6, 20, 49–50, 59–60, 63, 67, 77, 157, 181, 184, 214, 216, 237, 262, 270, 283–299, 304, 343, 354, 356, 363, 366, 375, 412–413, 417, 420–422, 454, 492, 522
Appropriateness, 242, 304, 341–342, 387, 400, 501
Arrhenius equation, 360, 428, 430–431
Ascending forced choice, 128–129, 131, 133–135, 143
Assistant moderators, 394–395
Assumptions, 89, 93–94, 101, 105, 113–114, 117, 119, 133, 164, 166, 312–313, 322, 352, 360, 364, 385–386, 433, 477, 487, 490, 500, 503, 513, 521–522, 543
ASTM, 12, 39, 45, 111, 121, 127, 129, 131, 133–134, 136, 141–142, 155–158, 160, 207, 243, 289–290, 298, 305, 311–312, 351, 381, 451, 467, 536
Astringency, 20, 30, 41, 45–46, 65, 150, 183, 193–194, 239, 266–267
AUC, 180, 186–187
Audio, 388–389, 396

B
Ballot building, 386
Barter scale, 343
“Bathtub function”, 427
Belmont Report, 74
Best-worst, 305, 318
Best-worst scaling, 167, 305, 317–318
Beta
  binomial, 94, 96–97, 105, 313, 320, 496–499, 565
  risk, 102, 104, 108, 110, 121, 355–356, 415, 424, 481–482, 485–487, 537, 539–542, 544–546
Between-groups, 70, 520, 541
Between-subjects, 512, 520
Biases, 14, 48, 112, 127, 150–151, 169, 198, 203–223, 284, 340, 344, 359, 366, 392
Binomial
  distribution, 5, 10, 89–90, 308–309, 414, 489–491, 565
  expansion, 308, 490–492
Bitter, 9, 20, 28, 30–34, 36, 43, 49, 65–66, 127–128, 133–134, 140, 156, 162–163, 180–183, 192–193, 195, 205, 218, 223, 231, 233, 239–240, 249, 335, 340, 416, 420–422, 466–467
Bitterness, 31–34, 66, 128, 134, 140, 156, 163, 180–182, 192, 193, 205, 218, 223, 240, 335, 421, 422, 466, 467
Blind
  test, 344, 351
  testing, 219, 418, 468
Blocking, 520–521
Bulletin board, 399

C
Calibration, 6–7, 103, 159–160, 162, 169–170, 214, 220–223, 240, 243, 327, 408, 411, 413, 418, 423
CALM scale, 163
Carbonation, 44–45, 49, 266
Cardello, 14, 63, 163–164, 167, 172–173, 213–214, 266, 269, 271, 304, 318, 327, 331–332, 340–342, 351, 408
Carr, 561–562
Category
  appraisal, 351, 453
  review, 453–454, 456
  scales, 25, 152–155, 157–158, 160, 162, 167–170, 214, 221, 271–272, 327
Centering bias, 206, 213, 215, 220, 223, 340
Central
  location test, 168, 213, 350, 352–356, 358, 467–468
  tendency, 151, 157, 189, 475, 481, 487, 490
Chance correction, 90, 92, 104–105, 132, 135, 486, 496–499, 546
Chat room, 399
Chemesthesis, 20, 34, 41–47
Children, 59–60, 153–154, 158, 162, 166–167, 216, 219, 306, 332–333, 354, 366–367, 398–399
  testing with, 332–333
Chi-square, 10, 86–87, 89–90, 95, 98–99, 151, 308, 310, 312–313, 322, 337, 340, 367, 485, 489–490, 493–499, 501, 503, 533, 555
CIE, 291, 294, 296–297
Claim substantiation, 4, 12, 16, 111, 305, 310–311, 350–351, 358, 451, 467–468
Classical psychological errors, 218–219
CLT, 352–354, 467–468
Coefficient of variation, 233, 476
Collage, 387, 399
Color
  contrast, 205, 207
  intensity, 229, 413, 528–529
  solid, 293–295
  vision, 20, 205, 286, 293
Commodities, 15, 107, 128, 306, 408, 419
Community groups, 352, 386
Co-moderators, 394–395
Competitive surveillance, 351, 453–462
Complete block, 70, 170, 213, 337, 356, 484, 503, 510–513, 516, 520–522, 536
Concept
  map, 397–399
  testing, 349–351
Conjoint analysis, 387, 462
Conjoint measurement, 387, 463
Consumer
  contact, 354, 384, 386, 453
  test, 4, 8, 10, 13–14, 59–60, 64–66, 70, 101, 103, 154, 159, 171, 217, 219, 304, 306, 308–309, 313, 315–316, 319, 325, 328, 334–335, 350–356, 364, 367, 369, 385–386, 389, 391, 413, 418, 467–468, 537, 541–542, 548
Context, 2–3, 12, 150–151, 160, 169, 172, 198, 203–223, 239, 305, 332, 334, 336, 340–342, 357, 366, 382, 386, 395, 397, 408
  effects, 203–223, 334, 357
Continuity correction, 90, 92, 111, 310, 492, 494, 504
Continuous tracking, 183–186, 194, 197–198
Contrast, 7, 9, 13–15, 23, 29, 37–38, 40, 117, 127, 150, 165, 172, 198, 205–211, 213, 214, 220–223, 229, 234, 240, 260, 267, 342, 382–383, 388, 393, 396, 436–437, 452, 456, 460, 465, 468, 481, 519
Control products, 5, 110, 358, 411, 425–426, 479, 535, 538
Cooling, 20, 41, 43–45, 65, 266, 408
Correlation, 7, 12–13, 34, 48, 87, 117, 136–137, 140, 142–143, 185, 192, 197, 207, 212–213, 216–217, 252, 269–270, 273–276, 287–288, 298, 331–332, 335, 341, 353, 413, 417, 423, 435–436, 438–440, 456, 458, 460–462, 464–465, 474, 490, 502–503, 525–533, 539, 547
Cost, 10–11, 14, 23, 32, 64, 67, 75–76, 101–102, 112–113, 138, 150, 167, 184, 197–198, 273, 304, 313, 321, 350, 352–356, 358–362, 369, 382–383, 386, 391, 409, 412, 423, 425, 442, 453, 455–456, 459–462, 485, 536–537, 546, 548
Counterbalancing, 58, 89, 219–221, 307, 329, 357
Cross
  modal interactions, 152, 160–167
  modality matching, 24, 152, 161–162, 213

D
Dairy products, 64, 419–420
Data analysis, 5, 38, 87, 94, 131–132, 136, 156–158, 180, 187–193, 198, 213, 230, 235–236, 244, 251, 289, 306, 361, 383, 439–440, 473
D’ (d-prime), 114, 312, 322, 538
Debriefing, 58, 384, 392, 394, 395–397
Defects, 15, 137, 304, 331, 366, 408, 411–413, 415–416, 418–421, 423–424, 427, 464
Degree of difference ratings, 110, 415
Degrees of freedom, 68, 90, 122, 317, 478, 480–481, 483–484, 487, 490, 494–495, 499, 501, 508–512, 514–515, 518, 522, 528–531, 533, 536–537
Delta value, 118, 546–547
Deming, 409
Demographic, 356, 359, 363–365, 369, 382, 387, 391, 396, 401, 452, 466
Dependent, 3, 7, 27, 34, 45, 61, 75, 80, 93–96, 98, 107, 113–114, 121–122, 169, 181–185, 192, 194, 196, 204, 213, 228, 230, 235–236, 246–247, 262, 266, 270, 275, 287, 322, 327, 337, 340, 343, 352, 356, 368, 382, 392, 411–412, 418, 421, 430–431, 435–438, 474, 482–484, 490, 494, 499–501, 503, 508, 510–511, 514, 522, 525–526, 528, 531, 538, 541–542, 547
Descriptive analysis, 6–9, 15–16, 20, 27, 46, 70–71, 103, 150, 154–155, 157, 159, 165, 170, 183–186, 192, 214, 221–222, 227–253, 267–269, 272, 289, 318, 330, 386, 413, 423, 425, 437, 442, 444–446, 456, 507, 516, 520
Detection Thresholds, 21, 33, 48, 127, 129, 131, 135
Diagnostics, 8, 12, 15, 47, 217, 306, 334, 351–352, 401, 411–412, 414–416, 418
Difference
  from control, 16, 137–138, 143, 411–412, 426
  test, 3, 5–6, 10, 16, 20, 22, 27, 67, 80–83, 85–87, 89, 94, 101–102, 105, 107–109, 118–119, 219, 289, 306, 309, 313, 407, 412, 414–415, 423, 487, 544–545
  thresholds, 20–22, 27, 38, 126–128, 130, 150, 167, 206, 272
Dilution, 27, 43, 130, 140–142, 196–197
Dirichlet multinomial, 313, 496–499
Discrete interval, 182
Discriminant analysis, 438–439, 454, 458
Discrimination test, 5–6, 8, 10, 16, 63–65, 74–75, 79–99, 103, 110–111, 114, 116, 118–119, 128–129, 150, 167, 244, 269–270, 306, 309, 323, 425–426, 429, 458, 481, 490, 493, 499, 503, 536–538, 544–546, 548–549
Discriminator, 94, 97, 103, 105–110, 118, 121, 314, 511, 545–546
  theory, 105–108
Discussion guide, 384–385, 389–391, 394, 396–397
Dispersion, 96–97, 99, 117, 193, 313, 490, 496–499
Dominant participants, 233, 382, 394, 400
Drivers of liking, 462–463
Dumping, 198, 217–218, 222, 235, 368
  effect, 217–218, 222, 235
Duncan test, 513
Duo-trio test, 67, 69, 80, 82, 84–85, 88–92, 94–97, 105, 545

E
Effect size, 94, 104, 355, 538–542, 547, 549
End anchors, 154–155, 159, 163–164, 210, 214, 221, 327, 330–332, 334, 516
Equivalence, 12, 21–22, 101–122, 310, 415, 467, 503–504, 536–537, 542–545, 549
Equivalence test, 101–122, 310
Error of anticipation, 209, 218
Error of habituation, 218
Ethnography, 380, 383
Experimental design, 2, 7, 9, 12, 15, 66–72, 112, 185, 213, 219, 253, 355, 389, 474, 481, 512, 519–520, 522, 537
Expert judges, 6, 15, 408, 412, 419–422

F
Facial scale, 153, 332–333
FACT scale, 341
“False close”, 391, 394
Field
  agencies, 353, 356, 359, 360–361, 369, 453–454
  services, 358–362, 453
  tests, 13, 304–306, 326, 329, 334, 349–369
Fixed effects, 517, 519
Flavor
  color interaction, 49–50
  detection threshold, methods, 129–133
  irritation and, 49
  release, 191–193, 195–196
Focus
  groups, 358, 362, 364, 380–393, 395, 398–401, 453, 462–463
  panels, 380, 386
Forced choice, 16, 21, 27, 80–82, 85, 88–89, 93, 106, 116–119, 127–135, 137, 139, 143, 150, 270, 306, 315, 415, 423, 498, 504, 546
Frame of reference, 9, 11, 14, 162–164, 170, 172, 204–206, 210, 215, 220–223, 229, 232, 238, 270, 287, 315, 343, 354, 357, 364–365, 408, 520–522
F-ratio, 187, 458, 475, 485, 508–513, 515, 517–519
Free association, 387, 399
Frequency effect, 210–212
Frequency of usage, product, 308, 328–329
Friedman test, 166, 252, 316–317, 501–503

G
GCO, 140–141
Generalization, 17, 228, 312, 397, 495
Geometric mean, 129, 131–132, 134–135, 143, 157–158, 161, 173, 189–190, 288, 497
GPA, 251, 439–440, 444, 446, 458
Grounded theory, 384, 400
Group interview, 362, 380–381, 383–384, 386–387, 389–391, 394, 398–400, 452–453
Guessing model, 101, 106, 120, 545

H
Halo effect, 216–217, 306, 340, 344, 366, 368
Hazard functions, 426–427
Hedonic
  scales, 7–8, 110, 152–153, 166, 171–172, 196, 326–328, 329–333, 341, 343, 356, 363, 421, 426, 542–543
  shifts, 208–209
  temporal aspects, 196–197
Home Use test, 10, 217, 313, 352, 354–355, 361, 382, 402, 467–468
HUT, 352, 354–355, 468
Hypothesis, 50, 80, 82–84, 86–88, 92–93, 102, 104, 107, 120, 122, 127, 187, 267, 307–308, 320–321, 381, 395, 400, 414–415, 434, 455, 473, 478–487, 490–491, 493, 495, 498, 500, 508, 515, 519, 525–526, 535–548

I
I chart, 409
Ideal point model, 464–465
Idiosyncratic scale usage, 212–213, 484
Imax, 185–187, 189–190, 192, 436
Incidence, 290, 357, 359–360, 424–425, 486
Independent groups, 340, 482–484, 499–501, 503, 511, 538, 541–542
Instrumental analysis, 413, 417
Interaction, 8, 13, 16, 27, 30–32, 39, 47–50, 65, 72, 170, 187, 191, 193–194, 196, 205, 233, 243, 245, 284, 287, 317, 336, 354, 356, 358, 363, 380, 382–383, 388, 399–400, 439, 453, 508, 515–519, 522
Internal test, 352
Internet, 243, 359, 380, 398–399, 463
Interval, 7, 22, 25, 46, 89, 106–111, 120–122, 128, 131, 134, 136, 138–139, 151–152, 154–155, 166–167, 169–173, 180–184, 186–189, 194, 196, 249, 270, 310, 312, 318, 322–323, 327–328, 356, 364, 367, 412, 418, 422, 425–426, 428, 436, 454–455, 457, 460, 468, 478, 490, 493, 502–503, 526, 530–532, 537, 543, 548–549
Interval testing, 110–111, 121, 549
Interviews, 8, 72, 112, 329, 352, 355–356, 358, 361–365, 367, 369, 380–394, 398–399, 452–453, 462
Irritation, 41, 43–45, 47, 49, 66, 128, 159, 194

J
JAR
  data, 336–339
  scale, 215, 334–336, 338, 340, 495–496
Judgment function, 150–151, 205
Juiciness, 186, 195, 273–274, 444
Just-about-right scales, 334–340, 416, 462, 497, 539
Just noticeable difference (JND), 21, 128, 130, 166–167

K
Kano model, 421, 463–464
Kinetics, 25–26, 427–428

L
Labeled affective magnitude scale, 172, 213, 327, 330–331, 344
Labeled magnitude scales, 158, 162–164, 172–174, 214, 221, 327–328, 331–332
Laddering, 381–382, 390, 399
LAM, 163–164, 170, 172–173, 213–214, 318, 327, 330–332
Likert scale, 154, 367
Linear model, 462, 508, 515–516, 522, 526–527
Line marking (line scaling), 152, 154–156, 162, 166, 212–214, 318, 328, 330
Line scale, 137, 155, 158, 162, 168–169, 181–182, 184, 186, 196, 212, 214, 234, 242–243, 248, 271, 318, 327–330, 334, 336, 339, 342, 357, 435, 458, 511
LMS (labeled magnitude scale), 158, 162–164, 170, 172–174, 214, 221, 327–328, 331–332
Local standing panels, 352, 355
LSD test, 317, 501, 513–514

M
Magnitude estimation, 23–25, 45, 150–152, 154, 156–158, 160–162, 165, 168–170, 173, 203, 212–213, 221, 271–272, 288, 327–328, 330–331, 364, 476
Management issues, 423–424
Marketing research, 4, 13–14, 16–17, 169, 215, 251, 303, 308, 328, 348, 350–351, 358, 368–370, 380–381, 385, 389, 392, 394, 396, 442, 452–453, 542, 547–548
  test, 13, 303, 358, 542
McNemar test, 98–99, 338, 494–495, 503
MDS, 434, 458–460
Mean
  drop, 339–340
  square, 245, 509–512, 518, 530–531
Meat, 13, 30, 58, 61, 64–65, 150, 180, 186, 194–195, 197, 229, 231, 233, 238, 241, 265, 276, 283, 390, 419, 464
Median, 129, 137, 151, 158, 172, 189, 197, 215, 248, 427, 429, 475–476, 478, 490
Melting, 180, 186, 195, 197, 267, 342
Method
  of adjustment, 21–22, 208, 336
  of constant stimuli, 20–23, 127
  of limits, 21, 112, 125, 128–129, 133, 139
Mind-mapping, 398–399
Mixture Suppression, 31–32, 34, 39–40, 205, 462
Mode, 114, 151, 380, 475–476, 490
Moderating, 384, 392–395, 452–453
Moderator, 381–384, 388–401
Monitoring, 12, 29, 74, 243–246, 353, 356, 360, 419, 422–423
Monsanto, 265
MTBE, 134, 143–144
Multidimensional scaling, 254, 333, 442, 455
Multinomial, 167, 313, 318, 322–323, 489, 496–499
Multiple standards, 414–415
Multivariate, 39, 119, 185, 234–235, 241, 244–246, 250, 275, 326, 369, 433–446, 453–458, 522, 533

N
Napping, 461–462
Naturalistic observation, 383
NBC, 467–468
Negative skew, 211–212
New Coke, 468
New product development, 17, 364, 380–381, 387, 398, 454
Niche, 304, 452–453
Nine-point scale, 207
Nominal, 151–152, 421, 490, 532
Non-direction, 110, 480
Non-forced preference, 311–315
Nonparametric, 151–152, 252, 316–317, 337, 367, 489–504
No-preference option, 305–306, 311–313, 319–323, 467, 496
Normal approximation, 110, 310
Normal distribution, 89–92, 110, 132, 136, 157, 171–172, 211–212, 235, 308–310, 360, 427, 477–478, 490–493, 500, 522, 542–543
Null hypothesis, 80, 83–84, 86–87, 92–93, 102, 120, 127, 307–308, 320, 479–487, 490–493, 495, 498, 500, 508, 515, 519, 525–526, 535–537, 539, 548
Number bias, 151, 212–213

O
Odor
  attributes, 47–48
  category, 39
  classification, 38–39
  -free, 40, 62–63, 130
  intensity, 38–39, 128, 141, 417, 538
  mixture, 38, 40–42, 156
  molecules, 37, 40–41, 156
  perception, 49, 240
  profile, 41
  properties, 45, 140
  quality, 36, 38–39, 41, 208–209
  strength, 20
  -taste, 48
  threshold, 62, 87, 133–134, 137
  unit, 37, 140–141
Olfaction, 34, 36, 39–40, 49, 126, 135, 168
Olfactory qualities, 13, 36, 38–41, 208–209
Olive oil, 419
One-on-one interviews, 380, 382–383, 386–387
One tailed, 82, 84, 86, 88, 91–92, 95, 104, 107–111, 184, 309, 467, 480–481, 484, 487, 493, 495, 499, 539, 543–545, 547
One-tailed test, 92, 110–111, 484, 487, 495, 543–544
One-way mirror, 392, 394, 396, 401
Open-ended questions, 351, 361–363, 367–368, 370
OPUS scale, 163–164
Oral
  burn, 194
  cavity, 29, 36, 47, 264, 266–267
  chemical irritation, 44
  heat, 194
  sensation, 30, 163, 196, 214
  surface, 46
  -tactile, 262–266, 276
  tissue, 43, 45, 49
Order bias, 219
Order effects, 72, 219–220, 357, 484
Order of questions, 362–363
Ordinal, 151–152, 166, 169, 270, 316, 339, 490, 499–500, 531–532
Over-partitioning, 218

P
Paired comparison, 5–6, 10, 20, 22, 66, 80–86, 89, 92, 95, 97, 110–111, 114–117, 119–120, 122, 166–167, 262, 269, 306, 312, 429, 458, 468, 490, 499, 513, 545–547
Panelist
  screening, 423
  as their own controls, 512–513
  variation, 511, 517
Panel size, 7–8, 75, 104–105, 107–110, 119, 121, 247, 327, 415–416, 428, 517, 537, 545
Panel training, 36, 74, 153, 186, 198, 207, 231, 238, 243, 250, 271, 413, 416, 419, 428, 536
Paradox of the discriminatory nondiscriminators, 118
Parity, 110–111, 122, 168, 310, 323, 468, 491, 495, 541, 545
Partial least squares, 434, 445, 458
Participant selection, 350–351
Partitioning, 11, 218, 484, 509–512, 517, 522, 536
PCA, 192–193, 197, 233, 245–246, 433–440, 442, 445–446, 455–458, 461–462, 464–465, 531
Penalty analysis, 335, 339–340, 462
Pepper, 20, 37, 41–43, 49, 66, 128, 140, 142, 155, 158, 160, 162–163, 182, 193–194, 196, 230, 239, 292, 483–484
Perceptual map, 251, 333, 440–442, 444, 446, 452, 454–458, 461, 464–465
Persistence, 184, 193
Phase change, 195, 264, 267, 428
Pictorial scale, 332–333
Placebo designs, 321–322
Planned comparisons, 513–514
PLS, 445, 458
20-Point rating system, 421
Population, 3, 5, 9, 10
Positional bias, 219
Positive skew, 129, 212
Power, 3, 7–8, 21, 23–25, 38, 45, 64, 66, 68, 73, 80, 89, 92–94, 97–99, 102, 104–105, 109–110, 118, 120–122, 161, 167, 212, 312, 355–356, 415, 480–482, 484–485, 500, 522, 531, 535–548
Power function, 23–25, 38, 45, 161, 212
Practice, 7, 12–13, 49, 57–76, 86, 94–95, 126–130, 132, 136, 141–143, 151–152, 154, 156–160, 163, 165, 169, 185–186, 188–190, 197–198, 214, 217, 219–221, 223, 228, 240–241, 267, 271, 294, 305–306, 320, 330, 333, 344, 358, 363, 383–384, 388, 392, 412, 417–419, 421, 442, 460, 532, 536–537
Precision, 2–3, 10, 17, 142, 230, 232, 240, 248, 326, 330, 455
Preference map, 318, 326, 434, 440–445, 455, 463–465
Preference tests, 66–67, 69, 98, 110–111, 128, 219, 303–323, 325–327, 333, 352, 355, 467–468, 489, 491–493, 496, 498, 545–548
Principal component analysis, 45, 230, 233, 235, 243, 245, 250–251, 342, 434–437, 446, 456–457, 531
Probing, 362–363, 365, 368, 380–383, 386, 388–390, 393–394, 400, 452–453
Probit analysis, 136
Procrustes, 250–252, 434, 439–441, 446, 458
Product prototype, 303, 350, 380, 385, 387, 390, 452–453
Profiling, 38, 43, 156–157, 165, 184, 186, 192, 217–218, 230–231, 249–253, 262, 268, 387, 439, 442, 444, 460
Program development, 411, 422–424
Projective mapping, 461–462
Projective techniques, 400
Proportions, 3, 5, 10, 23, 87, 89, 95–96, 98, 106, 108–110, 114–115, 134, 136, 152, 158, 161, 166–167, 169, 171–172, 184, 306, 309, 312–315, 321–323, 330, 337, 426, 475, 477, 479, 487, 489–493, 497–499, 503, 546–547
Psychoanalysis, 400
Psychophysics, 11, 20–23, 26–27, 49, 111–112, 116, 125–128, 151, 159, 169
P-value, 89, 110, 122, 245, 310, 461, 477, 485–487, 510, 522, 537, 539

Q
Q10 models, 430–431
Qualitative research, 368, 380–384, 387, 401, 452
Quality
  control, 4–5, 12, 16–17, 101–102, 104, 112, 121, 165, 219, 228, 358, 361, 407–431, 451, 486, 536, 538
  rating, 412, 414, 416
  scoring, 171, 412, 419–422
Quartermaster Corp, 8, 206
Quartermaster Food and Container Institute, 12, 326
Questionnaires, 9–10, 70, 130, 168, 217, 305, 314, 327, 329, 339–342, 344, 349–369, 370–377, 380–381, 383, 385–386, 388, 402, 453–454

R
Random effects, 517–520
Randomization, 58, 66, 69, 71–72, 130, 219–221, 329
Range effect, 164, 210–211, 214–215, 220
Rank
  order, 25, 74, 151, 165, 266, 327, 364, 429, 458, 475–476, 484, 490, 494, 499–503, 510–514, 516, 532
  tests, 499–503
  rating, 165, 333–334
Rated degree of difference, 121, 137, 166
Rated degree of preference, 318–320
Rating, 23, 66, 70, 119–120, 126, 140, 142, 152, 154–158, 161, 164–166, 169, 171–173, 180, 184, 190, 197, 203, 205–213, 216–218, 221, 232, 237, 249, 270–271, 284, 303, 318–319, 327, 333–334, 339, 341, 366, 368, 386, 410, 413–416, 421, 430, 457–458, 462, 475, 479, 489, 490, 500, 511, 513, 515–516, 539
Ratio scales, 25, 162, 169, 318, 331
R chart, 409–410
Receptors, 27–32, 34–36, 44–45, 66, 128, 141, 150, 196, 209, 216, 285–286
Recognition Thresholds, 127, 130, 142
Recording, 68, 154, 181, 183, 195, 383, 389, 391–392, 397
Recruitment, 58, 74–76, 351, 353, 356, 359–360, 384, 401, 428
Rejection threshold, 128, 130, 306
Relative-to-ideal, 335–336
Relative scales, 165, 328
Reliability, 3, 10, 12, 140, 143, 168, 230, 248, 326–327, 330, 382, 384–385, 392, 408, 418, 422, 455–456, 460, 486
Repeated measures, 3, 71, 136, 213, 356, 484, 503, 510, 513–515, 517, 520–522, 536
Replicate, 5, 70–71, 86, 94–97, 105, 135, 143, 168, 186, 211, 234, 244, 247–248, 313–314, 318, 320–321, 328, 417, 424, 486, 496–499, 516–521, 536, 548
Research suppliers, 360–362
Response restriction, 212, 216–218
R-index, 119–120

S
Salty, 29–31, 127, 154, 160, 162, 195, 209, 239, 334, 340, 402, 413, 461, 500
Sample, 2–3, 5–7, 10, 12, 14–15, 21, 36, 46, 58, 61–72, 76, 79–90, 92–95, 97–99, 102–109, 111, 118, 120–122, 128–135, 137–140, 142, 150, 153, 156–158, 162, 165–167, 171, 173, 180–181, 183–184, 186–187, 191, 193–194, 204, 206–207, 209–212, 216, 218–221, 223, 228–230, 232–241, 243–244, 248–252, 262, 265–272, 274–276, 286, 289–293, 296–298, 306–312, 314–317, 319–322, 326–330, 337, 341, 351–352, 354–357, 359, 361, 370–371, 374, 391, 398, 402–403, 411, 416, 429, 436–437, 446, 475–482, 494–495, 499–500, 542, 544, 547
  size, 5, 15, 58, 63–64, 81, 93, 97, 104–111, 120–121, 237, 265–266, 274, 308–310, 312, 319, 322, 328, 355–356, 361, 436, 467–468, 475, 480–481, 485, 500–501, 533, 536–538, 539–540, 541–542, 544–548
Satisfaction, 289, 304, 351, 363, 367, 408, 463–464
Scale usage, 140, 156, 160, 212–213, 221, 231, 328, 357, 484, 513, 520
Scaling, 8, 23–25, 27, 48, 74, 80, 89, 103, 116–119, 137–141, 149–174, 180–183, 188, 193–194, 197–198, 210, 212–215, 217, 220–222, 262, 267, 270, 288, 304–305, 317–318, 323, 325–328, 330–336, 341, 364, 434, 442, 444, 454–455, 458, 461, 465, 476, 490
Scoville units, 142
Screening questionnaire, 359, 371–373
Segments, consumer, 168, 330, 340, 445, 453, 457, 465–467
Sensory adaptation, 19, 30, 126, 129, 135–137, 205, 207, 209
Sensory interactions, 13, 47
Sensory segmentation, 466
Shelf life, 3, 11–12, 17, 88, 101–102, 104, 165, 228, 248, 252, 343, 360, 391, 407–431, 451, 474, 479, 486, 519, 521, 535–536
Signal detection, 88, 103, 111–120, 129, 135, 143, 312, 538, 546
  theory, 111–116, 119, 143, 538
Sign test, 499–503
Similarity test, 80, 108–109, 120–121
SLIM scale, 163
Smell, 2, 9, 11, 13, 20, 21, 23, 29–30, 34–41, 46–49, 126–127, 129–130, 134–135, 137–141, 162–163, 205–207, 209, 216–217, 219, 229, 231, 242, 249–250, 342, 429, 457
Smiley scale, 333
SMURF, 181, 184
Sorting, 79, 81, 87, 135, 251, 267, 396–397, 458–461
Sour, 9, 29–32, 36, 43–45, 48–49, 166, 185, 193, 195, 209, 239, 243, 249, 335–336, 340, 416, 436, 462
Specification, 8, 12, 16, 83, 126, 143, 180, 217, 228, 249, 358–361, 370, 408–410, 413, 419, 423, 425, 427–428, 454, 522, 540
Split-plot, 71, 352, 520
Sriwatanakul, K., 156
S-R model, 205
Stability testing, 407–431
Staircase procedure, 139
Standard deviation, 10, 22, 37, 104, 113–114, 117, 120, 122, 135–136, 166–168, 171–172, 189, 223, 240, 243, 315, 355–356, 410, 427–428, 475–480, 483, 487, 490, 500, 508–509, 528, 536–543, 546, 548
Standard error, 108–110, 134, 310, 409, 419, 478, 482–483, 492, 514, 536, 539–540, 543–544
Statistical analysis, 7, 9, 68, 85, 89, 150–151, 157, 183, 185–186, 191, 234, 305, 368, 370, 382, 408, 413, 417, 419–420, 453, 473–474, 513, 522, 531, 533
Statistical inference, 15, 385, 403, 478–479
Statistics, 2–3, 5, 80, 86, 103, 109, 117, 138, 152, 155, 166, 169, 222, 235, 312, 327–328, 339, 369, 424, 433, 445, 473–474, 476, 477–479, 481–482, 484–487, 489, 490, 494, 499–500, 503, 507–508, 514, 519, 522, 527, 533, 536, 538
Stopwatch, 183, 186
Storage, 3, 61–62, 64, 155, 289, 343, 360, 383, 408, 412, 422, 425–426, 428, 430
“Strangers on a train” phenomenon, 384
Strategic research, 111, 451–469
Stuart-Maxwell test, 314, 338, 495–497
Student, 478
Sums of squares, 509–511, 516–517, 530
Superiority, 12, 88, 310, 312, 369, 452, 467
Survival analysis, 426–427
Sweet, 28–29, 30–32, 36, 38, 43, 47–49, 65, 82, 103, 111, 128, 149, 153–154, 156–157, 159–160, 169, 185, 193, 209, 216–218, 230, 239, 242, 249, 314, 334–337, 339–340, 354, 416, 436, 442, 457, 466, 479, 515
Sweeteners, 30, 32, 128, 180–181, 193, 195, 514–515

T
Tactile, 6–7, 20, 27, 29–30, 33, 41–42, 44–46, 87, 168, 194–195, 206–207, 210, 230, 259–264, 266, 268–269, 276, 316, 387
Taste, 2, 4, 6, 9, 11, 13–14, 20–23, 25–34, 36, 39, 41–49, 62, 65, 82–88, 99, 103, 112, 119, 126–131, 133–135, 137–140, 143, 156–157, 160–167, 179–181, 183–184, 186, 193–198, 205–209, 212–214, 216–217, 219, 222–223, 229, 231–233, 235, 237, 239, 242, 249–250, 261, 284, 304, 307, 311, 316, 318, 327, 329–331, 333, 336, 340, 343, 354, 356–357, 363, 366, 368, 374, 387–388, 391, 403, 417–418, 421, 425, 429, 459–461, 483, 495, 500
  qualities, 30–31, 47, 127
TDS, 184–185, 249
Telephone interview, 362, 398–399
Temporal dominance, 184, 249
Tenderness, 186, 195, 229, 275–276, 444
Test
  power, 3, 7–8, 66, 68, 92–93, 98, 102, 104, 110, 120, 355, 415, 480, 482–483, 485, 501, 535–537, 540, 542, 545, 547, 549
  specification, 12, 16, 358–359, 361
Texture, 6–7, 9–10, 20, 27, 49, 63, 65, 71, 82, 160, 179–180, 182–184, 186, 189–190, 192–193, 195–198, 216, 221, 227, 229–230, 235, 237–239, 259–277, 284, 286–287, 298, 304, 336, 342–343, 354, 356–357, 363, 366, 368, 412–413, 417, 420, 423, 456, 460–461, 502, 516, 522
Thermal, 30, 39, 41, 43, 45, 206, 267–268
Thresholds, 20–22, 27, 30, 33, 36–38, 43, 48, 62, 66, 87, 125–143, 150, 167, 218, 272, 287, 475
Thurstonian
  model, 99, 101, 118, 120, 312, 546
  scaling, 8, 89, 103, 116, 152, 166
Time-intensity, 43, 46, 66, 156, 179–198, 213, 217, 249, 267
  scaling, 156, 182, 193–194, 212, 217
Tmax, 185–187, 189–190
TOST, 110–111, 121, 274
Total duration, 180, 185–187, 192–193
Training, 1, 3–4, 6, 9–10, 13, 15, 36, 39, 46, 48, 57–58, 73–75, 81, 85–86, 153, 159, 185–186, 190, 197–198, 207, 214, 216, 218, 220–221, 228–235, 238–244, 247–252, 261, 264, 270–272, 287–288, 306, 335, 351, 355, 358, 381, 392, 409, 411–413, 416–419, 422–423, 428–429, 452, 460, 503, 536
Transcribing, 396
Transcript analysis, 396
Trapezoid method, 187, 192
Triangle test, 5, 10, 48, 63, 66–67, 69, 80, 83–84, 87, 90, 92–95, 97, 104–109, 111, 118–119, 121, 131, 133–134, 142, 150, 166, 219, 244, 292, 414, 423, 429, 487, 489–490, 492–493, 498, 544–547, 561, 566, 568
Trigeminal, 28, 33, 36, 41–42, 44–45, 49, 128, 194
T-test, 10, 101, 110, 121–122, 138, 143, 216, 319, 337, 340, 356–357, 412, 437, 475, 479–484, 487, 489, 499–501, 503, 510–511, 513, 536, 538–539, 541–542, 547
Tukey test, 513–514
Two one-sided tests, 110
Two-tailed test, 111
Type I error, 3, 65, 92–93, 102, 409, 437, 481–482, 486, 513, 519, 522, 535, 537, 539, 546
Type II error, 3, 65, 80, 92–93, 101–104, 109, 168, 409, 424–425, 438, 468, 482, 487, 536–537, 546–547
Type zero error, 380

U
Umami, 28–30, 42
U-test, 499–501, 503

V
Validity, 1, 3, 10, 12, 72, 161, 168–170, 191–192, 240, 327, 332, 354, 365, 382–385, 397, 455–456
Variability, 3, 8, 22, 28, 34, 65, 102–103, 107–108, 111–112, 118, 120, 126, 135–137, 143, 166–168, 170–172, 221, 249, 276, 315, 321, 355–357, 408–409, 411, 413–415, 419, 441, 474–476, 484, 487, 508, 535–536, 538, 546
Variance, 3, 7, 10, 38, 113, 119, 121, 152, 155, 166, 168, 171, 183, 187, 197, 213, 233–235, 241, 243–246, 250, 252, 260, 315, 317–319, 321, 328, 339, 356, 412, 434–438, 440–441, 443–444, 446, 455–456, 458–459, 461, 463, 474–476, 479, 482, 484, 497–498, 500–501, 503, 507–515, 517–519, 521–522, 526, 528–531, 536, 539, 569–571
Vector model, 442, 444–445, 463–465
Verbatim comments, 130–131, 368, 388–389, 392, 395–397
Video, 350, 380, 383, 388–389, 392, 397, 399
Vienna Philharmonic, 468
Vision, 20, 126, 205–206, 229, 259, 283, 285–286, 289, 292–293
Visual, 11, 22–23, 27–29, 34, 38, 49–50, 58, 60–61, 63, 87, 112, 138, 155, 157, 165, 168, 181, 184–186, 189, 204, 207, 210, 212, 235, 238, 242, 259–262, 270, 276, 285, 287–289, 292–293, 298–299, 316, 329, 361–362, 364, 387, 399, 538
Visual analogue scales, 155, 329
Vocabulary development, 240

W
Weber, 20, 38, 45, 125, 206, 272
Weibull distribution, 427
Wine aroma wheel, 39–40
Wine scoring, 420, 422
Within-subjects, 337, 520

X
X-bar chart, 409–410

Z
Z-score, 95, 104, 109–110, 114–115, 117, 120, 134, 136, 157, 166–167, 171–173, 309–310, 312, 356, 414, 428–430, 477–479, 491, 500, 537, 543