evaluatingratingscalesforsensorytestingwithchildren-foodtechnology
8/9/2019 evaluatingratingscalesforsensorytestingwithchildren-foodtechnology
Evaluating Rating Scales for
Sensory Testing with Children
Sensory testing with children is becoming increasingly important to the food industry, but little research on appropriate methodology has been conducted.
AS THE NUMBER of food products aimed at the children's market increases and the role of children in purchase decisions expands, sensory testing with children becomes increasingly important to the food processing industry. However, sensory research has not kept pace with this need.

Testing with children is in an embryonic stage. Over the years, a few sensory researchers have considered the problems involved in applying their science to this special population, but for the most part the field has been static. The need for serious investigation is pointed up by how little research has been done in this area.
As a way of focusing on the specific needs for this kind of research, a thumbnail sketch of certain key questions the literature considers is presented in the box "Questions Addressed in Earlier Studies."

One thing is very noticeable not only in the literature, but also in word-of-mouth, unpublished material about children's testing. The methods used have been intuitive, even granting that the investigator may have had a rationale. Once a method has been selected, there has been no serious investigation of possible alternatives. It is as if the researchers said, "We planned this, we tried it, it seemed to work, and there was no time to bother with what might have worked better."
We therefore undertook a basic research project designed to help establish a solid foundation for future investigations. This article describes the procedures, analysis, and conclusions of research intended to evaluate the relative merit of rating scales that might be used when testing with children. In this study, we used two methods of questioning, one-on-one interviewing (Fig. 1) and self-administered questionnaire (Fig. 2), and three types of rating scale (Fig. 3).
Beverley J. Kroll
Variables Selected

A great many variables could be considered. Hence, it was necessary to be selective and try to choose the more important ones.

Test Products. The test product was not really a source of variation, but remained constant throughout the main series of experiments. We settled on a sweetness difference in an orange drink. One can reliably predict that children will like a sweeter drink, at least within the normal range. This proved to be the case.

Preliminary testing of drinks with various sweetness differences indicated the adjustments needed. For example, a drink sweetened with the recommended amount of sugar compared to one made with only 50% of that amount produced highly significant differences no matter what rating scale was used. Needed was a difference that was definite but not overwhelming, so that the possible effects of the variations of interest could emerge. The final choice was an orange-flavored drink sweetened with the recommended amount of sugar, compared to a drink with 80% of
that amount.

Scale Type. Differences in scale type were the main issue addressed in these experiments. After preliminary work with older children, we concentrated on three scale types (Fig. 3): the standard hedonic scale with the usual verbal categories, a pictorial or face scale, and a child-oriented verbal scale we developed.

Over the years, researchers have investigated test language suitable for children. After reviewing child-oriented word scales designed by others, we decided to develop our own scale, with more nearly equal intervals (although exact equality probably cannot be achieved with scales of this type). The result was dubbed the Peryam & Kroll or P&K scale.

It was imperative that the study include a picture scale. Testing with children is overrun with picture scales, the rationale being that younger people may not understand words and phrases but can more accurately deal with facial expressions. Besides, pictures are entertaining and should inspire closer attention to the task.

There are many such caricature scales around, but all have the same general characteristics, representing degrees of pleasantness ranging from high to low. The question is how well successive pictures communicate the basic idea. Some preliminary work was done with a scale from an earlier published study, which used the Snoopy cartoon character, but the results were disappointing. Scales using children's faces with variations in degree of detail were also tried. Eventually a series of simplified people faces was selected as probably best and certainly representative.

Scale Length. There is a school of thought, bolstered by intuition, that longer scales tend to create confusion because there are lots of words to understand and choices to make. The implication is that this problem should be more serious with younger children. On the other hand, there is evidence that longer scales can be more discriminating and produce more reliable results.

Certainly, this factor was of enough importance to be included in the study. Starting with the frequently used 9 points, how far down should one go? To 7 points? 5? 3? Or even to just 2 points, which would be paired comparison?

The study addressed this variable in subdued fashion by trying 7 points, using the same three scale types as before but eliminating one good category and one bad category from each scale.

Age. For what ages might special techniques be required? Our initial work was with children over 10 years of age, most of whom seemed to handle self-administered questionnaires fairly well, with no problems that are not encountered to some degree with adults.

To address the real issue, therefore, we defined two age groups based on suppositions about ability to handle verbal input: the preliterate, ages 5–7, where most can be expected to read very little if at all and not understand big words; and the semiliterate, ages 8–10, where most can read at some level but still may not understand words such as "extremely" or "moderately." No attempt was made to extend the investigation to preschoolers.

The author is President, Peryam & Kroll, Marketing and Sensory Research, 6323 N. Avondale Ave., Suite 121, Chicago, IL 60631

FOOD TECHNOLOGY

Fig. 1 – Children ages 5–7 and 8–10 were tested using one-on-one interviews

Fig. 2 – Children ages 8–10 were also tested using self-administered questionnaires in standard sensory testing booths

Fig. 3 – Three types of rating scale were used: the traditional hedonic scale, the P&K scale developed for this study, and the typical face scale. After testing, scale values of 1 to 9 were assigned (starting with 1 at the top) for the purposes of …

Traditional hedonic scale     P&K scale
Like extremely                Super good
Like very much                Really good
Like moderately               Good
Like slightly                 Just a little good
Neither like nor dislike      Maybe good or maybe bad
Dislike slightly              Just a little bad
Dislike moderately            Bad
Dislike very much             Really bad
Dislike extremely             Super bad

Face scale: [series of simplified faces; images not reproduced]
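The Fig. 3 caption says that, after testing, each verbal category was converted to a number from 1 to 9 counting from the top. A minimal Python sketch of that coding step follows. The list names and the `code_response` helper are illustrative assumptions, not part of the study; the coding follows the caption literally (1 = top, most-liked category). The article does not say which categories were dropped for the 7-point versions, so those are not encoded here.

```python
from __future__ import annotations

# Verbal categories as printed in Fig. 3, top (most liked) to bottom.
HEDONIC_9 = [
    "Like extremely", "Like very much", "Like moderately", "Like slightly",
    "Neither like nor dislike",
    "Dislike slightly", "Dislike moderately", "Dislike very much", "Dislike extremely",
]
PK_9 = [
    "Super good", "Really good", "Good", "Just a little good",
    "Maybe good or maybe bad",
    "Just a little bad", "Bad", "Really bad", "Super bad",
]


def code_response(scale: list[str], label: str) -> int:
    """Return the 1-based position of `label` on `scale` (1 = top category).

    Hypothetical helper; raises ValueError if the label is not on the scale.
    """
    return scale.index(label) + 1


if __name__ == "__main__":
    print(code_response(HEDONIC_9, "Like slightly"))        # 4
    print(code_response(PK_9, "Maybe good or maybe bad"))   # 5
```

Under this coding a lower mean rating indicates greater liking.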
Mode of Presentation. Most of the experiments employed a straightforward approach, where the successive categories were read one after another, always starting at the good end.

Another approach sometimes used by investigators is what may be called "bifurcated": the interviewer first asks the subject to place the stimulus into either the good/like or the bad/dislike category, then tries to get the child to scale degree of like or dislike by presenting the successive categories. The categories were presented starting in the middle and proceeding to the ends. This seemed logical, but that could be open to debate. If the subject failed to make a choice in response to the initial question, the result was recorded as "maybe good/maybe bad" or "neither like nor dislike" (but was not read to the subject).

This phase of testing included only the hedonic and P&K scales because the face scale is inappropriate to this approach.
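The two-stage bifurcated protocol can be sketched as a small Python function. This is an illustrative reconstruction under stated assumptions: the callbacks `ask_direction` and `ask_category` are hypothetical stand-ins for the interviewer's questions and the child's answers, the scale is the 9-point P&K list from Fig. 3, and degree categories are offered from the middle outward as described above.

```python
from typing import Callable, Optional, Sequence

# 9-point P&K scale from Fig. 3, most liked to least liked.
PK_SCALE = (
    "Super good", "Really good", "Good", "Just a little good",
    "Maybe good or maybe bad",
    "Just a little bad", "Bad", "Really bad", "Super bad",
)


def bifurcated_rating(
    ask_direction: Callable[[], Optional[str]],
    ask_category: Callable[[Sequence[str]], str],
    scale: Sequence[str] = PK_SCALE,
) -> str:
    """Good/bad question first, then degree (hypothetical sketch)."""
    mid = len(scale) // 2          # index of the neutral category
    direction = ask_direction()    # "good", "bad", or None if the child won't choose
    if direction is None:
        # Recorded by the interviewer but never read to the child.
        return scale[mid]
    if direction == "good":
        # Middle outward: "Just a little good" ... "Super good".
        options = list(scale[mid - 1::-1])
    elif direction == "bad":
        # Middle outward: "Just a little bad" ... "Super bad".
        options = list(scale[mid + 1:])
    else:
        raise ValueError(f"unexpected direction: {direction!r}")
    choice = ask_category(options)
    if choice not in options:
        raise ValueError(f"{choice!r} was not one of the offered categories")
    return choice
```

A 9-point hedonic list can be passed as `scale` in the same way, since its neutral category also sits in the middle.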
The question of which was the better procedure, the bifurcated or the straightforward, was addressed in a side experiment.
Questions Addressed in Earlier Studies
Can children discriminate? How far down the age scale does the capacity for discrimination exist?

There has never been much argument here. Children can definitely discriminate. At least they have preferences. Observations of the behavior of even infants indicate the capability of choice in terms of rejection and acceptance.

About 1955, investigators at Eli Lilly, Inc., developed a procedure for working with children 2–3 years old to evaluate formulas for vitamin preparations (Peryam, 1989). They used one-on-one interviewing and the paired-comparison method and claimed to have obtained results useful in product development.

Investigators at the University of Florida did extensive testing of various citrus products with preschool children ages 3–5 (Morse, 1953). They found lots of discrimination, as well as puzzling aberrations. They used, and endorsed, paired comparisons, which produced the only meaningful results. However, they also tried a method which was essentially the triangle test, although not labeled as such. Their conclusion that the method was too complicated for kids should not surprise anyone.
Work with preschoolers ages 3–5 used fruit as stimuli and an interesting variation of the rank-order method (Birch, 1979). The child was presented with a number of different kinds of fruit and asked to select the one liked best. This was then removed, and the one liked best among the rest was chosen, and so on. Whatever the utility of the findings, there was discrimination, which replicate testing showed was reliable.

Colwill (1987) reviewed scaling methods for obtaining information about consumers' likes and dislikes. He recommended using picture scales, preferably with five or seven points, for testing preliterate children.

Can one use a measuring device more sophisticated than simple paired comparisons? Can children differentiate degrees of liking and/or disliking?

Usually investigators have found that children do have such ability, but the extent of that ability, as well as how it might be affected by any one of many variables, is seldom considered.

Some years ago, Bert Krieger, a researcher with a candy manufacturer, was faced with the problem of evaluating formulation changes in chocolate bars (Moskowitz, 1985). He dealt with children 5–7 years old as well as older children, using a picture scale that showed the Snoopy cartoon character in a series of nine poses ranging from up-eared elation to droopy disgust. His subjects were able to discriminate.

Another researcher (Wells, 1965) used a scaling method to evaluate children's feelings about cereals. He was not concerned with the foods as eaten, but evaluated children's ideas about familiar cereals and their feelings about TV commercials. Some of the subjects were in the 5–7 age range. The study used 7-point face scales showing a youngster (a boy for boys, a girl for girls) in poses ranging from grinning happiness to hold-the-nose distaste. The children could discriminate, and the results were meaningful.

Are the results of testing children useful in solving typical product development problems?

The sponsors must be getting something useful, or why would so much be attempted? Some of the published studies actually address the question, e.g., the previously cited work by Krieger, who achieved comparative evaluation of formulas for chocolate bars.

Summary

Briefly summarizing the literature, we note that:
There is consensus that children can discriminate, particularly in regard to degree of liking.
Children are able to show degree of preference if the proper measuring device is used.
Children can provide useful information about products if the right methods are employed.
Children require special handling, i.e., handling that is different from the procedures routinely employed with adults. One must pay attention to such things as gaining confidence, providing motivation, and expressing tasks in language children understand. This recognition appears throughout the literature.
Another side issue that seemed worth testing was one-on-one interviewing vs a self-administered questionnaire. This experiment used the 9-point hedonic and P&K scales and involved only children 8–10 years old, i.e., the semiliterate group. Again, the face scale was excluded because the concern was mainly with ability to read with understanding.
Testing Procedure

The test subjects were prerecruited from families on our extensive roster of consumer panelists. Usually, the computer knows which families have children and their ages. All had to like orange drinks, which was no problem. Otherwise the only concerns were age, sex, and availability to fit into the schedule. An important proviso was that no child should be invited to participate in more than one test, which would raise questions about training effect.

In all cases, a subject tried the pair of samples, high sweet vs low sweet, twice, using a different scale for each pair, then made a paired-comparison choice after each pair. Except for those on the mode of presentation, the experiments included all three scale types: hedonic, P&K, and face. The design required that the scales be used equally often and appear equally often as the first or second pair. Furthermore, for each scale type the high-sweet and low-sweet samples were served first or second equally often.

Sex differences did not seem important in the context of this investigation, but our recruiters attempted to have equal numbers of girls and boys in each of the age groups. This was not achieved exactly, but it was close. They also tried to get an even distribution of ages within each age group. Again, this was not exact but was very close.

The drinks were prepared in quantity ahead of time, chilled to refrigerator temperature, and held at that temperature throughout testing. They were poured just before serving. A sample as served was about 1½ oz of drink in a small plastic glass. The samples were identified by code number, but only for the convenience of the operators and to avoid errors. If a subject even saw the codes, it was accidental.
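The counterbalancing requirements of the design (each child rates the pair under two different scales; scales used equally often and equally often as first or second pair; within each scale, high-sweet served first or second equally often) can be met by a simple factorial rotation. The Python sketch below is one hypothetical way to generate such a rotation; the article does not say how its assignments were actually made, and all names are illustrative.

```python
from collections import Counter
from itertools import cycle, islice, permutations, product

SCALES = ("hedonic", "P&K", "face")
# Serving order within a pair: which sample the child tastes first.
SERVING_ORDERS = (("high", "low"), ("low", "high"))


def design_cells():
    """All (first pair, second pair) combinations; each pair is (scale, serving order).

    6 ordered pairs of distinct scales x 2 x 2 serving orders = 24 cells.
    """
    return [
        ((s1, o1), (s2, o2))
        for s1, s2 in permutations(SCALES, 2)
        for o1, o2 in product(SERVING_ORDERS, repeat=2)
    ]


def assign(n_subjects):
    """Hypothetical assignment: rotate children through the cells in order."""
    return list(islice(cycle(design_cells()), n_subjects))


if __name__ == "__main__":
    cells = design_cells()
    print(len(cells))  # 24
    # Each scale is used 16 times across one full cycle of cells.
    print(Counter(scale for cell in cells for scale, _ in cell))
```

Balance holds exactly only when the number of children is a multiple of 24; with other totals, the leftover partial cycle introduces a small imbalance.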
All interviewing was conducted one-on-one, except for the sessions using the regular written questionnaires. The interviewers were carefully briefed on the protocol to be followed for each variation.

The interviewer met the subject and parent in a reception area. Leaving the parent there, the interviewer took the child to the testing area while chatting in a friendly manner to establish rapport and relieve possible tension. The test itself was not discussed except in a very general way.

In the test room, the child was seated at a table across from the interviewer (Fig. 1) and told that he or she would get some samples of orange drink and would be asked questions about them. The first sample was brought and the child invited to try it. When the child was finished, the interviewer began the questioning procedure according to the set protocol. After a rating was made, the child was told to drink some water while the interviewer got the next sample. The waiting period was about 2 minutes. The second sample of the pair was then tried and rated. This was followed by the question, "Which did you like better, the first sample you tried or the second one?"

Then the child was told there were more drinks to be tried and had a drink of water while waiting another 2 minutes. The second pair was handled like the first, and the child was escorted back to his or her parent. The whole sequence took about 10 minutes.
Analyses

There is a qualification to note here. Some findings, in the sense of the objectives of the research, rely on what may be called soft data; however, they were derived from hard data.

Hard Data. For the paired comparison, the significance of the proportions of choice was determined by the z-test. For the scalar measures, the significance of the difference between the average ratings for the high-sweet and low-sweet drinks was determined using the t-by-difference test, which was natural, since each subject had tried both samples. Using the variances of the distributions was also considered, but the figures were volatile and hard to interpret. With scales of this kind, the variance is highly dependent on the average rating, being quite low when the upper end of the scale is approached, but increasing as the average drops toward the midpoint.

Soft Data. The tables of results show significance levels ranging from 1% to 15%. These figures were compared among scales, between age groups, between test orders, between orders of serving, and so on.
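The two hard-data tests named above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions the article does not state: the z-test compares the proportion choosing the high-sweet sample with 0.5 (no preference) using the normal approximation, two-sided; the t-by-difference test is taken to be the standard paired t statistic on each child's rating difference. The counts and ratings are made up, coded 1 = top of the scale as in Fig. 3, so a negative mean difference means the high-sweet drink was rated better. The third helper illustrates why the variance depends on the average: for ratings confined to 1 to 9, the population variance can be at most (9 − μ)(μ − 1) (the Bhatia–Davis inequality), which shrinks as the mean approaches either end of the scale.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def preference_z(n_chose_high: int, n_total: int) -> tuple[float, float]:
    """Normal-approximation z-test of a paired-comparison proportion against 0.5.

    Returns (z, two-sided p).
    """
    p_hat = n_chose_high / n_total
    se = math.sqrt(0.5 * 0.5 / n_total)  # standard error under "no preference"
    z = (p_hat - 0.5) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


def t_by_difference(high: list[float], low: list[float]) -> tuple[float, int]:
    """Paired t statistic on per-child differences (high minus low).

    Returns (t, degrees of freedom); p comes from a t table with n - 1 df.
    """
    if len(high) != len(low) or len(high) < 2:
        raise ValueError("need two equal-length lists with at least 2 children")
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def max_variance(mu: float, lo: float = 1, hi: float = 9) -> float:
    """Bhatia-Davis upper bound on the population variance of ratings in [lo, hi]."""
    return (hi - mu) * (mu - lo)


if __name__ == "__main__":
    # Hypothetical: 60 of 100 children chose the high-sweet drink.
    z, p = preference_z(60, 100)
    print(round(z, 2), round(p, 4))             # 2.0 0.0455
    # Hypothetical ratings (1 = top category) for four children.
    t, df = t_by_difference([2, 1, 3, 2], [3, 3, 3, 4])
    print(round(t, 2), df)                      # -2.61 3
    print(max_variance(5), max_variance(8.5))   # 16 3.75
```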
How legitimate, or how useful, is this approach? There is no routine, accepted statistical procedure for determining whether one level of significance is or is not significantly different from another. Perhaps a method for this purpose could be devised, but its possible utilization has not been explored. An example of the questions to be resolved would be, how much more important is the 1% level than the 2% level? Probably not very important, since both are near certainty. But one is easily convinced that the 1% level shows more discrimination than the 10% level. These are the kinds of decisions that served as the basis for most of the conclusions in this study.
Results

What, if anything, was discovered in this study? Are any conclusions definitive, settling certain points once and for all? Not likely! But there are results that can direct future research on the subject.
Paired Comparison. The paired comparisons were always made after the pair of drinks had been presented and rated. The results, summarized across all tests, are shown in Table 1. Overall, there was a highly significant difference, well below the 0.1% level, which was due in part to the large number of subjects (N). As expected, the high-sweet sample was preferred, which validated the product variable. Other conclusions come from comparing different subgroups.

Test order, whether the first or second pair of the session, made no difference.

There was no difference in discrimination between boys and girls.

Children 8–10 years old were definitely more discriminating than the younger kids, who failed to establish a significant difference. Their failure might have been due to interference by the scaling task. The difference between ages might have been expected.

Scale type may also have made a difference, although evidence is borderline. When the comparison was made after the hedonic and P&K scales, discrimination was about the same as overall; but when it was made after the face scale, it dropped to the level of nonsignificance. This might be a chance effect, or there may be something about the face scale which later interfered with the paired comparison.
Scale Length. Scale-length results (Table 2) tend to lay to rest the belief that children need simplicity and shouldn't be presented with too much because they will get confused. Within the context of these experiments, that did not prove to be the case. Quite the contrary: the 9-point scales were as good as, if not better than, the 7-point versions. Definitely, the 7-point scales were not better. Whether the 9-point scales were actually better for discrimination rests on comparison of the 5% vs 1% levels of significance, but the 7-point scales offer no advantage.

With the 9-point scales, all subgroups showed significant discrimination, granted that at one point it dropped to a questionable 15% level; whereas with the 7-point scales, three subgroups showed nonsignificance.

The boys did slightly better than the girls, although this was not consistent. It is probably trivial, and not indicative of any meaningful trend.
This result is definite and hardly unexpected. The children 8–10 years old showed good discrimination with both scale lengths, whereas the children 5–7 years old showed significant discrimination only with the 9-point scales, completely failing the task with the shorter version. On the basis of the supposition that the simpler scales should be easier for younger children, one might have expected this to be the other way around.
It is often noted in sequential monadic testing that there is better discrimination when only the second-served samples are considered. In this study, there was significant discrimination with the second-served samples for both scale lengths, but almost none with the first-served samples. Is this due to some kind of contrast? Is it a training effect, where the ratings of the second sample have the benefit of experience with the first? This research could not address such questions in all of their complexity. Besides, such effects pertain to all testing, not just when children are concerned.

Scale Type. The crux of the research is the comparative evaluation of the three scale types. Overall, with N = 208 for each scale, all scales significantly discriminated at better than the 10% level. However, the P&K scale (1% significance level) was better than the hedonic scale (8% significance level) and the face scale (7% significance level). We think this is an important finding, but remember the qualification about soft data: it is based on comparison of the 1% vs the 7% or 8% level of significance. In addition, the face scale, which typified the kind alleged to be better for children, failed to emerge as better than the other scales.

In a way, Table 3 is repetitive, exhibiting effects shown in the other tables, but now separately for each scale type. However, it may add further emphasis to the following conclusions: The P&K scale gave better overall discrimination; older children showed better discrimination with all scales; and no scale discriminated when just the first-served samples were considered, but the P&K and face scales did with the second-served samples.
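Because each child's first-served sample was, by design, the high-sweet drink for some children and the low-sweet drink for others, a first-served-only (or second-served-only) comparison is between two independent groups of children rather than a paired one. The article does not say which test was applied to these subsets; the Python sketch below assumes Welch's unequal-variance t statistic, and the record layout, function names, and ratings are all hypothetical.

```python
import math
from statistics import mean, variance


def served_subset(records, position):
    """Split ratings given at one serving position (1 or 2) by sample.

    `records` holds hypothetical (position, sample, rating) tuples,
    with sample "high" or "low".
    """
    high = [r for pos, s, r in records if pos == position and s == "high"]
    low = [r for pos, s, r in records if pos == position and s == "low"]
    return high, low


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for independent groups."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Made-up ratings for six children (1 = top category), for illustration only.
    records = [
        (1, "high", 3), (2, "low", 4),
        (1, "low", 3), (2, "high", 2),
        (1, "high", 2), (2, "low", 5),
        (1, "low", 4), (2, "high", 1),
        (1, "high", 3), (2, "low", 4),
        (1, "low", 2), (2, "high", 2),
    ]
    high2, low2 = served_subset(records, position=2)
    print(welch_t(high2, low2))
```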
The second pair of drinks tested was consistently better for discrimination than the first pair, no matter the scale type. Does this mean that there is a learning effect, even from the brief first exposure to the task? If so, it is both bad news and good news. The bad news is that one does not have a pure measure. But who believes that is possible anyway? The good news is that kids quickly learn to do a good job, and that the testing of multiple pairs is acceptable.

Mode of Presentation. Table 4 shows the results of the side study designed to help answer the question, Is there any advantage in using the two-stage, bifurcated approach? The study was limited to the 9-point scale.

Overall, the bifurcated approach seems to offer no advantage over the straightforward. Even for the children 5–7 years old, the age group for whom the method was designed, the bifurcated scale was little better than the straightforward approach.

The self-administration phase of the study was an embellishment done as an afterthought. It was limited in scope, utilizing only the hedonic and P&K scales, and excluding children 5–7 years old for the obvious reason that they are preliterate.
The results (Table 5) showed that children 8–10 years old can handle written questionnaires effectively. Overall, the results were significant at the 1% level.

Although not shown in the table, the effect of self-administration was more pronounced with the hedonic scale, whereas discrimination with the P&K scale was about the same with both approaches (one-on-one interviewing and self-administration). This finding should cheer sensory specialists. It makes things easier. If children of this age are sufficiently knowledgeable that big words do not defeat the purpose, why bother with expensive one-on-one interviewing?
Further Studies Needed
The results of this study can be summarized as follows: The P&K scale performs better than the hedonic or face scale. Reducing scale length from 9 points to 7 offers no advantage. Children 5–7 years old do not perform any better with the face scale than with the other two scales. The bifurcated approach does not discriminate as well as the straightforward method. And older children perform as well using written questionnaires as when interviewed one-on-one.

The study, as noted earlier, was not intended to be the be-all and end-all. Rather, it was intended as a foundation for further studies. A review of variables will show that many need further attention. While there are problems involved, there is a great deal to be obtained.
References

Birch, L.L. 1979. Dimensions of preschool children's food preferences. J. Nutr. Educ. 2(2): 77.
Colwill, J.S. 1987. Sensory analysis by consumers. Food Mfr., Feb., p. 53.
Morse, R.L.D. 1953. Exploratory studies of preschool children's taste discrimination and preference for selected juices. Proc. of Florida State Horticultural Soc., Daytona Beach.
Moskowitz, H.R. 1985. Product testing with children. In "New Directions for Product Testing and Sensory Analysis of Foods," p. 147. Food and Nutrition Press, Inc., Westport, Conn.
Peryam, D.R. 1989. Personal communication. Peryam & Kroll Marketing and Sensory Research, Chicago.
Wells, W.D. 1965. Communicating with children. J. Adv. Res., p. 2.
Based on a paper presented at the Spring Meeting of ASTM, San Francisco, Calif., May 24, 1990.
Edited by Neil H. Mermelstein, Senior Associate Editor
Reprinted from Food Technology 44(11) 78-80, 82, 84, & 86
1990 Institute of Food Technologists©