
Evaluating Rating Scales for Sensory Testing with Children

Sensory testing with children is becoming increasingly important to the food industry, but little research on appropriate methodology has been conducted

AS THE NUMBER of food products aimed at the children's market increases and the role of children in purchase decisions expands, sensory testing with children becomes increasingly important to the food processing industry. However, sensory research has not kept pace with this need.

Testing with children is in an embryonic stage. Over the years, a few sensory researchers have considered the problems involved in applying their science to this special population, but for the most part the field has been static. The need for serious investigation is pointed up by how little research has been done in this area.

As a way of focusing on the specific needs for this kind of research, a thumbnail sketch of certain key questions the literature considers is presented in the box "Questions Addressed in Earlier Studies."

One thing is very noticeable not only in the literature, but also in word-of-mouth, unpublished material about children's testing. The methods used have been intuitive, even granted that the investigator may have had a rationale. Once a method has been selected, there has been no serious investigation of possible alternatives. It is as if the researchers said, "We planned this, we tried it, it seemed to work, and there was no time to bother with what might have worked better."

We therefore undertook a basic research project designed to help establish a solid foundation for future investigations. This article describes the procedures, analysis, and conclusions of research intended to evaluate the relative merit of rating scales that might be used when testing with children. In this study, we used two methods of questioning – one-on-one interviewing (Fig. 1) and self-administered questionnaire (Fig. 2) – and three types of rating scale (Fig. 3).

Beverley J. Kroll

The author is President, Peryam & Kroll, Marketing and Sensory Research, 6323 N. Avondale Ave., Suite 121, Chicago, IL 60631

Variables Selected

A great many variables could be considered. Hence, it was necessary to be selective and try to choose the more important ones.

Test Products. The test product was not really a source of variation, but remained constant throughout the main series of experiments. We settled on a sweetness difference in an orange drink. One can reliably predict that children will like a sweeter drink, at least within the normal range. This proved to be the case.

Preliminary testing of drinks with various sweetness differences indicated the adjustments needed. For example, a drink sweetened with the recommended amount of sugar compared to one made with only 50% of that amount produced highly significant differences no matter what rating scale was used. What was needed was a difference that was definite but not overwhelming, so that the possible effects of the variations of interest could emerge. The final choice was an orange-flavored drink sweetened with the recommended amount of sugar, compared to a drink with 80% of that amount.

Scale Type. Differences in scale type were the main issue addressed in these experiments. After preliminary work with older children, we concentrated on three scale types (Fig. 3) – the standard hedonic scale with the usual verbal categories, a pictorial or face scale, and a child-oriented verbal scale we developed.

Over the years, researchers have investigated test language suitable for children. After reviewing child-oriented word scales designed by others, we decided to develop our own scale, with more nearly equal intervals (although exact equality probably cannot be achieved with scales of this type). The result was dubbed the Peryam & Kroll or P&K scale.

It was imperative that the study include a picture scale. Testing with children is overrun with picture scales, the rationale being that younger people may not understand words and phrases but can more accurately deal with facial expressions. Besides, pictures are entertaining and should inspire closer attention to the task.

There are many such caricature scales around, but all have the same general characteristics, representing degrees of pleasantness ranging from high to low. The question is how well successive pictures communicate the basic idea. Some preliminary work was done with a scale from an earlier published study, which used the Snoopy cartoon character, but the results were disappointing. Scales using children's faces with variations in degree of detail were also tried. Eventually a series of simplified people faces was selected as probably best and certainly representative.

Scale Length. There is a school of thought, bolstered by intuition, that longer scales tend to create confusion because there are lots of words to understand and choices to make. The implication is that this problem should be more serious with younger children. On the other hand, there is evidence that longer scales can be more discriminating and produce more reliable results.

Certainly, this factor was of enough importance to be included in the study. Starting with the frequently used 9 points, how far down


FOOD TECHNOLOGY



Fig. 1 – Children ages 5–7 and 8–10 were tested using one-on-one interviews

Fig. 2 – Children ages 8–10 were also tested using self-administered questionnaires in standard sensory testing booths

Fig. 3 – Three types of rating scale were used: the traditional hedonic scale, the P&K scale developed for this study, and the typical face scale. After testing, scale values of 1 to 9 were assigned (starting with 1 at the top) for the purposes of analysis

Value   Traditional hedonic scale    P&K scale
1       Like extremely               Super good
2       Like very much               Really good
3       Like moderately              Good
4       Like slightly                Just a little good
5       Neither like nor dislike     Maybe good or maybe bad
6       Dislike slightly             Just a little bad
7       Dislike moderately           Bad
8       Dislike very much            Really bad
9       Dislike extremely            Super bad

Face scale: a series of simplified faces (pictures not reproduced here)
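The assignment of scale values described in the Fig. 3 caption can be sketched in a few lines of Python. This is only an illustration and not the procedure actually used in the study: the names `HEDONIC_LABELS`, `PK_LABELS`, and `scale_values` are hypothetical, and the only facts taken from the article are the category wording and the rule that numbering starts with 1 at the top. Under that rule, a lower value means greater liking.

```python
# Category wording as listed for Fig. 3, from the top of each scale to the bottom.
HEDONIC_LABELS = [
    "Like extremely", "Like very much", "Like moderately", "Like slightly",
    "Neither like nor dislike",
    "Dislike slightly", "Dislike moderately", "Dislike very much",
    "Dislike extremely",
]
PK_LABELS = [
    "Super good", "Really good", "Good", "Just a little good",
    "Maybe good or maybe bad",
    "Just a little bad", "Bad", "Really bad", "Super bad",
]


def scale_values(labels):
    """Map each category to its scale value, with 1 assigned to the top category."""
    return {label: value for value, label in enumerate(labels, start=1)}


# Hypothetical lookup tables used when coding responses after a session.
HEDONIC_VALUES = scale_values(HEDONIC_LABELS)
PK_VALUES = scale_values(PK_LABELS)
```

With such a lookup, a recorded response of "Really good" on the P&K scale would be coded as `PK_VALUES["Really good"]`.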



should one go? To 7 points? 5? 3? Or even to just 2 points, which would be paired comparison?

The study addressed this variable in a limited way by trying 7 points, using the same three scale types as before but eliminating one good category and one bad category from each scale.

Age. For what ages might special techniques be required? Our initial work was with children over 10 years of age, most of whom seemed to handle self-administered questionnaires fairly well, with no problems that are not encountered to some degree with adults.

To address the real issue, therefore, we defined two age groups based on suppositions about ability to handle verbal input: the preliterate, ages 5–7, where most can be expected to read very little if at all and not understand big words; and the semiliterate, ages 8–10, where most can read at some level but still may not understand words such as "extremely" or "moderately." No attempt was made to extend the investigation to preschoolers.

Mode of Presentation. Most of the experiments employed a straightforward approach, where the successive categories were read one after another, always starting at the good end.

Another approach sometimes used by investigators is what may be called "bifurcated" – the interviewer first asks the subject to place the stimulus into either the good/like or the bad/dislike category, then tries to get the child to scale degree of like or dislike by presenting the successive categories. The categories were presented starting in the middle and proceeding to the ends. This seemed logical, but that could be open to debate. If the subject failed to make a choice in response to the initial question, the result was recorded as "maybe good/maybe bad" or "neither like nor dislike" (this category was not read to the subject).
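The two-stage logic of the bifurcated procedure can be sketched as follows, using the P&K wording. This is a minimal illustration, not the interview script used in the study: the function names are hypothetical, and the handling of a child who picks a side but then gives no degree (an error is raised) is an assumption, since the article does not say what was done in that case.

```python
# P&K categories on each side, listed from the middle of the scale out to the end,
# which is the order in which they were read in the bifurcated procedure.
PK_GOOD_SIDE = ["Just a little good", "Good", "Really good", "Super good"]
PK_BAD_SIDE = ["Just a little bad", "Bad", "Really bad", "Super bad"]
PK_NEUTRAL = "Maybe good or maybe bad"


def categories_to_read(first_answer):
    """Return the categories the interviewer reads after the initial good/bad question."""
    if first_answer == "good":
        return PK_GOOD_SIDE
    if first_answer == "bad":
        return PK_BAD_SIDE
    return []  # no initial choice: nothing further is read aloud


def record_response(first_answer, chosen_category=None):
    """Return the category recorded for one sample."""
    options = categories_to_read(first_answer)
    if not options:
        # No choice on the first question is recorded as the neutral category.
        return PK_NEUTRAL
    if chosen_category not in options:
        # Assumption: a missing or out-of-side second answer is treated as an error.
        raise ValueError("second-stage answer must come from the chosen side")
    return chosen_category
```

For example, `record_response("bad", "Really bad")` returns the chosen category, while `record_response(None)` returns the neutral one without any category having been read.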

This phase of testing included only the hedonic and P&K scales because the face scale is inappropriate to this approach.

The question of which was the better procedure – the bifurcated or the straightforward – was addressed in a side experiment.


Questions Addressed in Earlier Studies

Can children discriminate? How far down the age scale does the capacity for discrimination exist?

There has never been much argument here. Children can definitely discriminate. At least they have preferences. Observations of the behavior of even infants indicate the capability of choice in terms of rejection and acceptance.

About 1955, investigators at Eli Lilly, Inc., developed a procedure for working with children 2–3 years old to evaluate formulas for vitamin preparations (Peryam, 1989). They used one-on-one interviewing and the paired-comparison method and claimed to have obtained results useful in product development.

Investigators at the University of Florida did extensive testing of various citrus products with preschool children ages 3–5 (Morse, 1953). They found lots of discrimination, as well as puzzling aberrations. They used, and endorsed, paired comparisons, which produced the only meaningful results. However, they also tried a method which was essentially the triangle test, although not labeled as such. Their conclusion that the method was too complicated for kids should not surprise anyone.

Work with preschoolers ages 3–5 used fruit as stimuli and an interesting variation of the rank-order method (Birch, 1979). The child was presented with a number of different kinds of fruit and asked to select the one liked best. This was then removed, and the one liked best among the rest was chosen, and so on. Whatever the utility of the findings, there was discrimination, which replicate testing showed was reliable.

Colwill (1987) reviewed scaling methods for obtaining information about consumers' likes and dislikes. He recommended using picture scales, preferably with five or seven points, for testing preliterate children.

Can one use a measuring device more sophisticated than simple paired comparisons? Can children differentiate degrees of liking and/or disliking?

Usually investigators have found that children do have such ability, but the extent of that ability, as well as how it might be affected by any one of many variables, is seldom considered.

Some years ago, Bert Krieger, a researcher with a candy manufacturer, was faced with the problem of evaluating formulation changes in chocolate bars (Moskowitz, 1985). He dealt with children 5–7 years old as well as older children, using a picture scale that showed the Snoopy cartoon character in a series of nine poses ranging from up-eared elation to droopy disgust. His subjects were able to discriminate.

Another researcher (Wells, 1965) used a scaling method to evaluate children's feelings about cereals. He was not concerned with the foods as eaten, but evaluated children's ideas about familiar cereals and their feelings about TV commercials. Some of the subjects were in the 5–7 age range. The study used 7-point face scales showing a youngster (a boy for boys, a girl for girls) in poses ranging from grinning happiness to hold-the-nose distaste. The children could discriminate, and the results were meaningful.

Are the results of testing children useful in solving typical product development problems?

The sponsors must be getting something useful, or why would so much be attempted? Some of the published studies actually address the question, e.g., the previously cited work by Krieger, who achieved comparative evaluation of formulas for chocolate bars.

Summary

Briefly summarizing the literature, we note that:

There is consensus that children can discriminate, particularly in regard to degree of liking.

Children are able to show degree of preference if the proper measuring device is used.

Children can provide useful information about products if the right methods are employed.

Children require special handling, i.e., handling that is different from the procedures routinely employed with adults. One must pay attention to such things as gaining confidence, providing motivation, and expressing tasks in language children understand. This recognition appears throughout the literature.



Another side issue that seemed worth testing was one-on-one interviewing vs a self-administered questionnaire. This experiment used the 9-point hedonic and P&K scales and involved only children 8–10 years old, i.e., the semiliterate group. Again, the face scale was excluded because the concern was mainly with ability to read with understanding.

Testing Procedure

The test subjects were prerecruited from families on our extensive roster of consumer panelists. Our computer records usually show which families have children and the children's ages. All had to like orange drinks, which was no problem. Otherwise the only concern was age, sex, and availability to fit into the schedule. An important proviso was that no child should be invited to participate in more than one test, which would raise questions about training effect.

In all cases, a subject tried the pair of samples, high sweet vs low sweet, twice, using a different scale for each pair, then made a paired-comparison choice after each pair. Except for those on the mode of presentation, the experiments included all three scale types – hedonic, P&K, and face. The design required that the scales be used equally often and appear equally often as the first or second pair. Furthermore, for each scale type the high-sweet and low-sweet samples were served first or second equally often.

Sex differences did not seem important in the context of this investigation, but our recruiters attempted to have equal numbers of girls and boys in each of the age groups. This was not achieved exactly, but it was close. They also tried to get an even distribution of ages within each age group. Again, this was not exact but was very close.

The drinks were prepared in quantity ahead of time, chilled to refrigerator temperature, and held at that temperature throughout testing. They were poured just before serving. A sample as served was about 1% oz of drink in a small plastic glass. The samples were identified by code number, but only for the convenience of the operators and to avoid errors. If a subject even saw the codes, it was accidental.
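One way to meet the balancing requirements described in this section (each scale used equally often, each scale appearing equally often with the first and second pair, and each serving order used equally often within a scale) is to cross every ordered pair of scales with every serving order and cycle subjects through the resulting list. The sketch below assumes such a full crossing. The article does not say how the schedule was actually built, and all names here are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

SCALES = ("hedonic", "P&K", "face")
SERVING_ORDERS = ("high-sweet first", "low-sweet first")


def build_conditions():
    """Cross every ordered pair of scales with a serving order for each pair."""
    conditions = []
    for first_scale, second_scale in permutations(SCALES, 2):
        for first_order, second_order in product(SERVING_ORDERS, repeat=2):
            conditions.append(
                ((first_scale, first_order), (second_scale, second_order))
            )
    return conditions


def assign(subject_ids):
    """Cycle subjects through the conditions in a fixed order."""
    conditions = build_conditions()
    return {sid: conditions[i % len(conditions)]
            for i, sid in enumerate(subject_ids)}


CONDITIONS = build_conditions()

# How often each scale is used with the first pair and with the second pair.
position_counts = Counter(
    (position, scale)
    for condition in CONDITIONS
    for position, (scale, _order) in enumerate(condition, start=1)
)

# How often each serving order occurs for each scale.
serving_counts = Counter(
    (scale, order) for condition in CONDITIONS for scale, order in condition
)
```

The balance is exact only when the number of subjects in a group is a multiple of the number of conditions. With other group sizes it is approximate, much as the article reports for sex and age.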

All interviewing was conducted one-on-one, except for the sessions using the regular written questionnaires. The interviewers were carefully briefed on the protocol to be followed for each variation.

The interviewer met the subject and parent in a reception area. Leaving the parent there, the interviewer took the child to the testing area while chatting in a friendly manner to establish rapport and relieve possible tension. The test itself was not discussed except in a very general way.

In the test room, the child was seated at a table across from the interviewer (Fig. 1) and told that he or she would get some samples of orange drink and would be asked questions about them. The first sample was brought and the child invited to try it. When the child was finished, the interviewer began the questioning procedure according to the set protocol. After a rating was made, the child was told to drink some water while the interviewer got the next sample. The waiting period was about 2 minutes. The second sample of the pair was then tried and rated. This was followed by the question, "Which did you like better, the first sample you tried or the second one?"

Then the child was told there were more drinks to be tried and had a drink of water while waiting another 2 minutes. The second pair was handled like the first, and the child was escorted back to his or her parent. The whole sequence took about 10 minutes.

Analyses

There is a qualification to note here. Some findings, in the sense of the objectives of the research, rely on what may be called soft data; however, they were derived from hard data.

Hard Data. For the paired comparison, the significance of the proportions of choice was determined by the z-test. For the scalar measures, the significance of the difference between the average rating for the high-sweet and low-sweet drinks was determined using the t-by-difference test (a t-test on each subject's difference between the two ratings), which was natural, since each subject had tried both samples. Using the variances of the distributions was also considered, but the figures were volatile and hard to interpret. With scales of this kind, the variance is highly dependent on the average rating, being quite low when the upper end of the scale is approached, but increasing as the average drops toward the midpoint.

Soft Data. The tables of results show significance levels ranging from 1% to 15%. These figures were compared among scales, between age groups, between test orders, between orders of serving, and so on.
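The two hard-data tests named above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a record of the computations actually performed: the article does not say whether the tests were one- or two-tailed or whether a continuity correction was applied, so the z-test here is two-sided against a 50% null proportion with no correction. The function names and the example numbers are hypothetical.

```python
import math
import statistics


def paired_comparison_z(n_prefer_high, n_total):
    """z-test of the observed choice proportion against 0.5 (no preference).

    Returns the z statistic and a two-sided p-value from the normal curve.
    """
    p_hat = n_prefer_high / n_total
    standard_error = math.sqrt(0.5 * 0.5 / n_total)
    z = (p_hat - 0.5) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


def t_by_difference(ratings_a, ratings_b):
    """t-test on per-subject differences (each subject rated both samples).

    Returns the t statistic and its degrees of freedom; the significance
    level is then read from a t table or a statistics package.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("each subject must have a rating for both samples")
    differences = [b - a for a, b in zip(ratings_a, ratings_b)]
    n = len(differences)
    sd = statistics.stdev(differences)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = statistics.mean(differences) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical numbers for illustration only, not data from the study.
z, p = paired_comparison_z(n_prefer_high=60, n_total=100)
high_sweet = [2, 3, 1, 4, 2]   # scale values, where 1 is the top (best) category
low_sweet = [3, 3, 2, 6, 3]
t, df = t_by_difference(high_sweet, low_sweet)
```

Because 1 is the top of the scale, a positive `t` here means the low-sweet drink received the poorer (numerically higher) ratings.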

How legitimate, or how useful, is this approach? There is no routine, accepted statistical procedure for determining whether one level of significance is or is not significantly different from another. Perhaps a method for this purpose could be devised, but its possible utilization has not been explored. An example of the questions to be resolved would be, how much more important is the 1% level than the 2% level? Probably not very important, since both are near certainty. But one is easily convinced that the 1% level shows more discrimination than the 10% level. These are the kinds of decisions that served as the basis for most of the conclusions in this study.

Results

What, if anything, was discovered in this study? Are any conclusions definitive, settling certain points once and for all? Not likely! But there are results that can direct future research on the subject.

Paired-Comparison. The paired comparisons were always made after the pair of drinks had been presented and rated. The results, summarized across all tests, are shown in Table 1.

Overall, there was a highly significant difference – well below the 0.1% level – which was due in part to the large number of subjects (N). As expected, the high-sweet sample was preferred, which validated the product variable. Other conclusions come from comparing different subgroups.

Test order, whether the first or second pair of the session, made no difference.

There was no difference in discrimination between boys and girls.

Children 8–10 years old were definitely more discriminating than the younger kids, who failed to establish a significant difference. Their failure might have been due to interference by the scaling task. The difference between ages might have been expected.

Scale type may also have made a difference, although evidence is borderline. When the comparison was made after the hedonic and P&K scales, discrimination was about the same as overall; but when it was made after the face scale, it dropped to the level of nonsignificance. This might be a chance effect, or there may be something about the face scale which later interfered with the paired comparison.

Scale Length. Scale-length results (Table 2) tend to lay to rest the belief that children need simplicity and shouldn't be presented with too much because they will get confused. Within the context of these experiments, that did not prove to be the case. Quite the contrary – the 9-point scales were as good as, if not better than, the 7-point versions. Definitely, the 7-point scales were not better. Whether the 9-point scales were actually better for discrimination rests on comparison of the 5% vs 1% levels of significance, but the 7-point scales offer no advantage.

With the 9-point scales, all subgroups showed significant discrimination, granted that at one point it dropped to a questionable 15% level; whereas with the 7-point scales, three subgroups showed nonsignificance.

The boys did slightly better than the girls, although this was not consistent. It is probably trivial, and not indicative of any meaningful trend.

This result is definite and hardly unexpected. The children 8–10 years old showed good discrimination with both scale lengths, whereas the children 5–7 years old showed significant discrimination only with the 9-point scales, completely failing the task with the shorter version. On the basis of the supposition that the simpler scales should be easier for younger children, one might have expected this to be the other way around.

It is often noted in sequential monadic testing that there is better discrimination when only the second-served samples are considered. In this study, there was significant discrimination with the second-served samples for both scale lengths, but almost none with the first-served samples. Is this due to some kind of contrast? Is it a training effect, where the ratings of the second sample have the benefit of experience with the first? This research could not address such questions in all of their complexity. Besides, such effects pertain to all testing, not just when children are concerned.

Scale Type. The crux of the research is the comparative evaluation of the three scale types. Overall, with N = 208 for each scale, all scales significantly discriminated at better than the 10% level. However, the P&K scale (1% significance level) was better than the hedonic scale (8% significance level) and the face scale (7% significance level). We think this is an important finding, but remember the qualification about soft data – it is based on comparison of the 1% vs the 7% or 8% level of significance. In addition, the face scale, which typified the kind alleged to be better for children, failed to emerge as better than the other scales.

In a way, Table 3 is repetitive, exhibiting effects shown in the other tables, but now separately for each scale type. However, it may add further emphasis to the following conclusions: The P&K scale gave better overall discrimination; older children showed better discrimination with all scales; and no scale discriminated when just the first-served samples were considered, but the P&K and face scales did with the second-served samples.




The second pair of drinks tested was consistently better for discrimination than the first pair, no matter the scale type. Does this mean that there is a learning effect, even from the brief first exposure to the task? If so, it is both bad news and good news. The bad news is that one does not have a pure measure. But who believes that is possible anyway? The good news is that kids quickly learn to do a good job, and that the testing of multiple pairs is acceptable.

Mode of Presentation. Table 4 shows the results of the side study designed to help answer the question, Is there any advantage in using the two-stage, bifurcated approach? The study was limited to the 9-point scale.

Overall, the bifurcated approach seems to offer no advantage over the straightforward. Even for the children 5–7 years old – the age group for whom the method was designed – the bifurcated scale was little better than the straightforward approach.

The self-administration phase of the study was an embellishment done as an afterthought. It was limited in scope, utilizing only the hedonic and P&K scales, and excluding children 5–7 years old for the obvious reason that they are preliterate.

The results (Table 5) showed that children 8–10 years old can handle written questionnaires effectively. Overall, the results were significant at the 1% level.

Although not shown in the table, the effect of self-administration was more pronounced with the hedonic scale, whereas discrimination with the P&K scale was about the same with both approaches (one-on-one interviewing and self-administration). This finding should cheer sensory specialists. It makes things easier. If children of this age are sufficiently knowledgeable that big words do not defeat the purpose, why bother with expensive one-on-one interviewing?

Further Studies Needed

The results of this study can be summarized as follows: The P&K scale performs better than the hedonic or face scale. Reducing scale length from 9 points to 7 offers no advantage. Children 5–7 years old do not perform any better with the face scale than with the other two scales. The bifurcated approach does not discriminate as well as the straightforward method. And older children perform as well using written questionnaires as when interviewed one-on-one.

The study, as noted earlier, was not intended to be the be-all and end-all. Rather, it was intended as a foundation for further studies. A review of variables will show that many need further attention. While there are problems involved, there is a great deal to be obtained.

References

Birch, L.L. 1979. Dimensions of preschool children's food preferences. J. Nutr. Educ. 2(2): 77.

Colwill, J.S. 1987. Sensory analysis by consumers. Food Mfr., Feb., p. 53.

Morse, R.L.D. 1953. Exploratory studies of preschool children's taste discrimination and preference for selected juices. Proc. of Florida State Horticultural Soc., Daytona Beach.

Moskowitz, H.R. 1985. Product testing with children. In "New Direction for Product Testing and Sensory Analysis of Foods," p. 147. Food and Nutrition Press, Inc., Westport, Conn.

Peryam, D.R. 1989. Personal communication. Peryam & Kroll Marketing and Sensory Research, Chicago.

Wells, W.D. 1965. Communicating with children. J. Adv. Res., p. 2.

Based on a paper presented at the Spring Meeting of ASTM, San Francisco, Calif., May 24, 1990.

– Edited by Neil H. Mermelstein, Senior Associate Editor

Reprinted from Food Technology 44(11) 78-80, 82, 84, & 86

© 1990 Institute of Food Technologists


