
Evaluating Rating Scales for Sensory Testing with Children

Sensory testing with children is becoming increasingly important to the food industry, but little research on appropriate methodology has been conducted.

AS THE NUMBER of food products aimed at the children's market increases and the role of children in purchase decisions expands, sensory testing with children becomes increasingly important to the food processing industry. However, sensory research has not kept pace with this need.

Testing with children is in an embryonic stage. Over the years, a few sensory researchers have considered the problems involved in applying their science to this special population, but for the most part the field has been static. The need for serious investigation is pointed up by how little research has been done in this area.

As a way of focusing on the specific needs for this kind of research, a thumbnail sketch of certain key questions the literature considers is presented in the box "Questions Addressed in Earlier Studies."

One thing is very noticeable not only in the literature, but also in word-of-mouth, unpublished material about children's testing. The methods used have been intuitive, even granted that the investigator may have had a rationale. Once a method has been selected, there has been no serious investigation of possible alternatives. It is as if the researchers said, "We planned this, we tried it, it seemed to work, and there was no time to bother with what might have worked better."

We therefore undertook a basic research project designed to help establish a solid foundation for future investigations. This article describes the procedures, analysis, and conclusions of research intended to evaluate the relative merit of rating scales that might be used when testing with children. In this study, we used two methods of questioning – one-on-one interviewing (Fig. 1) and self-administered questionnaire (Fig. 2) – and three types of rating scale (Fig. 3).

Beverley J. Kroll

The author is President, Peryam & Kroll, Marketing and Sensory Research, 6323 N. Avondale Ave., Suite 121, Chicago, IL 60631.

Variables Selected

A great many variables could be considered. Hence, it was necessary to be selective and try to choose the more important ones.

Test Products. The test product was not really a source of variation, but remained constant throughout the main series of experiments. We settled on a sweetness difference in an orange drink. One can reliably predict that children will like a sweeter drink, at least within the normal range. This proved to be the case.

Preliminary testing of drinks with various sweetness differences indicated the adjustments needed. For example, a drink sweetened with the recommended amount of sugar compared to one made with only 50% of that amount produced highly significant differences no matter what rating scale was used. Needed was a difference that was definite but not overwhelming, so that the possible effects of the variations of interest could emerge. The final choice was an orange-flavored drink sweetened with the recommended amount of sugar, compared to a drink with 80% of that amount.

Scale Type. Differences in scale type were the main issue addressed in these experiments. After preliminary work with older children, we concentrated on three scale types (Fig. 3) – the standard hedonic scale with the usual verbal categories, a pictorial or face scale, and a child-oriented verbal scale we developed.

Over the years, researchers have investigated test language suitable for children. After reviewing child-oriented word scales designed by others, we decided to develop our own scale, with more nearly equal intervals (although exact equality probably cannot be achieved with scales of this type). The result was dubbed the Peryam & Kroll or P&K scale.

It was imperative that the study include a picture scale. Testing with children is overrun with picture scales, the rationale being that younger people may not understand words and phrases but can more accurately deal with facial expressions. Besides, pictures are entertaining and should inspire closer attention to the task.

There are many such caricature scales around, but all have the same general characteristics, representing degrees of pleasantness ranging from high to low. The question is how well successive pictures communicate the basic idea. Some preliminary work was done with a scale from an earlier published study, which used the Snoopy cartoon character, but the results were disappointing. Scales using children's faces with variations in degree of detail were also tried. Eventually a series of simplified people faces was selected as probably best and certainly representative.

Scale Length. There is a school of thought, bolstered by intuition, that longer scales tend to create confusion because there are lots of words to understand and choices to make. The implication is that this problem should be more serious with younger children. On the other hand, there is evidence that longer scales can be more discriminating and produce more reliable results.

Certainly, this factor was of enough importance to be included in the study. Starting with the frequently used 9 points, how far down


Fig. 1 – Children ages 5–7 and 8–10 were tested using one-on-one interviews

Fig. 2 – Children ages 8–10 were also tested using self-administered questionnaires in standard sensory testing booths

Fig. 3 – Three types of rating scale were used: the traditional hedonic scale, the P&K scale developed for this study, and the typical face scale. After testing, scale values of 1 to 9 were assigned (starting with 1 at the top) for the purposes of ...

    Value   Traditional hedonic scale    P&K scale
    1       Like extremely               Super good
    2       Like very much               Really good
    3       Like moderately              Good
    4       Like slightly                Just a little good
    5       Neither like nor dislike     Maybe good or maybe bad
    6       Dislike slightly             Just a little bad
    7       Dislike moderately           Bad
    8       Dislike very much            Really bad
    9       Dislike extremely            Super bad

    (The face scale was pictorial; its images are not reproduced here.)
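The coding described in the Fig. 3 caption (1 at the top of each scale, 9 at the bottom) could be expressed roughly as follows. This is only a minimal sketch: the label lists come from Fig. 3, but the function names `code_ratings` and `mean_rating` and the sample responses are hypothetical and are not taken from the study.

```python
# Scale labels as listed in Fig. 3, top to bottom.
HEDONIC_LABELS = [
    "Like extremely", "Like very much", "Like moderately", "Like slightly",
    "Neither like nor dislike",
    "Dislike slightly", "Dislike moderately", "Dislike very much",
    "Dislike extremely",
]
PK_LABELS = [
    "Super good", "Really good", "Good", "Just a little good",
    "Maybe good or maybe bad",
    "Just a little bad", "Bad", "Really bad", "Super bad",
]


def code_ratings(labels):
    """Map each label to its scale value, assigning 1 to the top label."""
    return {label: value for value, label in enumerate(labels, start=1)}


HEDONIC_VALUES = code_ratings(HEDONIC_LABELS)
PK_VALUES = code_ratings(PK_LABELS)


def mean_rating(responses, values):
    """Average the coded values of a list of verbal responses."""
    return sum(values[response] for response in responses) / len(responses)


# Hypothetical responses from four children (illustration only).
example_responses = ["Super good", "Good", "Really good", "Just a little bad"]
example_mean = mean_rating(example_responses, PK_VALUES)
print(example_mean)
```

With this coding a lower average means greater liking, so the better-liked sample is the one with the smaller mean.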


should one go? To 7 points? 5? 3? Or even to just 2 points, which would be paired comparison?

The study addressed this variable in subdued fashion by trying 7 points, using the same three scale types as before but eliminating one good category and one bad category from each scale.

Age. For what ages might special techniques be required? Our initial work was with children over 10 years of age, most of whom seemed to handle self-administered questionnaires fairly well, with no problems that are not encountered to some degree with adults.

To address the real issue, therefore, we defined two age groups based on suppositions about ability to handle verbal input: the preliterate, ages 5–7, where most can be expected to read very little if at all and not understand big words; and the semiliterate, ages 8–10, where most can read at some level but still may not understand words such as "extremely" or "moderately." No attempt was made to extend the investigation to preschoolers.

Mode of Presentation. Most of the experiments employed a straightforward approach, where the successive categories were read one after another, always starting at the good end.

Another approach sometimes used by investigators is what may be called "bifurcated" – the interviewer first asks the subject to place the stimulus into either the good/like or the bad/dislike category, then tries to get the child to scale degree of like or dislike by presenting the successive categories. The categories were presented starting in the middle and proceeding to the ends. This seemed logical, but that could be open to debate. If the subject failed to make a choice in response to the initial question, the result was recorded as "maybe good/maybe bad" or "neither like nor dislike" (but was not read to the subject). This phase of testing included only the hedonic and P&K scales because the face scale is inappropriate to this approach.

The question of which was the better procedure – the bifurcated or the straightforward – was addressed in a side experiment.
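The two-stage bifurcated procedure described above can be sketched in code. This is an illustrative reading of the text, not the interviewers' actual protocol: the P&K labels come from Fig. 3, while the function names and the encoding of the first answer as "good", "bad" or None are assumptions made for the example.

```python
# P&K labels from Fig. 3, top (value 1) to bottom (value 9).
PK_LABELS = [
    "Super good", "Really good", "Good", "Just a little good",
    "Maybe good or maybe bad",
    "Just a little bad", "Bad", "Really bad", "Super bad",
]
NEUTRAL_INDEX = 4  # zero-based position of the middle category


def categories_to_present(first_answer):
    """Labels read in the second stage, from the middle out to the end."""
    if first_answer == "good":
        # Walk from "Just a little good" up to "Super good".
        return PK_LABELS[NEUTRAL_INDEX - 1::-1]
    if first_answer == "bad":
        # Walk from "Just a little bad" down to "Super bad".
        return PK_LABELS[NEUTRAL_INDEX + 1:]
    return []  # no first-stage choice, so nothing further is read


def bifurcated_score(first_answer, chosen_label=None):
    """Return the 1-9 scale value for one bifurcated interview."""
    if first_answer not in ("good", "bad"):
        # No choice at the first question: record the middle category,
        # which was never read to the child.
        return NEUTRAL_INDEX + 1
    if chosen_label not in categories_to_present(first_answer):
        raise ValueError("label is not on the side the child chose")
    return PK_LABELS.index(chosen_label) + 1


print(categories_to_present("good"))
print(bifurcated_score("bad", "Really bad"))
```

The same functions would work for the hedonic scale by swapping in its nine labels.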


Questions Addressed in Earlier Studies

Can children discriminate? How far down the age scale does the capacity for discrimination exist?

There has never been much argument here. Children can definitely discriminate. At least they have preferences. Observations of the behavior of even infants indicate the capability of choice in terms of rejection and acceptance.

• About 1955, investigators at Eli Lilly, Inc., developed a procedure for working with children 2–3 years old to evaluate formulas for vitamin preparations (Peryam, 1989). They used one-on-one interviewing and the paired-comparison method and claimed to have obtained results useful in product development.

• Investigators at the University of Florida did extensive testing of various citrus products with preschool children ages 3–5 (Morse, 1953). They found lots of discrimination, as well as puzzling aberrations. They used, and endorsed, paired comparisons, which produced the only meaningful results. However, they also tried a method which was essentially the triangle test, although not labeled as such. Their conclusion that the method was too complicated for kids should not surprise anyone.

• Work with preschoolers ages 3–5 used fruit as stimuli and an interesting variation of the rank-order method (Birch, 1979). The child was presented with a number of different kinds of fruit and asked to select the one liked best. This was then removed, and the one liked best among the rest was chosen, and so on. Whatever the utility of the findings, there was discrimination, which replicate testing showed was reliable.

• Colwill (1987) reviewed scaling methods for obtaining information about consumers' likes and dislikes. He recommended using picture scales, preferably with five or seven points, for testing preliterate children.

Can one use a measuring device more sophisticated than simple paired comparisons? Can children differentiate degrees of liking and/or disliking?

Usually investigators have found that children do have such ability, but the extent of that ability, as well as how it might be affected by any one of many variables, is seldom considered.

• Some years ago, Bert Krieger, a researcher with a candy manufacturer, was faced with the problem of evaluating formulation changes in chocolate bars (Moskowitz, 1985). He dealt with children 5–7 years old as well as older children, using a picture scale that showed the Snoopy cartoon character in a series of nine poses ranging from up-eared elation to droopy disgust. His subjects were able to discriminate.

• Another researcher (Wells, 1965) used a scaling method to evaluate children's feelings about cereals. He was not concerned with the foods as eaten, but evaluated children's ideas about familiar cereals and their feelings about TV commercials. Some of the subjects were in the 5–7 age range. The study used 7-point face scales showing a youngster (a boy for boys, a girl for girls) in poses ranging from grinning happiness to hold-the-nose distaste. The children could discriminate, and the results were meaningful.

Are the results of testing children useful in solving typical product development problems?

The sponsors must be getting something useful, or why would so much be attempted? Some of the published studies actually address the question, e.g., the previously cited work by Krieger, who achieved comparative evaluation of formulas for chocolate bars.

Summary

Briefly summarizing the literature, we note that:

• There is consensus that children can discriminate, particularly in regard to degree of liking.

• Children are able to show degree of preference if the proper measuring device is used.

• Children can provide useful information about products if the right methods are employed.

• Children require special handling, i.e., handling that is different from the procedures routinely employed with adults. One must pay attention to such things as gaining confidence, providing motivation, and expressing tasks in language children understand. This recognition appears throughout the literature.


Another side issue that seemed worth testing was one-on-one interviewing vs a self-administered questionnaire. This experiment used the 9-point hedonic and P&K scales and involved only children 8–10 years old, i.e., the semiliterate group. Again, the face scale was excluded because the concern was mainly with ability to read with understanding.

Testing Procedure

The test subjects were prerecruited from families on our extensive roster of consumer panelists. Usually, the computer knows which families have children and their ages. All had to like orange drinks, which was no problem. Otherwise the only concerns were age, sex, and availability to fit into the schedule. An important proviso was that no child should be invited to participate in more than one test, which would raise questions about training effect.

In all cases, a subject tried the pair of samples, high sweet vs low sweet, twice, using a different scale for each pair, then made a paired-comparison choice after each pair. Except for those on the mode of presentation, the experiments included all three scale types – hedonic, P&K, and face. The design required that the scales be used equally often and appear equally often as the first or second pair. Furthermore, for each scale type the high-sweet and low-sweet samples were served first or second equally often.

Sex differences did not seem important in the context of this investigation, but our recruiters attempted to have equal numbers of girls and boys in each of the age groups. This was not achieved exactly, but it was close. They also tried to get an even distribution of ages within each age group. Again, this was not exact but was very close.

The drinks were prepared in quantity ahead of time, chilled to refrigerator temperature, and held at that temperature throughout testing. They were poured just before serving. A sample as served was about 1½ oz of drink in a small plastic glass. The samples were identified by code number, but only for the convenience of the operators and to avoid errors. If a subject even saw the codes, it was accidental.

All interviewing was conducted one-on-one, except for the sessions using the regular written questionnaires. The interviewers were carefully briefed on the protocol to be followed for each variation.

The interviewer met the subject and parent in a reception area. Leaving the parent there, the interviewer took the child to the testing area while chatting in a friendly manner to establish rapport and relieve possible tension. The test itself was not discussed except in a very general way.

In the test room, the child was seated at a table across from the interviewer (Fig. 1) and told that he or she would get some samples of orange drink and would be asked questions about them. The first sample was brought and the child invited to try it. When the child was finished, the interviewer began the questioning procedure according to the set protocol. After a rating was made, the child was told to drink some water while the interviewer got the next sample. The waiting period was about 2 minutes. The second sample of the pair was then tried and rated. This was followed by the question, "Which did you like better, the first sample you tried or the second one?"

Then the child was told there were more drinks to be tried and had a drink of water while waiting another 2 minutes. The second pair was handled like the first, and the child was escorted back to his or her parent. The whole sequence took about 10 minutes.
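The balancing requirements described under Testing Procedure (each scale used equally often, equally often in the first and second pair, with each serving order equally often) can be illustrated by enumerating a fully crossed set of test conditions. The article does not say how subjects were actually assigned, so the full crossing below, the function name `build_design` and the dictionary layout are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations, product

SCALES = ("hedonic", "P&K", "face")
# Serving order within a pair: which sample is tasted first.
SERVING_ORDERS = (("high", "low"), ("low", "high"))


def build_design():
    """Cross every ordered pair of scales with every serving order."""
    cells = []
    for first_scale, second_scale in permutations(SCALES, 2):
        for first_order, second_order in product(SERVING_ORDERS, repeat=2):
            cells.append({
                "pair1": (first_scale, first_order),
                "pair2": (second_scale, second_order),
            })
    return cells


design = build_design()

# Tally how often each scale appears in each position, and how often
# each sample is served first under each scale.
position_counts = Counter()
served_first_counts = Counter()
for cell in design:
    for position, key in enumerate(("pair1", "pair2"), start=1):
        scale, order = cell[key]
        position_counts[(scale, position)] += 1
        served_first_counts[(scale, order[0])] += 1

print(len(design))
print(dict(position_counts))
print(dict(served_first_counts))
```

Assigning equal numbers of children to each cell would satisfy the stated requirements; a smaller balanced subset could also do so, which is why this is only one possible layout.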

Analyses

There is a qualification to note here. Some findings, in the sense of the objectives of the research, rely on what may be called soft data; however, they were derived from hard data.

Hard Data. For the paired comparison, the significance of the proportions of choice was determined by the z-test. For the scalar measures, the significance of the difference between the average rating for the high-sweet and low-sweet drinks was determined using the t-by-difference test, which was natural, since each subject had tried both samples. Using the variances of the distributions was also considered, but the figures were volatile and hard to interpret. With scales of this kind, the variance is highly dependent on the average rating, being quite low when the upper end of the scale is approached, but increasing as the average drops toward the midpoint.

Soft Data. The tables of results show significance levels ranging from 1% to 15%. These figures were compared among scales, between age groups, between test orders, between orders of serving, and so on.

How legitimate, or how useful, is this approach? There is no routine, accepted statistical procedure for determining whether one level of significance is or is not significantly different from another. Perhaps a method for this purpose could be devised, but its possible utilization has not been explored. An example of the questions to be resolved would be, how much more important is the 1% level than the 2% level? Probably not very important, since both are near certainty. But one is easily convinced that the 1% level shows more discrimination than the 10% level. These are the kinds of decisions that served as the basis for most of the conclusions in this study.
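For readers unfamiliar with the t-by-difference (paired t) test mentioned under Hard Data, the sketch below shows the usual calculation: the mean of the within-subject differences divided by its standard error. The ratings are invented for illustration (coded 1 = most liked, as in Fig. 3), and the function name `paired_t` is hypothetical; none of this reproduces the study's data.

```python
import math
import statistics


def paired_t(high_sweet, low_sweet):
    """Return (t, degrees_of_freedom) for ratings paired by subject."""
    if len(high_sweet) != len(low_sweet):
        raise ValueError("each subject must have rated both samples")
    differences = [h - l for h, l in zip(high_sweet, low_sweet)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD (n - 1)
    if sd_difference == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_difference / (sd_difference / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings from eight children (not data from the study).
high_ratings = [2, 3, 1, 4, 2, 3, 2, 5]
low_ratings = [3, 3, 2, 6, 2, 5, 4, 5]

t_value, df = paired_t(high_ratings, low_ratings)
print(round(t_value, 3), df)
```

The significance level is then read from a t table with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also reports the p-value.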

Results

What, if anything, was discovered in this study? Are any conclusions definitive, settling certain points once and for all? Not likely! But there are results that can direct future research on the subject.

Paired Comparison. The paired comparisons were always made after the pair of drinks had been presented and rated. The results, summarized across all tests, are shown in Table 1. Overall, there was a highly significant difference – well below the 0.1% level – which was due in part to the large number of subjects (N). As expected, the high-sweet sample was preferred, which validated the product variable. Other conclusions come from comparing different subgroups.

Test order, whether the first or second pair of the session, made no difference.

There was no difference in discrimination between boys and girls.

Children 8–10 years old were definitely more discriminating than the younger kids, who failed to establish a significant difference. Their failure might have been due to interference by the scaling task. The difference between ages might have been expected.

Scale type may also have made a difference, although evidence is borderline. When the comparison was made after the hedonic and P&K scales, discrimination was about the same as overall; but when it was made after the face scale, it dropped to the level of nonsignificance. This might be a chance effect, or there may be something about the face scale which later interfered with the paired comparison.
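The z-test used for the paired-comparison choices can be sketched as a test of the observed proportion against the 50% expected if children chose at random. The counts below are invented, the function name `preference_z_test` is hypothetical, and the sketch assumes a two-sided test with no continuity correction, which the article does not specify.

```python
import math


def preference_z_test(n_prefer_high, n_total):
    """Two-sided z-test of H0: each sample is chosen half the time."""
    p_hat = n_prefer_high / n_total
    standard_error = math.sqrt(0.25 / n_total)  # sqrt(0.5 * 0.5 / n)
    z = (p_hat - 0.5) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical outcome: 60 of 100 children prefer the high-sweet drink.
z_value, p_value = preference_z_test(60, 100)
print(round(z_value, 2), round(p_value, 4))
```

Because the standard error shrinks as the number of subjects grows, the same observed proportion yields a larger z with a larger N, which is consistent with the remark that the overall significance was due in part to the large number of subjects.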

Scale Length. Scale-length results (Table 2) tend to lay to rest the belief that children need simplicity and shouldn't be presented with too much because they will get confused. Within the context of these experiments, that did not prove to be the case. Quite the contrary – the 9-point scales were as good as, if not better than, the 7-point versions. Definitely, the 7-point scales were not better. Whether the 9-point scales were actually better for discrimination rests on comparison of the 5% vs 1% levels of significance, but the 7-point scales offer no advantage.

With the 9-point scales, all subgroups showed significant discrimination, granted that at one point it dropped to a questionable 15% level; whereas with the 7-point scales, three subgroups showed nonsignificance.

The boys did slightly better than the girls, although this was not consistent. It is probably trivial, and not indicative of any meaningful trend.

The result for age is definite and hardly unexpected. The children 8–10 years old showed good discrimination with both scale lengths, whereas the children 5–7 years old showed significant discrimination only with the 9-point scales, completely failing the task with the shorter version. On the basis of the supposition that the simpler scales should be easier for younger children, one might have expected this to be the other way around.

It is often noted in sequential monadic testing that there is better discrimination when only the second-served samples are considered. In this study, there was significant discrimination with the second-served samples for both scale lengths, but almost none with the first-served samples. Is this due to some kind of contrast? Is it a training effect, where the ratings of the second sample have the benefit of experience with the first? This research could not address such questions in all of their complexity. Besides, such effects pertain to all testing, not just when children are concerned.

Scale Type. The crux of the research is the comparative evaluation of the three scale types. Overall, with N = 208 for each scale, all scales significantly discriminated at better than the 10% level. However, the P&K scale (1% significance level) was better than the hedonic scale (8% significance level) and the face scale (7% significance level). We think this is an important finding, but remember the qualification about soft data – it is based on comparison of the 1% vs the 7% or 8% level of significance. In addition, the face scale, which typified the kind alleged to be better for children, failed to emerge as better than the other scales.

In a way, Table 3 is repetitive, exhibiting effects shown in the other tables, but now separately for each scale type. However, it may add further emphasis to the following conclusions: The P&K scale gave better overall discrimination; older children showed better discrimination with all scales; and no scale discriminated when just the first-served samples were considered, but the P&K and face scales did with the second-served samples.


The second pair of drinks tested was consistently better for discrimination than the first pair, no matter the scale type. Does this mean that there is a learning effect, even from the brief first exposure to the task? If so, it is both bad news and good news. The bad news is that one does not have a pure measure. But who believes that is possible anyway? The good news is that kids quickly learn to do a good job, and that the testing of multiple pairs is acceptable.

Mode of Presentation. Table 4 shows the results of the side study designed to help answer the question, Is there any advantage in using the two-stage, bifurcated approach? The study was limited to the 9-point scale.

Overall, the bifurcated approach seems to offer no advantage over the straightforward. Even for the children 5–7 years old – the age group for whom the method was designed – the bifurcated scale was little better than the straightforward approach.

The self-administration phase of the study was an embellishment done as an afterthought. It was limited in scope, utilizing only the hedonic and P&K scales, and excluding children 5–7 years old for the obvious reason that they are preliterate.

The results (Table 5) showed that children 8–10 years old can handle written questionnaires effectively. Overall, the results were significant at the 1% level.

Although not shown in the table, the effect of self-administration was more pronounced with the hedonic scale, whereas discrimination with the P&K scale was about the same with both approaches (one-on-one interviewing and self-administration). This finding should cheer sensory specialists. It makes things easier. If children of this age are sufficiently knowledgeable that big words do not defeat the purpose, why bother with expensive one-on-one interviewing?

Further Studies Needed

The results of this study can be summarized as follows: The P&K scale performs better than the hedonic or face scale. Reducing scale length from 9 points to 7 offers no advantage. Children 5–7 years old do not perform any better with the face scale than with the other two scales. The bifurcated approach does not discriminate as well as the straightforward method. And older children perform as well using written questionnaires as when interviewed one-on-one.

The study, as noted earlier, was not intended to be the be-all and end-all. Rather, it was intended as a foundation for further studies. A review of variables will show that many need further attention. While there are problems involved, there is a great deal to be obtained.

References

Birch, L.L. 1979. Dimensions of preschool children's food preferences. J. Nutr. Educ. 2(2): 77.

Colwill, J.S. 1987. Sensory analysis by consumers. Food Mfr., Feb., p. 53.

Morse, R.L.D. 1953. Exploratory studies of preschool children's taste discrimination and preference for selected juices. Proc. of Florida State Horticultural Soc., Daytona Beach.

Moskowitz, H.R. 1985. Product testing with children. In "New Directions for Product Testing and Sensory Analysis of Foods," p. 147. Food and Nutrition Press, Inc., Westport, Conn.

Peryam, D.R. 1989. Personal communication. Peryam & Kroll Marketing and Sensory Research, Chicago.

Wells, W.D. 1965. Communicating with children. J. Adv. Res., p. 2.

Based on a paper presented at the Spring Meeting of ASTM, San Francisco, Calif., May 24, 1990.

– Edited by Neil H. Mermelstein, Senior Associate Editor

Reprinted from Food Technology 44(11): 78-80, 82, 84, & 86. © 1990 Institute of Food Technologists
