
    P. Wilhelm & D. Schoebi: Assessing Mood in Daily Life. European Journal of Psychological Assessment 2007; Vol. 23(4):258–267. DOI 10.1027/1015-5759.23.4.258. © 2007 Hogrefe & Huber Publishers

    Assessing Mood in Daily Life
    Structural Validity, Sensitivity to Change, and Reliability of a Short-Scale to Measure Three Basic Dimensions of Mood

    Peter Wilhelm¹ and Dominik Schoebi¹,²

    ¹University of Fribourg, Switzerland, ²University of California, Los Angeles, USA

    Abstract. The repeated measurement of moods in everyday life, as is common in ambulatory monitoring, requires parsimonious scales, which may challenge the reliability of the measures. The current paper evaluates the factor structure, the reliability, and the sensitivity to change of a six-item mood scale designed for momentary assessment in daily life. We analyzed data from 187 participants who reported their current mood four times per day during seven consecutive days using a multilevel approach. The results suggest that the proposed three factors Calmness, Valence, and Energetic arousal are appropriate to assess fluctuations within persons over time. However, calmness and valence are not distinguishable at the between-person level. Furthermore, the analyses showed that two-item scales provide measures that are reliable at the different levels and highly sensitive to change.

    Keywords: ambulatory assessment, ecological momentary assessment, electronic diary, mood, affect, multilevel confirmatory factor analysis

    Introduction

    The repeated measurement of moods and emotions with high frequency is common in ambulatory psychological and psychophysiological assessment. Measurement schedules range from one assessment per day taken for several weeks (e.g., Cranford, Shrout, Iida, Rafaeli, Yip, & Bolger, 2006) to high-frequency assessment within a 24 h period (e.g., Ebner-Priemer & Sawitzki, 2007; Myrtek, 2004). Because of the high repetition rate in such studies, the duration of a single assessment should be kept short to minimize the burden on participants. The higher the participants' burden caused by the frequency and duration of single assessments, the more likely their compliance and motivation to give valid responses will decline. Moreover, when participants need to rate redundant items, additional effects like the exaggeration of subtle differences between items may occur, compromising the psychometric properties of a scale (Bolger, Davis, & Rafaeli, 2003; Fahrenberg, Leonhart, & Foerster, 2002; Lucas & Baird, 2006).

    Consequently, some researchers have used single items to assess different facets of mood (e.g., Fahrenberg, Hüttner, & Leonhart, 2001; Myrtek, 2004). The use of single items, however, raises the problem that the reliability of the state-specific component of the measure cannot be determined and separated from measurement error. Therefore, a variety of multi-item mood scales have been used, ranging from long item lists (e.g., Buse & Pawlik, 1996; Kubiak & Jonas, 2007) to specifically designed or adapted short scales (e.g., Cranford et al., 2006). For these short scales, reliability coefficients and sometimes factor structures have been reported, which are usually based on the analyses of the between-person variance (e.g., individuals' averages over time). Yet, the within-person variance has often been ignored (for exceptions see e.g., Buse & Pawlik, 1996, 2001; Cranford et al., 2006; Schimmack, 2003; Zelinski & Larsen, 2000; Zevon & Tellegen, 1982).

    The goal of this article is to evaluate the psychometric properties of a parsimonious six-item mood measure that was developed to assess three basic dimensions of mood in people's daily lives. We do so using a multilevel modeling approach to investigate the variance and covariance between items at the between-person and the within-person level simultaneously.

    What Are Moods?

    Moods are rather diffuse affective states that subtly affect our experience, cognitions, and behavior. They operate continuously and “provide the affective background, the emotional color to all that we do” (Davidson, 1994, p. 52). Moods can be consciously experienced as soon as they gain the focus of our attention, and are then characterized by the predominance of certain subjective feelings.

    Moods should be distinguished from emotions. Although the definition of emotions depends heavily on theoretical frameworks (e.g., Ekman & Davidson, 1994; Lewis & Haviland-Jones, 2000), most researchers would agree that emotions are short-term reactions to events or stimuli that manifest themselves in different subsystems of the organism (expression and behavior, physiology, subjective experience, and cognitions). In contrast to emotions, moods are not necessarily linked to an obvious cause that can be related to an event and its specific appraisal. They show little synchronization of the different subsystems, do not interrupt ongoing behavior, and do not prepare immediate actions (Scherer, 2005). Usually the intensity of moods is low to medium and they may last over hours and days.

    How Can Moods Be Conceptualized and Measured?

    During the last two decades competing two-dimensional approaches have dominated the discussion about the structure of mood and affect. One model, proposed by Russell, assumes that the core affect of a feeling is a single integral blend of the independent dimensions valence and arousal (Russell, 2003, p. 148). Russell, Weiss, and Mendelsohn (1989) introduced an affect grid to assess valence and arousal simultaneously via two items. Its brevity makes the affect grid very attractive for ambulatory assessment research (Reicherts, Salamin, Maggiori, & Pauls, 2007). However, because each dimension is assessed with one item only, measurement error cannot be determined for a single occasion. In addition, Schimmack and Grob (2000) criticized that the labels of the activation dimension are not close to common language and experience. In contrast to Russell, Thayer (1989) argued that two basic arousal dimensions need to be distinguished to describe a mood state, namely tense arousal (relaxation–tension), and energetic arousal (tiredness–wakefulness). In Thayer's view, valence is not a separate dimension, but a mix of his basic arousal dimensions.

    Watson and Tellegen (1985) proposed that affects can be described by two uncorrelated basic dimensions, which are called positive affect (PA) and negative affect (NA). They developed the Positive and Negative Affect Schedule (PANAS) to measure each dimension with 10 unipolar items. According to Watson and Vaidya (2003, p. 356) the PANAS has gained much popularity because of the rich body of psychometric data that have established the reliability and validity of the scales. However, the validity of the theoretical conception of the PA and NA dimensions, the factorial solution on which they are based, and the difficulty in interpreting the scores and relating them to commonly experienced feelings were criticized (e.g., Fahrenberg, 2006; Russell & Carroll, 1999a; Schimmack, 1999). Moreover, some critics rejected the basic assumption that affect can be sufficiently described by two orthogonal dimensions (Matthews, Jones, & Chamberlain, 1990; Schimmack & Grob, 2000). They advocated a model in which valence (V; ranging from unpleasant to pleasant), calmness (C; ranging from restless/under tension to calm/relaxed), and energetic arousal (E; ranging from tired/without energy to awake/full of energy) form the basic dimensions. Although these dimensions are substantially correlated (cf. Table 1), they cannot be reduced to a two-dimensional model. In addition, different experimental manipulations, such as taking sedative drugs or sleep deprivation, caused different patterns of changes in the three mood dimensions, which would not have been captured by the two-dimensional approaches discussed above (Matthews et al., 1990). Different instruments exist to measure the three mood dimensions: The UWIST Mood Adjective Checklist (Matthews et al., 1990) assesses each dimension with eight unipolar items; the German-language Multidimensional Mood Questionnaire (MDMQ) provides short-scales consisting of four unipolar items (Steyer, Schwenkmezger, Notz, & Eid, 1997). Schimmack and Grob (2000) used six unipolar items, which they combined into three bipolar items per dimension.

    Table 1. Correlations between the three basic dimensions Valence (V), Energetic arousal (E), and Calmness (C) in different studies

    Study and sample                                        r(V–E)        r(V–C)        r(E–C)

    Schimmack & Grob (2000, p. 335, 337)ᵃ
      Study 1: 207 American students                        .49           .70#          .33#
      Study 2: 135 American students, two times             .47           .57#          .20#

    Schimmack & Reisenzein (2002, p. 415)ᶜ,ᵃ
      710 American and Canadian students                    .46           .65#          .28#

    Steyer et al. (1997, p. 14)ᵇ
      503 German participants; 47% students, four times     .50 to .62    .66 to .72    .43 to .53

    Matthews et al. (1990, p. 25)ᶜ
      388 British participants, mostly students             .43           .37#          .04#

    Notes:
    ᵃ The original dimensions were labeled as follows: pleasure–displeasure (V), awake–tiredness (E), tension–relaxation (C).
    ᵇ The original dimensions were labeled as follows: good–bad mood (V), wakefulness–tiredness (E), calmness–uncalmness (C).
    ᶜ The original dimensions were labeled as follows: valence / hedonic tone (V), energetic arousal (E), tense arousal (C).
    Correlations were between latent factors and, therefore, adjusted for measurement error.
    # Calmness was coded the other way around, such that high values indicated high tension. To ensure comparability with our coding system, the signs of the original correlations were reversed.


    In addition to measures based on dimensional models of mood and affect, various instruments have been developed to assess qualitatively distinctive mood states (e.g., the revised Multiple Affect Adjective Check List (MAACL-R), the PANAS-X or the Profile of Mood States (POMS); the latter assesses, e.g., tension-anxiety, depression-dejection, anger-hostility, and others). The general problem with these approaches is that neither the nature nor the number of distinguishable mood states is clear. Moreover, the proposed specific mood-states are usually highly correlated (Schimmack, 1999; Watson & Vaidya, 2003).

    Methods to Evaluate the Psychometric Properties of Scales Used in Ambulatory Assessment Studies

    Earlier approaches to demonstrate the factor structure of repeated measurement data followed Cattell's suggestion to factorize the between-person correlations, which are repeated time by time (R-technique) separately from the within-person correlations, which are repeated person by person (P-technique; e.g., Zevon & Tellegen, 1982). Contemporary approaches use structural equation models (SEM) or multilevel models (MLM) to estimate the factorial structure between and within persons simultaneously (see Data Analysis).

    Specific reliability coefficients for ambulatory assessment measures have been calculated in various ways. Although the computational details differ, all of these methods decompose the total variance into trait, state, and error components. To obtain indicators for the within-person reliability, Buse and Pawlik (2001) correlated test halves across occasions for each participant (Cattell's P-matrix) and averaged those coefficients across participants. To obtain indicators of the aggregate reliability, which is based on the between-person variance, the odd-even method was applied (e.g., Buse & Pawlik, 1996; Perrez, Schoebi, & Wilhelm, 2000).

    Cranford et al. (2006) decomposed the variance of their measures into variance between persons, days, items of the same scale, the two-way interactions, and residuals. Using generalizability theory they combined the variance components to demonstrate high aggregate reliability and satisfactory within-person reliability for their three-item mood scales. A similar but less formalized approach was proposed by Fahrenberg et al. (2002).

    Other approaches to obtain specific reliability estimates are based on structural equation modelling (SEM). One important class of models in this framework are latent-state latent-trait (LST) models. In LST theory (e.g., Steyer, Schmitt, & Eid, 1999) the total variance of a variable at a given occasion is partitioned into three components: a latent-trait component, which does not change over occasions and indicates true consistency, a latent-state residual, which captures the occasion-specific deviation from the trait and indicates true variability between occasions, and measurement error. In LST theory reliability is defined as the ratio of true variance (latent-trait variance + latent-state residual variance) to total variance at a given occasion. Another variance decomposition, which takes the serial dependency of repeated measures into account, was proposed by Kenny and Zautra (2001). In their model, the total variance is divided into a stable trait, an autoregressive trait, and a state component, which contains situational influences and error, and is supposed to vary randomly over time.
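
    The LST definition of reliability given above amounts to simple arithmetic on three variance components. The following Python sketch spells it out; the function name and the example variances are hypothetical and serve only to illustrate the ratio, not to reproduce any estimates from the studies cited.

```python
def lst_shares(trait_var, state_residual_var, error_var):
    """Split the total variance at one occasion into LST components.

    Reliability is the share of true variance (latent-trait variance plus
    latent-state residual variance) in the total variance.
    """
    total = trait_var + state_residual_var + error_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "reliability": (trait_var + state_residual_var) / total,
        "trait_share": trait_var / total,
        "state_residual_share": state_residual_var / total,
    }


# Hypothetical variance components for a single occasion.
shares = lst_shares(trait_var=0.3, state_residual_var=0.5, error_var=0.2)
print(shares)
```

    In this made-up example most of the true variance is occasion-specific, which is the pattern one would hope for in a measure meant to track momentary states.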

    The Current Study

    The purpose of this study was to evaluate the psychometric properties of a short mood measure designed to assess three basic mood dimensions in people's daily lives. Data were collected from a sample of 187 participants who reported their mood state four times a day over the course of a week by means of the current mood measure. Using a multilevel approach, the three-factorial structure that was proposed by Matthews et al. (1990), Schimmack and Reisenzein (2002), and Steyer et al. (1997) was simultaneously tested between persons and within persons. We further showed how error variance can be separated from latent variance at the different levels to obtain level-specific reliability coefficients and evaluate each scale's sensitivity for measuring true change over time.

    Method

    Participants

    Ninety-eight Swiss couples were recruited to participate in a 1-week diary study either in undergraduate psychology classes or through private acquaintances of graduate students. Because of technical failures of the handheld computers, data of nine persons were lost. Thus, data of 93 women and 94 men from 97 heterosexual couples could be analyzed. Age of participants ranged between 19 and 36 years (M = 25.6, SD = 3.2); half of them were students.

    Electronic Diary Procedure

    Four times a day over the course of a week, participants were asked to rate their current mood and a series of other questions not relevant to this paper. The diary questions were implemented on Palm Tungsten T and T5, programmed with a pilot version of IzyBuilder (http://www.izybuilder.com). The questions could be answered by using a stylus on a touchscreen. Around 11 a.m., 2:30 p.m., 6 p.m., and 9:30 p.m. the computer gave an acoustic signal. Signal time points were randomized in a time window of 20 min around the intended times to prevent participants from anticipating the exact beginnings of the report.

    Measures

    Mood

    To measure the basic mood-dimensions V, C, and E in people's daily life, we developed a six-item short scale that relied on the Multidimensional Mood Questionnaire (MDMQ), a German-language mood scale (Steyer et al., 1997). The MDMQ provides consistent four-item scales to measure each dimension (Cronbach's αs of the three scales ranged from .73 to .89 over four repeated measures). During each observation participants responded to the statement “At this moment I feel:” by means of six bipolar items, which were presented in the following order on one display: tired–awake [müde–wach] (E+), content–discontent [zufrieden–unzufrieden] (V−), agitated–calm [unruhig–ruhig] (C+), full of energy–without energy [energiegeladen–energielos] (E−)¹, unwell–well [unwohl–wohl] (V+), relaxed–tense [entspannt–angespannt] (C−). The scales had seven steps. Their endpoints 0 and 6 were associated with the label “very.” Answers were given by moving a slider from the start position 0, at the left end of a scale, to the position which corresponded best to the current state. To make sure that participants responded by moving the slider rather than browsing through the allocation, at least one of the two items belonging to a dimension had to be moved to proceed to the next question. Prior to the analyses, data from three items were reverse coded, to ensure that higher scores indicate higher positive V, higher E, or higher C.
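
    The reverse coding mentioned above can be sketched in a few lines of Python. The sketch assumes that the three reversed items are those whose right-hand pole is the negative one (content–discontent, full of energy–without energy, relaxed–tense) and that reversal on the 0 to 6 scale is computed as 6 minus the raw score; the item keys are hypothetical labels, not variable names from the study.

```python
SCALE_MAX = 6  # response scale runs from 0 to 6 (seven steps)

# Hypothetical item keys; these three end on the negative pole (V-, E-, C-).
REVERSED_ITEMS = {"content_discontent", "energetic_without_energy", "relaxed_tense"}


def reverse_code(responses):
    """Return a copy in which higher scores always mean higher V, E, or C."""
    return {
        item: (SCALE_MAX - score if item in REVERSED_ITEMS else score)
        for item, score in responses.items()
    }


# One hypothetical observation (raw slider positions).
raw = {
    "tired_awake": 4,
    "content_discontent": 1,
    "agitated_calm": 5,
    "energetic_without_energy": 2,
    "unwell_well": 5,
    "relaxed_tense": 0,
}
print(reverse_code(raw))
```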

    Data Analysis

    We used multilevel analyses (e.g., Raudenbush & Bryk, 2002; Goldstein, 2003) to investigate the variance and covariance of the mood items. With MLMs, confirmatory factor analyses (CFA) and regression models can be computed simultaneously for the within- and the between-person part of the data. Compared with SEMs, they are better suited to analyze hierarchically structured, unbalanced data sets with missing observations, such as are typically obtained in ambulatory assessment. A shortcoming of MLMs is that unlike SEM, they do not provide established fit indices. Recently Bauer (2003) and Curran (2003) have demonstrated that nested structures of unbalanced data can also be modeled with SEMs. However, the treatment of such data is computationally easier with MLMs. We, therefore, used an MLM approach and the program MLwiN 2.02 (Rasbash, Steele, Browne, & Prosser, 2005) to analyze the data. MLwiN provides an iterative generalized least squares algorithm to obtain parameter-estimates. At convergence, these estimates are maximum likelihood. The procedure yields a deviance-statistic (−2 log likelihood) that indicates how well the specified model fits the data. If two models are nested, the difference of their deviances has a χ² distribution, with degrees of freedom equal to the difference in the number of parameters estimated in the models. This statistic can be used to test whether two models significantly differ in their fit.

    Because of the large number of cases in our data set the power was high to reject a more constrained model although its fit was not substantially worse. Therefore, the α level to evaluate the fit-difference of two models was set to p = .001.
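
    The deviance-difference test described in this section can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the routine used by MLwiN: the function names are hypothetical, the upper-tail χ² probability is computed with a standard recurrence for the incomplete gamma function, and the example call uses the deviance difference of 179.4 with 18 degrees of freedom reported in the Results for Model 2 versus Model 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    for _ in range((df - 1) // 2):
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


def deviance_difference_test(delta_deviance, delta_df, alpha=0.001):
    """Return (p value, reject constrained model?) for two nested models."""
    p_value = chi2_sf(delta_deviance, delta_df)
    return p_value, p_value < alpha


p_value, reject = deviance_difference_test(179.4, 18)
print(p_value, reject)
```

    With the strict α of .001 adopted here, the constrained model is only rejected when the loss in fit is large relative to the number of parameters saved.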

    Results

    The raw data consisted of 4,577 observations provided by 187 persons. Because of technical problems, the percentage of missing observations during the first 7 consecutive days was high (on average 20.4%, SD = 31.7). However, many participants compensated for these technical failures by extending the observation period, resulting in a satisfying average number of 24.5 observations per participant (SD = 5.9; range 6 to 44). Ten observations were excluded because they contained contradictory extreme responses, and therefore, a total of 4,567 observations were analyzed.

    Item Variances and Covariances Between and Within Persons

    In a first step, the item covariation at the within- and the between-person level was explored. A model with three levels was set up, in which mood-items (Level 1) were nested within observations (Level 2), which were nested within persons (Level 3).² In the basic model, each of the six mood items was identified by a dummy-coded indicator variable for which a fixed effect and random effects at Level 2 and Level 3 were estimated, according to the following equation:

    $$y_{ijk} = \beta_1(\text{item}_1) + \nu_{1k}(\text{item}_1) + u_{1jk}(\text{item}_1) + \dots + \beta_6(\text{item}_6) + \nu_{6k}(\text{item}_6) + u_{6jk}(\text{item}_6) \qquad (1)$$


    ¹ This item is not part of the MDMQ. It was included because of positive characteristics in previous diary studies of our research group (Perrez et al., 2000; Wilhelm, 2004).

    ² To keep the models as simple as possible, we do not take into account that feeling states reported by romantic partners are positively correlated. The consequence of not modeling the similarity between partners is that significance tests are too liberal at the between-person level. However, this bias is marginal when the number of couples is rather large as in our study (see Kenny, Kashy, & Cook, 2006) and, therefore, does not compromise our conclusions.

    Thus, the response $y_{ijk}$ given on a particular item (subscript i) at a particular time (subscript j) by a particular person (subscript k) was modeled as a function of each item's overall mean $\beta_i$, from which deviation was allowed. The estimate for $\nu_{ik}$ captures the extent to which a person's average response k on item i deviates from the overall mean of this item (variation between persons). The estimate for the random effect at Level 2, $u_{ijk}$, captures the extent to which responses given at different times j deviate from each person's average response k on a particular item i. Thus, this estimate captures variation within persons, reflecting differences between observations over time. The random coefficients of the six items were allowed to covary at each level.
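
    The distinction between variation between persons and variation within persons can be made concrete with a naive descriptive decomposition for a single item. The Python sketch below is only an illustration under simplifying assumptions: it uses person means and deviations from them rather than the maximum likelihood estimates obtained with MLwiN, the variance of observed person means still contains some within-person noise, and the function name and toy scores are hypothetical.

```python
from statistics import mean, variance


def decompose_item_variance(scores_by_person):
    """Naive between- and within-person variance for one item.

    scores_by_person maps a participant id to that person's repeated
    scores on the item (hypothetical layout).
    """
    person_means = {p: mean(xs) for p, xs in scores_by_person.items()}
    # Between persons: sample variance of the person means.
    between = variance(list(person_means.values()))
    # Within persons: pooled squared deviations from each person's own mean.
    squared_devs = [
        (x - person_means[p]) ** 2
        for p, xs in scores_by_person.items()
        for x in xs
    ]
    within = sum(squared_devs) / (len(squared_devs) - len(person_means))
    return between, within


# Hypothetical toy scores on a 0 to 6 item for three participants.
toy_scores = {"p01": [4, 5, 3, 6], "p02": [2, 4, 3, 3], "p03": [5, 6, 4, 5]}
between_var, within_var = decompose_item_variance(toy_scores)
print(between_var, within_var)
```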

    The fixed coefficients of Model 1 were 4.56 for content, 4.53 for well, 4.41 for calm, 4.30 for relaxed, 3.42 for awake, and 3.51 for full of energy, indicating that on average, participants were in a positive and relaxed state and above a medium energy level. Results of the random part of Model 1 are shown in Table 2. As can be seen from the diagonals, the variances between observations (Level 2) are approximately 3 to 4 times larger than the variances between persons (Level 3). This indicates that the biggest part of the total variation in each item is the result of differences between observations and error. Below the diagonals, the correlation coefficients between items are shown. At Level 2, the pattern of correlations indicates that the items that belong to a common factor show the highest associations. However, the contrast between items that belong to the same factor and items that belong to different factors was substantial only for the items full of energy and awake (which form the factor E). For the other items this difference was small. At Level 3, correlations were higher than they were at Level 2, but the pattern was quite similar.

    Factor Structure Between and Within Persons

    In the next step, a model was specified in which the variances and covariances of the three postulated factors were estimated at Level 2 and Level 3 (Model 2). Each factor was identified by a dummy variable.³ At Level 2 and Level 3, the variances and covariances of these factor variables were estimated. In addition, each single item was allowed to vary, but the covariances between items and the covariances between factors and items were constrained to be zero. As before, fixed effects were estimated for each item.⁴

    Compared with the saturated Model 1, Model 2 fit the data significantly worse, χ²(18) = 179.4, p < .001. We, therefore, tested a modified model in which item residuals were allowed to be correlated. In order to keep this model simple, correlations between residual item variances were only allowed when their corresponding Wald-test was p