
Supplemental Materials

Questioning the End Effect: Endings Are Not Inherently Over-Weighted in Retrospective

Evaluations of Experiences

by S. Tully & T. Meyvis, 2016, JEP: General

http://dx.doi.org/10.1037/xge0000155

Web Appendix Contents

Sample size information

Pretest for positive sound stimuli (Study 3)

Supplemental Studies

Supplemental Study 1: A Better Average versus a Better End

Supplemental Study 2: Single Versus Repeated Negative Experiences

Supplemental Study 3: Single versus Repeated Positive Experiences

Sample size information.

Study 1. An examination of past research using similar stimuli (annoying sounds that

increased or decreased in intensity; i.e., Ariely and Zauberman, 2000; Schreiber and Kahneman,

2000) found that samples typically ranged from 20-54 participants. A calculation of effect sizes

where possible demonstrated that changes in intensity resulted in effects with ηp2 between .46

and .80. Although this would be considered a large effect size, as a conservative test, we selected

a sample of 300 participants (150 people per cell) to ensure 99% power to detect a medium effect

size (not including additional power achieved through the use of the covariate). We posted the

study online and received 303 responses.
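
The stated power can be roughly checked with a normal approximation to a two-sided, two-sample comparison. This is a minimal sketch, assuming that a "medium" effect means Cohen's d = 0.5 and that alpha = .05 (neither value is stated in the text). The function name approx_power_two_sample is hypothetical, and the extra power from the covariate is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_cell: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test.

    d is the standardized mean difference (Cohen's d). The tiny probability
    of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)         # two-sided critical value
    noncentrality = d * sqrt(n_per_cell / 2)  # expected z statistic
    return z.cdf(noncentrality - z_crit)


# Assumed values: d = 0.5 ("medium"), 150 participants per cell.
power = approx_power_two_sample(0.5, 150)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the approximation is in line with the roughly 99% power reported above. An exact calculation would use the noncentral t distribution.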

Study 2. Participants were undergraduate students who participated for course credit. The

study was run over the course of two full semesters, utilizing all participants available to the

researchers during that time, resulting in 349 student participants.

Study 3. The pre-test of the song clips (as well as the researchers’ prior experience in

other projects) indicated that there is substantially greater variability in the enjoyment of music

than in irritation with noise. We therefore opted to double the sample size that we had chosen in

Study 1 for a total of 900 participants (300 per cell). We posted the study online and received

912 responses.

Study 4. In this study, we aimed to test the interaction of sound number (within-subjects)

and sound order (between-subjects). Assuming a moderate correlation between ratings of the two

stimuli in the repeated measures design (r = .4) and adjusting for the additional power expected

through the use of the covariate (as seen in previous studies), 200 participants provided 90%

power to detect a small effect size interaction. We posted the study online and received 204

responses.


Study 5. The study was run for a full semester, utilizing all participants available to the researchers, resulting

in 303 student participants.

Study 6. This study was run in collaboration with the obstacle course racing company.

The company emailed all race participants (approximately 7,000) with a request to complete the

survey (without compensation). The response rate was slightly over 10%, yielding a total of 750

participants.


Study 7. The study was run for a full semester, utilizing all participants available to the researchers, resulting

in 238 student participants.

Pretest for positive sound stimuli (Study 3).

Study 3 used three different music clips: one music clip consisting of four enjoyable

pieces of instrumental music (used in the control condition) and two music clips consisting of

those same four pieces and one additional, less enjoyable piece of instrumental music (inserted

either in the middle or at the end). To select these music fragments, we first pretested a wide

range of instrumental music fragments using a sample of 121 participants drawn from the same

population as used for the main study (Mechanical Turk). Each participant listened to a selection

of ten 30-second clips of instrumental music (out of a total set of 30 clips) and rated each clip on

a 9-point scale. Based on this pretest, we selected four clips that were enjoyed by most

participants, namely 30-second fragments from “Herd Reunion” (from the Ice Age: Continental

Drift Soundtrack, M = 6.84, SD = 1.91), “Heart Song” (performed by Gosha Mataradze, M =

6.29, SD = 2.09), Bach’s “Goldberg Variations” (M = 6.38, SD = 1.55), and Mozart’s “Rondo Alla

Turca” (M = 6.05, SD = 2.03). We also selected one sound clip that was significantly less

enjoyable than each of the four other clips: “Reanimator” (performed by Amon Tobin, M = 4.71,

SD = 2.12), all t’s(79) > 2.92, p’s < .002. To further ensure that this last clip was less enjoyable

than the others, we increased its repetitiveness by expanding it to 45 seconds and also applied a

minor change in pitch.

Supplemental Studies.

Supplemental Study 1: A Better Average versus a Better End

(Conceptual Replication of Study 2).

This study is a conceptual replication of study 2 using a different set of aversive sound

profiles. As in study 2, the goal of this study was to examine whether the positive effect of

extending an aversive experience with a less aversive (but still negative) ending occurs because it

improves the end of the experience or because it changes the range or average intensity of the

experience. In this study, we exposed participants to one of three sound clips of an irritating

noise: (1) a clip with a less intense (and thus better) middle section (Better Middle), (2) a clip

with a less intense ending (Better End), or (3) a clip with a less intense middle section and an

additional less intense ending (Added End). The Better Middle and Better End clips had

approximately the same average intensity, but differed in the timing of the softer section. The

Added End clip consisted of the Better Middle clip with an additional, less intense extension of

the noise.

Thus, the Better Middle and Better End clips differed in the aversiveness of the ending,

but not in the average intensity of the experience, whereas the Added End clip differed from both

other clips in the average intensity of the experience. If endings are over-weighted in evaluations,

then the Better End and Added End experiences should both be perceived as less aversive than

the Better Middle experience. However, if adding a less aversive ending improves evaluations

because it reduces the average intensity (or range of intensity), then the Added End experience

should be perceived as less aversive than both the Better Middle and Better End experiences, and

there should be no difference in perceived aversiveness between the Better Middle and Better

End conditions.

Method

Two hundred and sixty undergraduate students participated in the study for either partial

course credit or monetary compensation.

Participants were seated at a desktop computer and asked to wear headphones, the

volume of which was fixed and approximately equal across computers. All participants first

listened to a short drill sound, and rated their irritation with the sound on a 101-point sliding

scale (0 = not at all irritating, 100 = very irritating). As in Study 1, this measure was included to

be used as a covariate in the analyses and thus reduce error variance due to individual differences

in aversion to annoying sounds. Next, participants completed a short filler task before continuing

with the main study.

Participants were then asked to listen to the sound of a vacuum cleaner. They listened to

one of three sound profiles, depending on condition. All three sound profiles consisted of a

vacuum noise that fluctuated in intensity. The first 50 seconds of all of the clips were identical

and oscillated in intensity (relatively high to relatively low to relatively high). Then the intensity

differed by condition. In the Better End condition, it remained at a relatively high intensity until

it tapered off to a lower intensity where it remained for the last 30 seconds. In the Better Middle

condition, the 30-second low-intensity segment followed the initial oscillation (and tapering

period) before increasing to a higher intensity for the remainder of the clip. In the Added End

condition, the Better Middle clip was extended by an additional 30-second low-intensity segment

which followed a short tapering period. Thus, the sound clips in the Better Middle and Better

End conditions differed in ending, but not in approximate average intensity¹, whereas the sound

clip in the Added End condition differed in average intensity from the clips in both other

conditions. See Supplemental Figure 1 for a visual depiction of the sound profiles.

¹ There was a slight difference in average intensity, which we address in the discussion.

Supplemental Figure 1. Visual depiction of sound profiles used in supplemental study 1. The height of the waveform represents the momentary intensity of the sound as a percentage of the highest intensity in the sound clip. Time is represented on the horizontal axis in seconds.

Better End:

Better Middle:

Added End:

After participants listened to the clip, they rated the extent to which they found the

experience of listening to the sound annoying (9-point scale: 1 = mildly annoying, 9 = extremely

annoying), unpleasant (9-point scale: 1 = mildly unpleasant, 9 = very unpleasant), or irritating

(measured on the same scale as the covariate: a 101-point slider scale anchored by: 0 = mildly

irritating, 100 = extremely irritating).

After the primary dependent measures were collected, participants were asked to again

listen to the drill sound that they listened to at the start of the study, and then indicated whether

this experience was more or less irritating than listening to the vacuum sound (9-point scale: 1 =

much less irritating, 9 = much more irritating). Participants then rated the volume of the vacuum

sound (9-point scale: 1 = very quiet, 9 = very loud). Next, participants indicated how much

money, out of $10, they would give back to avoid repeating the experience, and how long (in

seconds) they believed the experience lasted. These four additional measures were included to

test whether, if the end effect would again not obtain on scale measures of the subjective

experience, it might instead manifest on alternative measures: a relative preference measure

(which avoids scaling effects), an evaluation of the objective experience (volume), valuation, or

a downstream effect (on time perception).

To verify that participants had noted the volume at the end of the clip, they were asked to

indicate how the end of the experience compared to the rest of the experience (by selecting one

of three options: the end was quieter, the end was about the same, the end was louder).

Finally, participants provided demographic information and completed an Instructional

Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), which consisted of a

paragraph of text explaining the importance of reading instructions and asking participants to

choose “none of the above” from a marital status dropdown list.

Results

Thirty-five people failed the Instructional Manipulation Check, leaving a sample of 224

participants (MAge = 20.2, SD = 2.17; 44.2% male).

Manipulation check. Participants were more likely to indicate that the end was quieter

than the rest of the sound clip in the Better End condition (P = 60.1%) than in the Better Middle

condition (P = 31.5%), χ2 (1) = 12.82, p < .001, Cohen’s d = .61, indicating that the manipulation

of the ending was successful. Participants in the Added End condition were also more likely to

indicate that the end was quieter (P = 45.3%) than were participants in the Better Middle

condition, but this effect was only marginally significant, χ2 (1) = 2.95, p = .086, Cohen’s d = .29.

It is possible that because this clip was longer, the perception of the end extended beyond the

final low-intensity segment, thus reducing the perceived difference. In addition, it is also possible

that, because the Added End had a lower average intensity, the intensity of the end of this clip

was not as different from the average as it was in the Better End condition.
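
The text does not say how Cohen's d was derived from these chi-square tests. One common conversion for a chi-square with one degree of freedom goes through the phi coefficient, and the sketch below illustrates it. The function name is hypothetical, and the sample size of 150 is a placeholder assumption, because per-condition counts are not reported.

```python
from math import sqrt


def chi_square_to_d(chi_square: float, n: int) -> float:
    """Convert a 1-df chi-square to Cohen's d via the phi coefficient."""
    phi = sqrt(chi_square / n)  # effect size r for a 2 x 2 table
    return 2 * phi / sqrt(1 - phi ** 2)


N_ASSUMED = 150  # hypothetical: combined size of the two conditions compared

d_better_end = chi_square_to_d(12.82, N_ASSUMED)  # Better End vs. Better Middle
d_added_end = chi_square_to_d(2.95, N_ASSUMED)    # Added End vs. Better Middle
print(round(d_better_end, 2), round(d_added_end, 2))
```

The resulting values depend on the actual cell sizes, so they should only be read as approximations of the reported effect sizes.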

Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were

standardized and combined to form an aversiveness index (α = .93). As in the studies in the main

paper, we analyzed this index while controlling for the aversiveness covariate (the rating of the

drill sound at the start of the study) to increase the power of the tests and to control for any

variation in volume across computers. The covariate was a significant predictor of aversiveness

ratings, F(1, 220) = 75.38, p < .001, ηp2 = .255. The overall effect of condition did not reach

significance, F(2, 220) = 2.25, p = .108, ηp2 = .020, but planned contrasts support our hypotheses.

First, we tested the end effect by comparing the Better Middle and Better End conditions, which

differed in ending, but not in average intensity. This planned contrast showed that the experience

was not perceived as less aversive in the Better Middle condition (M = 0.10, SD = 0.81) than in

the Better End condition (M = 0.06, SD = 0.82), F < 1, 95% CI [-0.31, 0.22], ηp2 < .001.² Thus,

we did not find evidence of an end effect. Next, we tested whether adding a better end (rather

than moving the better part to the end) changes the perceived aversiveness of the experience, by

comparing the Added End condition to the other two conditions, both of which had a higher

average intensity. A planned contrast confirmed that the experience was perceived as less

aversive in the Added End condition (M = -0.16, SD = 0.81) than in the other two conditions,

F(1, 220) = 4.43, p = .036, 95% CI [-0.94, -0.03], ηp2 = .020. Thus, while we again did not

replicate the end effect, we did replicate the prior finding that extending a negative experience

with a less aversive ending reduces the overall aversiveness of the experience (in spite of adding

negative utility).

² Note that the means presented in the results are the adjusted means (adjusted for the covariate). The unadjusted means are: M (Better End) = 0.11, M (Better Middle) = 0.00, M (Added End) = -0.10.
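
The construction of the aversiveness index can be sketched as follows. This is an illustration only: the ratings are made-up numbers, the function names are hypothetical, and "combined" is assumed here to mean averaging the standardized measures (the text does not specify this, nor the software used).

```python
from statistics import mean, pstdev, pvariance


def z_scores(values):
    """Standardize one measure to mean 0 and (population) SD 1."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of measures, each a list of scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-participant sums
    item_var = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))


# Hypothetical ratings from five participants (not data from the study).
annoying = [7, 5, 8, 3, 6]         # 9-point scale
unpleasant = [6, 5, 9, 2, 6]       # 9-point scale
irritating = [70, 55, 90, 20, 65]  # 101-point slider

standardized = [z_scores(m) for m in (annoying, unpleasant, irritating)]
index = [mean(person) for person in zip(*standardized)]  # one score per participant
alpha = cronbach_alpha(standardized)
print([round(x, 2) for x in index], round(alpha, 2))
```

Standardizing first puts the 9-point and 101-point measures on a common scale, which is why the index means reported above are centered near zero.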

Supplemental Figure 2. Perceived Aversiveness Ratings by Condition (Supplemental Study 1)

[Figure: bar chart of the aversiveness index (vertical axis from -0.3 to 0.3) for the Added End, Better Middle, and Better End conditions.]

Note: Error bars denote standard errors.

Other measures. The relative irritation measure (asking participants to rate their

irritation from the vacuum noise relative to the drill noise) and the perceived volume question

were not affected by the manipulations: neither the overall effect of condition, nor the planned

contrasts were reliable (all F’s < 1.88, NS). However, the pattern of results for the question

asking participants their willingness to pay to avoid repeating the experience replicated that for

the aversiveness index. There was a marginally significant effect of condition, F(2, 220) = 2.63,

p = .075, ηp2 = .012. There was no difference in willingness to pay between the Better End

condition (M = $1.56, SD = $2.52) and the Better Middle condition (M = $1.60, SD = $3.37), F

< 1, NS, 95% CI [-0.85, 0.86], ηp2 < .001. However, willingness to pay was significantly lower in

the Added End condition (M = $0.78, SD = $1.92) than in the other two conditions, F(1, 220) =

5.25, p = .023, 95% CI [-3.20, -0.24], ηp2 = .023.

Finally, there was an overall effect of condition on estimates of duration, F(2, 220) =

9.94, p < .001, ηp2 = .083. Participants in the Added End condition provided higher estimates of

clip duration (M = 155 secs, SD = 75.21) than those in the Better End condition (M = 117 secs,

SD = 75.24) or the Better Middle condition (M = 101.09, SD = 75.00), F(1, 220) = 18.56, p

< .001, 95% CI [49.87, 133.98], ηp2 = .078, which was consistent with the actual longer duration

of the clip in that condition. Estimated duration did not differ between the latter two conditions,

F(1, 220) = 1.63, NS, 95% CI [-8.58, 40.03], ηp2 = .007.
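
For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity ηp2 = (F × df1) / (F × df1 + df2). The following is a minimal sketch; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from F and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Values reported above for Supplemental Study 1.
eta_covariate = partial_eta_squared(75.38, 1, 220)  # aversiveness covariate
eta_added_end = partial_eta_squared(4.43, 1, 220)   # Added End contrast
eta_duration = partial_eta_squared(18.56, 1, 220)   # duration contrast
print(round(eta_covariate, 3), round(eta_added_end, 3), round(eta_duration, 3))
```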

Discussion

Moving the less aversive part of an irritating noise to the end versus the middle did not

affect the perceived aversiveness of the experience, casting further doubt on the notion that

endings are inherently over-weighted in evaluations of experiences. However, extending the

irritating noise with an additional, less aversive part did lead participants to perceive the overall

experience as less aversive. Thus, this study replicates prior findings of the beneficial effects of

“adding a better end,” but also indicates that this effect is not driven by a disproportionate impact

of the end, but rather by another process, such as lowering the average intensity of the

experience. In addition, this study also argues against a scaling effect interpretation of the

findings (e.g., the end changes the meaning of the rating scale) since the findings from the

primary dependent measure were replicated with a monetary value measure.

A potential limitation of this study is that the average intensity was in fact slightly lower

in the Better Middle condition than in the Better End condition. This slight difference was due to

a gradual transition to the final volume in the Better Middle condition—which was used to avoid

a jarring sound increase. While it is possible that the difference in the average intensity reduced

the potential to find an end effect, this difference was only minimal (compared to the difference

in average intensity with the Added End condition) and the findings were conceptually replicated

in study 2, as well as (in the positive domain) in study 3.

Supplemental Study 2: Single Versus Repeated Negative Experiences

(Conceptual Replication of Study 4)

This study is a conceptual replication of study 4 using a different set of aversive sounds

and conducted in the lab rather than online. Thus, this study also examined the impact of singular

versus repeated experiences. As in study 4, each participant was sequentially presented with two

clips of aversive sound, one that started well, but ended poorly (Worse End) and one that started

poorly, but ended well (Better End). We expected that the position of the less aversive segment

would not affect participants’ rating of the first sound clip, but would affect the rating of the

second sound clip. That is, after listening to a sound clip with a worse (better) end, participants

will rate a clip with a better (worse) end as less (more) aversive.

Method

One hundred and sixty-four undergraduate students participated in the study in exchange

for partial course credit.

The procedure was similar to that of study 4. However, instead of asking participants to

calibrate the volume settings after listening to a sample sound, the volume settings were fixed

and approximately equal across computers. As in the other studies, participants listened to and

rated their irritation with a drill sound clip (to be used as covariate). Participants then listened to

one of two versions of the main stimulus (100 seconds of vacuum cleaner noise). Each clip

consisted of 70 seconds at a relatively higher intensity and 30 seconds at a relatively lower

intensity. The two sound clips were identical but reversed such that the lower intensity segment

was positioned at the end of one clip (Better End) and at the beginning of the other clip (Worse

End). See Supplemental Figure 3 for a visual depiction of the sound stimuli. Note that these

sound clips are identical to the Added End and Added Beginning sound clips used in Study 2 of

the paper. Immediately after listening to one of the sound clips, participants rated how annoying,

unpleasant, and irritating it was to listen to the clip (on 9-point scales: 1 = not at all, 9 = very).

Then, as in Study 4, they listened to the other clip and rated this second clip on the same three

dependent measures. Unlike the studies in the main paper, this study did not have any additional

measures or any manipulation checks. Finally, participants provided demographic information

and completed an Instructional Manipulation Check, which consisted of a paragraph of text

explaining the importance of reading instructions and asking participants to ignore the question

underneath (a question about their geographical location with a list of regions). Participants were

asked to write “none” instead of selecting a region.
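
The structure of the two stimuli can be sketched as intensity profiles with one value per second. The segment durations come from the description above; the two intensity levels and the helper names are hypothetical, and any tapering between segments is ignored.

```python
HIGH, LOW = 1.0, 0.5                # hypothetical relative intensity levels
HIGH_SECONDS, LOW_SECONDS = 70, 30  # durations given in the text

# One intensity value per second of the 100-second clip.
better_end = [HIGH] * HIGH_SECONDS + [LOW] * LOW_SECONDS
worse_end = list(reversed(better_end))  # same segments in reverse order


def average(profile):
    """Mean intensity over the whole clip."""
    return sum(profile) / len(profile)


def final_segment_mean(profile, seconds=30):
    """Mean intensity over the last `seconds` of the clip."""
    return sum(profile[-seconds:]) / seconds


for name, clip in (("Better End", better_end), ("Worse End", worse_end)):
    print(name, len(clip), average(clip), final_segment_mean(clip))
```

The two profiles share the same duration and average intensity and differ only in where the quieter segment falls, which is what lets the design isolate the effect of the ending.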

Supplemental Figure 3. Visual Depiction of Sound Profiles Used in Supplemental Study 2.

Better End:

Worse End:

Note: The height of the waveform represents the momentary intensity of the sound as a percentage of the highest intensity in the sound clip. Time is represented on the horizontal axis in seconds.

Results

Fifty-five participants failed the Instructional Manipulation Check. Due to this

unforeseen high failure rate (34% of all participants), we concluded that this was an overly

selective instructional check. Rather than removing 34% of all participants from the analysis, we

decided to not remove any participants from the analysis. Removal of these participants does not

change the pattern of results. The full sample consisted of 164 participants (MAge = 20.5, SD =

3.0; 46% male).

Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were

combined to form an aversiveness index for each sound clip (α clip 1 = .95, α clip 2 = .96). We

conducted a repeated-measures ANOVA with sound number (first sound, second sound) as a

within-subjects factor and sound order (Worse End first, Better End first) as a between-subjects

factor, adjusting for the covariate. The covariate was a significant predictor of aversiveness

ratings, F(1, 161) = 55.50, p < .001, ηp2 = .256. There was no main effect of sound order, F(1,

161) = 1.15, NS, ηp2 = .007, or of sound number, F < 1, ηp2 = .001. However, as predicted, there

was a significant interaction of these two factors, F(1, 161) = 20.41, p < .001, ηp2 = .112 (see

Supplemental Figure 4). Specifically, participants perceived the first sound as more aversive than

the second sound when they listened to the Worse End first, F(1, 161) = 8.66, p = .004, 95% CI

[0.10, 0.49], ηp2 = .051, but perceived the first sound as less aversive than the second sound

when they listened to the Better End first, F(1, 161) = 12.04, p = .001, 95% CI [-0.55, -0.15], ηp2

= .070. Thus, consistent with prior research, participants perceived their Better End experience as

less aversive than their Worse End experience.

However, as in Study 4, the between-subjects comparisons for the first and second sound

add an important nuance to the interpretation of these results. Once again, ratings of the first

sound were unaffected by the position of the better segment. Participants who listened to the

Better End clip first (M = 6.82, SD = 1.85) did not perceive the sound as less aversive than those

who listened to the Worse End clip first (M = 6.84, SD = 1.85), F < 1, 95% CI [-0.56, 0.59], ηp2 <

.001. It was only for the second sound that the position of the better segment mattered.

Participants who listened to the Better End clip second rated that sound as less aversive (M =

6.54, SD = 1.87) than did those who listened to the Worse End clip second (M = 7.17, SD =

1.87), F(1, 161) = 4.50, p = .035, 95% CI [-1.21, -0.04], ηp2 = .027. These results provide further

evidence that the end effect does not obtain in evaluations of a single experience.
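
The difference between the within-subjects and between-subjects comparisons can be made concrete with a small sketch. The records below are invented for illustration (they are not study data), the names are hypothetical, and the sketch only shows which cells are compared. The reported analysis is a covariate-adjusted repeated-measures ANOVA.

```python
from statistics import mean

# Hypothetical participants: which clip came first, and the aversiveness
# index for the first and second clip heard.
participants = [
    {"first_clip": "Better End", "rating_1": 6.8, "rating_2": 7.3},
    {"first_clip": "Better End", "rating_1": 6.9, "rating_2": 7.0},
    {"first_clip": "Worse End", "rating_1": 6.7, "rating_2": 6.4},
    {"first_clip": "Worse End", "rating_1": 7.0, "rating_2": 6.7},
]


def group_mean(order, key):
    """Mean of one rating within one order group."""
    return mean(p[key] for p in participants if p["first_clip"] == order)


# Between-subjects test of the end effect on a single experience:
# compare first-clip ratings across the two order groups.
first_clip_gap = group_mean("Worse End", "rating_1") - group_mean("Better End", "rating_1")

# Between-subjects comparison for the second clip heard. The group that
# started with Better End hears Worse End second, and vice versa.
second_clip_gap = group_mean("Better End", "rating_2") - group_mean("Worse End", "rating_2")

# Within-subjects change from first to second clip, per order group.
within_change = {
    order: group_mean(order, "rating_2") - group_mean(order, "rating_1")
    for order in ("Better End", "Worse End")
}
print(first_clip_gap, second_clip_gap, within_change)
```

In this made-up pattern, as in the results above, the order groups do not differ on the first clip, while the second-clip and within-subjects comparisons both favor the Better End clip.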

Supplemental Figure 4. Perceived Aversiveness of Sound Clips by Sound Number

(Supplemental Study 2)

[Figure: perceived aversiveness (9-point scale; vertical axis from 4 to 9) of the first and second sound clips, shown separately for the Worse End and Better End clips.]


Discussion

This study conceptually replicated the results of Study 4 with a different set of sound

profiles. Further, this study was run in a laboratory setting, which implies that the volume

settings were more closely regulated and the experiment was overseen by a research assistant.

Although within-subjects comparisons show that participants perceived the same sound stimulus

as less aversive when the better segment was positioned at the end of the experience rather than

at the beginning, this finding only emerged once people had been exposed to both sound profiles.

Ratings of participants’ first experience were unaffected by the structure of the sound profile and

showed no evidence of an end effect.

Supplemental Study 3: Single versus Repeated Positive Experiences

(Conceptual Replication of Study 4 in the Positive Domain)

This study aimed to conceptually replicate Study 4 with positive stimuli. We used two

versions of a pleasant music compilation that varied in the position of a less enjoyable segment

(as in Study 3) and presented all participants with both versions, in counterbalanced order (as in

Study 4). Similar to the results of Study 4, we expected that the position of the less enjoyable

segment would not affect participants’ rating of the first music compilation, but would affect the

rating of the second compilation: after listening to a music compilation with a mediocre middle

(ending), participants will rate a clip with a mediocre ending (middle) as less (more) enjoyable.

Method

Five hundred and two Mechanical Turk participants completed the study online in

exchange for monetary compensation.

As in study 3, participants first listened to a 10-second instrumental music clip (“On the

Right Track”) and rated their enjoyment on a 9-point scale (1 = not at all, 9 = very much), to be

used as a covariate in the analysis. Next, participants read that they would listen to two music

compilations. Both music compilations were composed of three of the five fragments used in

Study 3: two of the very enjoyable fragments (“Herd Reunion” and “Heart Song”) as well as the

less enjoyable fragment (“Reanimator”). The fragments lasted thirty seconds each and were

tapered and integrated to create a more continuous experience, resulting in a music compilation

of 80 seconds. The two compilations differed only in the position of the less enjoyable fragment:

it was either positioned in the middle (Worse Middle) or at the end (Worse End). The order in

which participants heard each compilation was counterbalanced: half of participants heard the

Worse Middle clip first, while the other half heard the Worse End clip first.

After each compilation, participants were asked to indicate how enjoyable and pleasant it

was to listen to the music, both on 9-point scales (1 = not at all, 9 = very much). Similar to Study

2, we also added a relative preference measure after the primary measures: participants were

asked to indicate how much they enjoyed listening to the experience relative to listening to music

on the radio (9-point scale: -4 = much less than listening to the radio, 4 = much more than

listening to the radio). After participants completed these measures for the second music

compilation, they were asked to indicate which of the two music experiences they enjoyed more

(9-point scale: -4 = definitely the first experience, 4 = definitely the second experience).

As a manipulation check, we next asked participants to indicate, for each music

compilation, how the middle compared to the rest of the compilation (9-point scale: -4 = middle

was much worse, 4 = middle was much better) and how the end compared to the rest of the

compilation (9-point scale: -4 = end was much worse, 4 = end was much better). Participants

then listened to a 10-second version of the less enjoyable fragment and were asked to categorize

this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant. Finally, to verify

that participants had indeed listened to the music compilation, we asked them to listen to three

short music fragments and to identify the fragment that was part of the music compilation.

Results

Twelve people failed to correctly identify the song used in the compilation and are thus

excluded from all analyses, leaving 490 participants (MAge = 21.8, SD = 10.3; 58% male).

Manipulation checks. The majority of participants rated the less enjoyable fragment as

either unpleasant (66%) or neither pleasant nor unpleasant (22%), indicating that it indeed was

not particularly enjoyable. More important, the manipulation of the placement of this fragment

within the music clip had the intended effect on participants’ perceptions. This was true for

ratings of the first music clip: participants who listened to the Worse Middle clip first rated the

middle of the clip as worse and the end of the clip as better (MMiddle = -1.37, SD = 2.36; MEnd =

1.39, SD = 2.09) than did participants who listened to the Worse End clip first (MMiddle = 0.81, SD

= 2.32; MEnd = -1.00, SD = 2.53), FMiddle(1, 488) = 106.89, p < .001, ηp2 = .180; FEnd(1, 488) =

129.57, p < .001, ηp2 = .210. This was also true for ratings of the second music clip: participants

who listened to the Worse Middle clip second rated the middle of the clip as worse and the end

of the clip as better (MMiddle = -1.20, SD = 2.44; MEnd = 1.87, SD = 1.93) than did participants who

listened to the Worse End clip second (MMiddle = 1.41, SD = 1.97; MEnd = -1.41, SD = 2.49),

FMiddle(1, 488) = 170.02, p < .001, ηp2 = .258; FEnd(1, 488) = 265.86, p < .001, ηp2 = .353. In short,

the manipulation was successful: participants perceived the middle of the Worse Middle clip and

the end of the Worse End clip as relatively less enjoyable.

Enjoyment. The measures of enjoyment and pleasantness were averaged to form an

enjoyment index (α clip 1 = .95, α clip 2 = .94). We conducted a repeated-measures ANOVA with

clip number (first clip, second clip) as a within-subjects factor and clip order (Worse Middle

first, Worse End first) as a between-subjects factor, adjusting for the covariate. The covariate

was a significant predictor of the enjoyment index, F(1, 487) = 70.64, p < .001, ηp2 = .127. There

were no significant main effects of clip order or clip number (both F’s < 1), but the two factors

did significantly interact: participants tended to give higher ratings to the Worse Middle clip than

to the Worse End clip, F(1, 487) = 15.40, p < .001, ηp2 = .031 (see Supplemental Figure 5 for the

pattern of means). Although this difference was reliable when they received the Worse Middle

clip first, F(1, 487) = 20.88, p < .001, 95% CI [0.18, 0.45], ηp2 = .041, it was not when they

received the Worse End clip first, F < 1, 95% CI [-0.20, 0.07], ηp2 = .002.

Whereas these within-subject comparisons are suggestive of an end effect, between-

subjects comparisons for the first and second clip again add an important nuance to the

interpretation of these results. Consistent with the absence of an end effect in our prior studies,

the enjoyment of the first music clip did not differ between participants who listened to the

Worse End clip (M = 6.21, SD = 1.52) and those who listened to the Worse Middle clip (M =

6.27, SD = 1.52), F < 1, 95% CI [-0.34, 0.20], ηp2 < .001. Mirroring the results of Study 4, the

structure of the music clip only mattered for the second clip. Participants who listened to the

Worse End clip as their second clip rated their experience as less enjoyable (M = 5.96, SD =

1.53) than participants who instead listened to the Worse Middle clip (M = 6.27, SD = 1.54), F(1,

487) = 5.10, p = .024, 95% CI [0.04, 0.59], ηp2 = .010.

Supplemental Figure 5. Enjoyment of Music Clips by Condition (Supplemental Study 3)

[Figure: enjoyment (9-point scale; vertical axis from 5 to 8) of the first and second music clips, shown separately for the Worse End and Worse Middle clips.]


Other measures. We next analyzed participants’ relative preference between listening to

the clip and listening to a song on the radio. We again conducted a repeated-measures ANOVA

with clip number as within-subjects factor and clip order as between-subjects factor. This

analysis was not adjusted for the covariate as there was an unexpected significant interaction of

the covariate with clip number, F (1, 487) = 4.94, p = .027, ηp2 = .010. There were no main

effects of number or order (both F’s < 1), but there was a significant interaction of these two

factors, F(1, 487) = 8.15, p = .004, ηp2 = .016. Consistent with the enjoyment index (and with

prior demonstrations of the end effect), participants who listened to the Worse Middle clip first

showed a greater preference for listening to the music clip (rather than a song on the radio) when

rating the first clip than when rating the second clip, F(1, 487) = 4.48, p = .035, 95% CI [0.01,

0.26], ηp2 = .009. This pattern was reversed when they received the Worse End clip first, F(1, 487)

= 3.69, p = .055, 95% CI [-0.25, 0.03], ηp2 = .008, though this latter effect was only marginally

significant. However, the between-subjects analysis of these relative preference ratings did not

show any reliable difference between people who listened to the Worse Middle clip and those

who listened to the Worse End clip, either for the first clip, F(1, 486) = 2.29, NS, 95% CI [-0.69,

0.09], ηp2 = .005 or for the second clip, F < 1, 95% CI [-0.42, 0.34], ηp2 < .001. Thus, similar to

the analysis of the enjoyment index, we did not observe an end effect for the first clip, but unlike

for the enjoyment index, we also did not observe an end effect for the second clip, suggesting

that this particular measure may not be sufficiently sensitive to provide a strong test of the end

effect.

Finally, participants’ stated preference between sound clips 1 and 2 depended on which

clip they listened to first, F(1, 198) = 144.71, p < .001, 95% CI [-4.60, -3.30], ηp2 = .421.

Participants who listened to the Worse Middle clip first were more likely to prefer the first clip

over the second one (M = -0.53, SD = 2.29), t(244) = -3.60, p < .001, 95% CI [-0.82, -0.24], ηp2 =

.050, whereas participants who listened to the Worse End clip first were more likely to prefer the

second clip over the first one (M = 0.36, SD = 2.34), t(246) = 2.40, p = .017, 95% CI [0.06,

0.65], ηp2 = .023. Again, when asked to directly compare the two clips, participants indicated an

explicit preference for a sound clip that started poorly but ended well over one that started well

but ended poorly.

Discussion

This study conceptually replicated the effect of Study 4 with positive experiences.

Consistent with prior research, participants reported enjoying the same music compilation less

when the less enjoyable segment appeared at the end, rather than in the middle. However, this

finding only held when participants were asked to directly compare the two arrangements, either

implicitly (when they were asked to evaluate the second clip after evaluating a clip that was

identical except for the position of the less enjoyable segment), or explicitly (when asked which

of the two clips they preferred). When participants simply listened to and rated the first music

compilation, their enjoyment was unaffected by the position of the less enjoyable segment—even

though participants could clearly tell that the end (or middle) of the clip was worse than the rest

of the clip, as revealed by the manipulation checks.
