Supplemental Materials
Questioning the End Effect: Endings Are Not Inherently Over-Weighted in Retrospective
Evaluations of Experiences
by S. Tully & T. Meyvis, 2016, JEP: General
http://dx.doi.org/10.1037/xge0000155
Web Appendix
Contents
Sample size information
Pretest for positive sound stimuli (Study 3)
Supplemental Studies
Supplemental Study 1: A Better Average versus a Better End
Supplemental Study 2: Single Versus Repeated Negative Experiences
Supplemental Study 3: Single versus Repeated Positive Experiences
Sample size information.
Study 1. An examination of past research using similar stimuli (annoying sounds that
increased or decreased in intensity; i.e., Ariely and Zauberman, 2000; Schreiber and Kahneman,
2000) found that samples typically ranged from 20 to 54 participants. A calculation of effect sizes
where possible demonstrated that changes in intensity resulted in effects with ηp2 between .46
and .80. Although this would be considered a large effect size, as a conservative test, we selected
a sample of 300 participants (150 people per cell) to ensure 99% power to detect a medium effect
size (not including additional power achieved through the use of the covariate). We posted the
study online and received 303 responses.
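As a rough check on the power figure for Study 1, the sketch below uses a normal approximation to the power of a two-sided comparison between two groups. It assumes that "medium effect size" means Cohen's d = 0.5 and that alpha = .05; neither value is stated in the text, and the function name is hypothetical. Under these assumptions, 150 participants per cell yields power of roughly 99%, before any gain from the covariate.

```python
from statistics import NormalDist


def approx_two_group_power(d, n_per_cell, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group mean comparison.

    d is the standardized mean difference (assumed here: Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = d * (n_per_cell / 2) ** 0.5   # expected z statistic under the alternative
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumed inputs: d = 0.5 ("medium"), 150 participants per cell as in Study 1.
power_study1 = approx_two_group_power(0.5, 150)
print(round(power_study1, 3))
```

An exact calculation would use the noncentral t distribution; at this sample size the normal approximation should differ only slightly.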
Study 2. Participants were undergraduate students who participated for course credit. The
study was run over the course of two full semesters, utilizing all participants available to the
researchers during that time, resulting in 349 student participants.
Study 3. The pre-test of the song clips (as well as the researchers’ prior experience in
other projects) indicated that there is substantially greater variability in the enjoyment of music
than in irritation with noise. We therefore opted to double the per-cell sample size that we had
chosen in Study 1 (from 150 to 300), for a total of 900 participants (300 per cell). We posted the
study online and received 912 responses.
Study 4. In this study, we aimed to test the interaction of sound number (within-subjects)
and sound order (between-subjects). Assuming a moderate correlation between ratings of the two
stimuli in the repeated measures design (r = .4) and adjusting for the additional power expected
through the use of the covariate (as seen in previous studies), 200 participants provided 90%
power to detect a small effect size interaction. We posted the study online and received 204
responses.
Study 5. Participants were undergraduate students who participated for course credit. The
study was run for a full semester, utilizing all participants available to the researchers, resulting
in 303 student participants.
Study 6. This study was run in collaboration with the obstacle course racing company.
The company emailed all race participants (approximately 7,000) with a request to complete the
survey (without compensation). The response rate was slightly over 10%, yielding a total of 750
participants.
Study 7. Participants were undergraduate students who participated for course credit. The
study was run for a full semester, utilizing all participants available to the researchers, resulting
in 238 student participants.
Pretest for positive sound stimuli (Study 3).
Study 3 used three different music clips: one music clip consisting of four enjoyable
pieces of instrumental music (used in the control condition) and two music clips consisting of
those same four pieces and one additional, less enjoyable piece of instrumental music (inserted
either in the middle or at the end). To select these music fragments, we first pretested a wide
range of instrumental music fragments using a sample of 121 participants drawn from the same
population as used for the main study (Mechanical Turk). Each participant listened to a selection
of ten 30-second clips of instrumental music (out of a total set of 30 clips) and rated each clip on
a 9-point scale. Based on this pretest, we selected four clips that were enjoyed by most
participants, namely 30-second fragments from “Herd Reunion” (from the Ice Age: Continental
Drift Soundtrack, M = 6.84, SD = 1.91), “Heart Song” (performed by Gosha Mataradze, M =
6.29, SD = 2.09), Bach’s “Goldberg Variations” (M = 6.38, SD = 1.55), and Mozart’s “Rondo Alla
Turca” (M = 6.05, SD = 2.03). We also selected one sound clip that was significantly less
enjoyable than each of the four other clips: “Reanimator” (performed by Amon Tobin, M = 4.71,
SD = 2.12), all t’s(79) > 2.92, p’s < .002. To further ensure that this last clip was less enjoyable
than the others, we increased its repetitiveness by expanding it to 45 seconds and also applied a
minor change in pitch.
Supplemental Studies.
Supplemental Study 1: A Better Average versus a Better End
(Conceptual Replication of Study 2).
This study is a conceptual replication of study 2 using a different set of aversive sound
profiles. As in study 2, the goal of this study was to examine whether the positive effect of
extending an aversive experience with a less aversive (but still negative) ending occurs because it
improves the end of the experience or because it changes the range or average intensity of the
experience. In this study, we exposed participants to one of three sound clips of an irritating
noise: (1) a clip with a less intense (and thus better) middle section (Better Middle), (2) a clip
with a less intense ending (Better End), or (3) a clip with a less intense middle section and an
additional less intense ending (Added End). The Better Middle and Better End clips had
approximately the same average intensity, but differed in the timing of the softer section. The
Added End clip consisted of the Better Middle clip with an additional, less intense extension of
the noise.
Thus, the Better Middle and Better End clips differed in the aversiveness of the ending,
but not in the average intensity of the experience, whereas the Added End clip differed from both
other clips in the average intensity of the experience. If endings are over-weighted in evaluations,
then the Better End and Added End experiences should both be perceived as less aversive than
the Better Middle experience. However, if adding a less aversive ending improves evaluations
because it reduces the average intensity (or range of intensity), then the Added End experience
should be perceived as less aversive than both the Better Middle and Better End experiences, and
there should be no difference in perceived aversiveness between the Better Middle and Better
End conditions.
Method
Two hundred and sixty undergraduate students participated in the study for either partial
course credit or monetary compensation.
Participants were seated at a desktop computer and asked to wear headphones, the
volume of which was fixed and approximately equal across computers. All participants first
listened to a short drill sound, and rated their irritation with the sound on a 101-point sliding
scale (0 = not at all irritating, 100 = very irritating). As in Study 1, this measure was included to
be used as a covariate in the analyses and thus reduce error variance due to individual differences
in aversion to annoying sounds. Next, participants completed a short filler task before continuing
with the main study.
Participants were then asked to listen to the sound of a vacuum cleaner. They listened to
one of three sound profiles, depending on condition. All three sound profiles consisted of a
vacuum noise that fluctuated in intensity. The first 50 seconds of all of the clips were identical
and oscillated in intensity (relatively high to relatively low to relatively high). Then the intensity
differed by condition. In the Better End condition, it remained at a relatively high intensity until
it tapered off to a lower intensity where it remained for the last 30 seconds. In the Better Middle
condition, the 30-second low-intensity segment followed the initial oscillation (and tapering
period) before increasing to a higher intensity for the remainder of the clip. In the Added End
condition, the Better Middle clip was extended by an additional 30-second low-intensity segment
which followed a short tapering period. Thus, the sound clips in the Better Middle and Better
End conditions differed in ending, but not in approximate average intensity (there was a slight
difference in average intensity, which we address in the discussion), whereas the sound
clip in the Added End condition differed in average intensity from the clips in both other
conditions. See Supplemental Figure 1 for a visual depiction of the sound profiles.
Supplemental Figure 1. Visual depiction of sound profiles used in supplemental study 1. The height of the waveform represents the momentary intensity of the sound as a percentage of the highest intensity in the sound clip. Time is represented on the horizontal axis in seconds.
[Waveform panels: Better End, Better Middle, Added End]
After participants listened to the clip, they rated the extent to which they found the
experience of listening to the sound annoying (9-point scale: 1 = mildly annoying, 9 = extremely
annoying), unpleasant (9-point scale: 1 = mildly unpleasant, 9 = very unpleasant), and irritating
(measured on the same scale as the covariate: a 101-point slider scale anchored by: 0 = mildly
irritating, 100 = extremely irritating).
After the primary dependent measures were collected, participants were asked to again
listen to the drill sound that they listened to at the start of the study, and then indicated whether
this experience was more or less irritating than listening to the vacuum sound (9-point scale: 1 =
much less irritating, 9 = much more irritating). Participants then rated the volume of the vacuum
sound (9-point scale: 1 = very quiet, 9 = very loud). Next, participants indicated how much
money, out of $10, they would give back to avoid repeating the experience, and how long (in
seconds) they believed the experience lasted. These four additional measures were included to
test whether the end effect, if it again failed to obtain on scale measures of the subjective
experience, might instead manifest on alternative measures: a relative preference measure
(which avoids scaling effects), an evaluation of the objective experience (volume), valuation, or
a downstream effect (on time perception).
To verify that participants had noted the volume at the end of the clip, they were asked to
indicate how the end of the experience compared to the rest of the experience (by selecting one
of three options: the end was quieter, the end was about the same, the end was louder).
Finally, participants provided demographic information and completed an Instructional
Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), which consisted of a
paragraph of text explaining the importance of reading instructions and asking participants to
choose “none of the above” from a marital status dropdown list.
Results
Thirty-five people failed the Instructional Manipulation Check, leaving a sample of 224
participants (MAge = 20.2, SD = 2.17; 44.2% male).
Manipulation check. Participants were more likely to indicate that the end was quieter
than the rest of the sound clip in the Better End condition (P = 60.1%) than in the Better Middle
condition (P = 31.5%), χ2 (1) = 12.82, p < .001, Cohen’s d = .61, indicating that the manipulation
of the ending was successful. Participants in the Added End condition were also more likely to
indicate that the end was quieter (P = 45.3%) than were participants in the Better Middle
condition, but this effect was only marginally significant, χ2 (1) = 2.95, p = .086, Cohen’s d = .29.
It is possible that because this clip was longer, the perception of the end extended beyond the
final low-intensity segment, thus reducing the perceived difference. In addition, it is also possible
that, because the Added End had a lower average intensity, the intensity of the end of this clip
was not as different from the average as it was in the Better End condition.
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were
standardized and combined to form an aversiveness index (α = .93). As in the studies in the main
paper, we analyzed this index while controlling for the aversiveness covariate (the rating of the
drill sound at the start of the study) to increase the power of the tests and to control for any
variation in volume across computers. The covariate was a significant predictor of aversiveness
ratings, F(1, 220) = 75.38, p < .001, ηp2 = .255. The overall effect of condition did not reach
significance, F(2, 220) = 2.25, p = .108, ηp2 = .020, but planned contrasts support our hypotheses.
First, we tested the end effect by comparing the Better Middle and Better End conditions, which
differed in ending, but not in average intensity. This planned contrast showed that the experience
was not perceived as significantly less aversive in the Better End condition (M = 0.06, SD = 0.82)
than in the Better Middle condition (M = 0.10, SD = 0.81), F < 1, 95% CI [-0.31, 0.22], ηp2 < .001.
(The means presented in the results are adjusted for the covariate. The unadjusted means are:
M (Better End) = 0.11, M (Better Middle) = 0.00, M (Added End) = -0.10.) Thus,
we did not find evidence of an end effect. Next, we tested whether adding a better end (rather
than moving the better part to the end) changes the perceived aversiveness of the experience, by
comparing the Added End condition to the other two conditions, both of which had a higher
average intensity. A planned contrast confirmed that the experience was perceived as less
aversive in the Added End condition (M = -0.16, SD = 0.81) than in the other two conditions,
F(1, 220) = 4.43, p = .036, 95% CI [-0.94, -0.03], ηp2 = .020. Thus, while we again did not
replicate the end effect, we did replicate the prior finding that extending a negative experience
with a less aversive ending reduces the overall aversiveness of the experience (in spite of adding
negative utility).
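A minimal sketch of how an index like the aversiveness index can be built, assuming each measure is z-scored across participants and the z-scores are then averaged, with Cronbach's alpha computed on the standardized items. The ratings below are made-up illustration data, the function names are hypothetical, and the source does not state which software or exact formula was used.

```python
from statistics import mean, stdev, variance


def zscores(values):
    """Standardize one measure across participants (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of participant scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # per-participant sum
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def composite_index(items):
    """Average of z-scored items per participant, plus alpha on the z-scores."""
    z_items = [zscores(item) for item in items]
    index = [mean(scores) for scores in zip(*z_items)]
    return index, cronbach_alpha(z_items)


# Hypothetical ratings from five participants (not data from the study).
annoying = [7, 5, 8, 3, 6]          # 9-point scale
unpleasant = [6, 5, 9, 2, 6]        # 9-point scale
irritating = [80, 55, 90, 30, 60]   # 101-point slider
index, alpha = composite_index([annoying, unpleasant, irritating])
print([round(x, 2) for x in index], round(alpha, 2))
```

Standardizing first puts the 9-point and 101-point measures on a common scale, which is why the resulting index has a mean of zero.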
Supplemental Figure 2. Perceived Aversiveness Ratings by Condition (Supplemental Study 1)
[Bar chart: aversiveness index (vertical axis from -0.3 to 0.3) for the Added End, Better Middle, and Better End conditions.]
Note: Error bars denote standard errors.
Other measures. The relative irritation measure (asking participants to rate their
irritation from the vacuum noise relative to the drill noise) and the perceived volume question
were not affected by the manipulations: neither the overall effect of condition, nor the planned
contrasts were reliable (all F’s < 1.88, NS). However, the pattern of results for the question
asking participants their willingness to pay to avoid repeating the experience replicated that for
the aversiveness index. There was a marginally significant effect of condition, F(2, 220) = 2.63,
p = .075, ηp2 = .012. There was no difference in willingness to pay between the Better End
condition (M = $1.56, SD = $2.52) and the Better Middle condition (M = $1.60, SD = $3.37), F
< 1, NS, 95% CI [-0.85, 0.86], ηp2 < .001. However, willingness to pay was significantly lower in
the Added End condition (M = $0.78, SD = $1.92) than in the other two conditions, F(1, 220) =
5.25, p = .023, 95% CI [-3.20, -0.24], ηp2 = .023.
Finally, there was an overall effect of condition on estimates of duration, F(2, 220) =
9.94, p < .001, ηp2 = .083. Participants in the Added End condition provided higher estimates of
clip duration (M = 155 secs, SD = 75.21) than those in the Better End condition (M = 117 secs,
SD = 75.24) or the Better Middle condition (M = 101.09 secs, SD = 75.00), F(1, 220) = 18.56, p
< .001, 95% CI [49.87, 133.98], ηp2 = .078, which was consistent with the actual longer duration
of the clip in that condition. Estimated duration did not differ between the latter two conditions,
F(1, 220) = 1.63, NS, 95% CI [-8.58, 40.03], ηp2 = .007.
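The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom, using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two duration tests reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Overall effect of condition on duration estimates: F(2, 220) = 9.94.
overall = partial_eta_squared(9.94, 2, 220)
# Planned contrast, Added End versus the other two conditions: F(1, 220) = 18.56.
contrast = partial_eta_squared(18.56, 1, 220)
print(round(overall, 3), round(contrast, 3))  # 0.083 0.078
```

Both values match the reported effect sizes to three decimals.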
Discussion
Moving the less aversive part of an irritating noise to the end versus the middle did not
affect the perceived aversiveness of the experience, casting further doubt on the notion that
endings are inherently over-weighted in evaluations of experiences. However, extending the
irritating noise with an additional, less aversive part did lead participants to perceive the overall
experience as less aversive. Thus, this study replicates prior findings of the beneficial effects of
“adding a better end,” but also indicates that this effect is not driven by a disproportionate impact
of the end, but rather by other processes, such as lowering the average intensity of the
experience. In addition, this study also argues against a scaling effect interpretation of the
findings (e.g., the end changes the meaning of the rating scale) since the findings from the
primary dependent measure were replicated with a monetary value measure.
A potential limitation of this study is that the average intensity was in fact slightly lower
in the Better Middle condition than in the Better End condition. This slight difference was due to
a gradual transition to the final volume in the Better Middle condition—which was used to avoid
a jarring sound increase. While it is possible that the difference in the average intensity reduced
the potential to find an end effect, this difference was only minimal (compared to the difference
in average intensity with the Added End condition) and the findings were conceptually replicated
in study 2, as well as (in the positive domain) in study 3.
Supplemental Study 2: Single Versus Repeated Negative Experiences
(Conceptual Replication of Study 4)
This study is a conceptual replication of study 4 using a different set of aversive sounds
and conducted in the lab rather than online. Thus, this study also examined the impact of singular
versus repeated experiences. As in study 4, each participant was sequentially presented with two
clips of aversive sound, one that started well, but ended poorly (Worse End) and one that started
poorly, but ended well (Better End). We expected that the position of the less aversive segment
would not affect participants’ rating of the first sound clip, but would affect the rating of the
second sound clip. That is, after listening to a sound clip with a worse (better) end, participants
will rate a clip with a better (worse) end as less (more) aversive.
Method
One hundred and sixty-four undergraduate students participated in the study in exchange
for partial course credit.
The procedure was similar to that of study 4. However, instead of asking participants to
calibrate the volume settings after listening to a sample sound, the volume settings were fixed
and approximately equal across computers. As in the other studies, participants listened to and
rated their irritation with a drill sound clip (to be used as covariate). Participants then listened to
one of two versions of the main stimulus (100 seconds of vacuum cleaner noise). Each clip
consisted of 70 seconds at a relatively higher intensity and 30 seconds at a relatively lower
intensity. The two sound clips were identical but reversed such that the lower intensity segment
was positioned at the end of one clip (Better End) and at the beginning of the other clip (Worse
End). See Supplemental Figure 3 for a visual depiction of the sound stimuli. Note that these
sound clips are identical to the Added End and Added Beginning sound clips used in Study 2 of
the paper. Immediately after listening to one of the sound clips, participants rated how annoying,
unpleasant, and irritating it was to listen to the clip (on 9-point scales: 1 = not at all, 9 = very).
Then, as in Study 4, they listened to the other clip and rated this second clip on the same three
dependent measures. Unlike the studies in the main paper, this study did not have any additional
measures or any manipulation checks. Finally, participants provided demographic information
and completed an Instructional Manipulation Check, which consisted of a paragraph of text
explaining the importance of reading instructions and asking participants to ignore the question
underneath (a question about their geographical location with a list of regions). Participants were
asked to write “none” instead of selecting a region.
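The logic of the reversed clips described in this Method section can be illustrated with a small sketch: each clip is modeled as a list of (duration, intensity) segments, so that reversing the order changes the ending but leaves the time-weighted average intensity untouched. The intensity levels 1.0 and 0.5 are hypothetical placeholders (the text only says "relatively higher" and "relatively lower"), tapering is ignored, and the function names are illustrative.

```python
def average_intensity(segments):
    """Time-weighted mean intensity of a clip given (seconds, level) segments."""
    total_seconds = sum(seconds for seconds, _ in segments)
    return sum(seconds * level for seconds, level in segments) / total_seconds


def final_intensity(segments):
    """Intensity level of the last segment of the clip."""
    return segments[-1][1]


HIGH, LOW = 1.0, 0.5  # hypothetical relative levels, not taken from the study

better_end = [(70, HIGH), (30, LOW)]      # lower-intensity segment at the end
worse_end = list(reversed(better_end))    # same segments, lower intensity first

print(average_intensity(better_end), average_intensity(worse_end))
print(final_intensity(better_end), final_intensity(worse_end))
```

Because the two clips share the same average, any difference in ratings between them can be attributed to the position of the softer segment rather than to overall intensity.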
Supplemental Figure 3. Visual Depiction of Sound Profiles Used in Supplemental Study 2.
[Waveform panels: Better End, Worse End]
Note: The height of the waveform represents the momentary intensity of the sound as a percentage of the highest intensity in the sound clip. Time is represented on the horizontal axis in seconds.
Results
Fifty-five participants failed the Instructional Manipulation Check. Due to this
unforeseen high failure rate (34% of all participants), we concluded that this was an overly
selective instructional check. Rather than removing 34% of all participants from the analysis, we
decided not to remove any participants from the analysis. Removal of these participants does not
change the pattern of results. The full sample consisted of 164 participants (MAge = 20.5, SD =
3.0; 46% male).
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were
combined to form an aversiveness index for each sound clip (α clip 1 = .95, α clip 2 = .96). We
conducted a repeated-measures ANOVA with sound number (first sound, second sound) as a
within-subjects factor and sound order (Worse End first, Better End first) as a between-subjects
factor, adjusting for the covariate. The covariate was a significant predictor of aversiveness
ratings, F(1, 161) = 55.50, p < .001, ηp2 = .256. There was no main effect of sound order, F(1,
161) = 1.15, NS, ηp2 = .007, or of sound number, F < 1, ηp2 = .001. However, as predicted, there
was a significant interaction of these two factors, F(1, 161) = 20.41, p < .001, ηp2 = .112 (see
Supplemental Figure 4). Specifically, participants perceived the first sound as more aversive than
the second sound when they listened to the Worse End first, F(1, 161) = 8.66, p = .004, 95% CI
[0.10, 0.49], ηp2 = .051, but perceived the first sound as less aversive than the second sound
when they listened to the Better End first, F(1, 161) = 12.04, p = .001, 95% CI [-0.55, -0.15], ηp2
= .070. Thus, consistent with prior research, participants perceived their Better End experience as
less aversive than their Worse End experience.
However, as in Study 4, the between-subjects comparisons for the first and second sound
add an important nuance to the interpretation of these results. Once again, ratings of the first
sound were unaffected by the position of the better segment. Participants who listened to the
Better End clip first (M = 6.82, SD = 1.85) did not perceive the sound as less aversive than those
who listened to the Worse End clip first (M = 6.84, SD = 1.85), F < 1, 95% CI [-0.56, 0.59], ηp2 <
.001. It was only for the second sound that the position of the better segment mattered.
Participants who listened to the Better End clip second rated that sound as less aversive (M =
6.54, SD = 1.87) than did those who listened to the Worse End clip second (M = 7.17, SD =
1.87), F(1, 161) = 4.50, p = .035, 95% CI [-1.21, -0.04], ηp2 = .027. These results provide further
evidence that the end effect does not obtain in evaluations of a single experience.
Supplemental Figure 4. Perceived Aversiveness of Sound Clips by Sound Number
(Supplemental Study 2)
[Bar chart: aversiveness (9-point scale, vertical axis from 4 to 9) for the first and second sound clips, with separate bars for the Worse End and Better End clips.]
Note: Error bars denote standard errors.
Discussion
This study conceptually replicated the results of Study 4 with a different set of sound
profiles. Further, this study was run in a laboratory setting, which implies that the volume
settings were more closely regulated and the experiment was overseen by a research assistant.
Although within-subjects comparisons show that participants perceived the same sound stimulus
as less aversive when the better segment was positioned at the end of the experience rather than
at the beginning, this finding only emerged once people had been exposed to both sound profiles.
Ratings of participants’ first experience were unaffected by the structure of the sound profile and
showed no evidence of an end effect.
Supplemental Study 3: Single versus Repeated Positive Experiences
(Conceptual Replication of Study 4 in the Positive Domain)
This study aimed to conceptually replicate Study 4 with positive stimuli. We used two
versions of a pleasant music compilation that varied in the position of a less enjoyable segment
(as in Study 3) and presented all participants with both versions, in counterbalanced order (as in
Study 4). Similar to the results of Study 4, we expected that the position of the less enjoyable
segment would not affect participants’ rating of the first music compilation, but would affect the
rating of the second compilation: after listening to a music compilation with a mediocre middle
(ending), participants will rate a clip with a mediocre ending (middle) as less (more) enjoyable.
Method
Five hundred and two Mechanical Turk participants completed the study online in
exchange for monetary compensation.
As in study 3, participants first listened to a 10-second instrumental music clip (“On the
Right Track”) and rated their enjoyment on a 9-point scale (1 = not at all, 9 = very much), to be
used as a covariate in the analysis. Next, participants read that they would listen to two music
compilations. Both music compilations were composed of three of the five fragments used in
Study 3: two of the very enjoyable fragments (“Herd Reunion” and “Heart Song”) as well as the
less enjoyable fragment (“Reanimator”). The fragments lasted thirty seconds each and were
tapered and integrated to create a more continuous experience, resulting in a music compilation
of 80 seconds. The two compilations differed only in the position of the less enjoyable fragment:
it was either positioned in the middle (Worse Middle) or at the end (Worse End). The order in
which participants heard each compilation was counterbalanced: half of participants heard the
Worse Middle clip first, while the other half heard the Worse End clip first.
After each compilation, participants were asked to indicate how enjoyable and pleasant it
was to listen to the music, both on 9-point scales (1 = not at all, 9 = very much). Similar to Study
2, we also added a relative preference measure after the primary measures: participants were
asked to indicate how much they enjoyed listening to the experience relative to listening to music
on the radio (9-point scale: -4 = much less than listening to the radio, 4 = much more than
listening to the radio). After participants completed these measures for the second music
compilation, they were asked to indicate which of the two music experiences they enjoyed more
(9-point scale: -4 = definitely the first experience, 4 = definitely the second experience).
As a manipulation check, we next asked participants to indicate, for each music
compilation, how the middle compared to the rest of the compilation (9-point scale: -4 = middle
was much worse, 4 = middle was much better) and how the end compared to the rest of the
compilation (9-point scale: -4 = end was much worse, 4 = end was much better). Participants
then listened to a 10-second version of the less enjoyable fragment and were asked to categorize
this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant. Finally, to verify
that participants had indeed listened to the music compilation, we asked them to listen to three
short music fragments and to identify the fragment that was part of the music compilation.
Results
Twelve people failed to correctly identify the song used in the compilation and are thus
excluded from all analyses, leaving 490 participants (MAge = 21.8, SD = 10.3; 58% male).
Manipulation checks. The majority of participants rated the less enjoyable fragment as
either unpleasant (66%) or neither pleasant nor unpleasant (22%), indicating that it indeed was
not particularly enjoyable. More important, the manipulation of the placement of this fragment
within the music clip had the intended effect on participants’ perceptions. This was true for
ratings of the first music clip: participants who listened to the Worse Middle clip first rated the
middle of the clip as worse and the end of the clip as better (MMiddle = -1.37, SD = 2.36; MEnd =
1.39, SD = 2.09) than did participants who listened to the Worse End clip first (MMiddle = 0.81, SD
= 2.32; MEnd = -1.00, SD = 2.53), FMiddle(1, 488) = 106.89, p < .001, ηp2 = .180; FEnd(1, 488) =
129.57, p < .001, ηp2 = .210. This was also true for ratings of the second music clip: participants
who listened to the Worse Middle clip second rated the middle of the clip as worse and the end
of the clip as better (MMiddle = -1.20, SD = 2.44; MEnd = 1.87, SD = 1.93) than did participants who
listened to the Worse End clip second (MMiddle = 1.41, SD = 1.97; MEnd = -1.41, SD = 2.49),
FMiddle(1, 488) = 170.02, p < .001, ηp2 = .258; FEnd(1, 488) = 265.86, p < .001, ηp2 = .353. In short,
the manipulation was successful: participants perceived the middle of the Worse Middle clip and
the end of the Worse End clip as relatively less enjoyable.
Enjoyment. The measures of enjoyment and pleasantness were averaged to form an
enjoyment index (α clip 1 = .95, α clip 2 = .94). We conducted a repeated-measures ANOVA with
clip number (first clip, second clip) as a within-subjects factor and clip order (Worse Middle
first, Worse End first) as a between-subjects factor, adjusting for the covariate. The covariate
was a significant predictor of the enjoyment index, F(1, 487) = 70.64, p < .001, ηp2 = .127. There
were no significant main effects of clip order or clip number (both F’s < 1), but the two factors
did significantly interact: participants tended to give higher ratings to the Worse Middle clip than
to the Worse End clip, F(1, 487) = 15.40, p < .001, ηp2 = .031 (see Supplemental Figure 5 for the
pattern of means). Although this difference was reliable when they received the Worse Middle
clip first, F(1, 487) = 20.88, p < .001, 95% CI [0.18, 0.45], ηp2 = .041, it was not when they
received the Worse End clip first, F < 1, 95% CI [-0.20, 0.07], ηp2 = .002.
Whereas these within-subject comparisons are suggestive of an end effect, between-
subjects comparisons for the first and second clip again add an important nuance to the
interpretation of these results. Consistent with the absence of an end effect in our prior studies,
the enjoyment of the first music clip did not differ between participants who listened to the
Worse End clip (M = 6.21, SD = 1.52) and those who listened to the Worse Middle clip (M =
6.27, SD = 1.52), F < 1, 95% CI [-0.34, 0.20], ηp2 < .001. Mirroring the results of Study 4, the
structure of the music clip only mattered for the second clip. Participants who listened to the
Worse End clip as their second clip rated their experience as less enjoyable (M = 5.96, SD =
1.53) than participants who instead listened to the Worse Middle clip (M = 6.27, SD = 1.54), F(1,
487) = 5.10, p = .024, 95% CI [0.04, 0.59], ηp2 = .010.
Supplemental Figure 5. Enjoyment of Music Clips by Condition (Supplemental Study 3)
[Bar chart: enjoyment (9-point scale, vertical axis from 5 to 8) for the first and second music clips, with separate bars for the Worse End and Worse Middle clips.]
Note: Error bars denote standard errors.
Other measures. We next analyzed participants’ relative preference between listening to
the clip and listening to a song on the radio. We again conducted a repeated-measures ANOVA
with clip number as within-subjects factor and clip order as between-subjects factor. This
analysis was not adjusted for the covariate as there was an unexpected significant interaction of
the covariate with clip number, F (1, 487) = 4.94, p = .027, ηp2 = .010. There were no main
effects of number or order (both F’s < 1), but there was a significant interaction of these two
factors, F(1, 487) = 8.15, p = .004, ηp2 = .016. Consistent with the enjoyment index (and with
prior demonstrations of the end effect), participants who listened to the Worse Middle clip first
showed a greater preference for listening to the music clip (rather than a song on the radio) when
rating the first clip than when rating the second clip, F(1, 487) = 4.48, p = .035, 95% CI [0.01,
0.26], ηp2 = .009. This pattern was reversed when they received the Worse End clip first, F(1, 487)
= 3.69, p = .055, 95% CI [-0.25, 0.03], ηp2 = .008, though this latter effect was only marginally
significant. However, the between-subjects analysis of these relative preference ratings did not
show any reliable difference between people who listened to the Worse Middle clip and those
who listened to the Worse End clip, either for the first clip, F(1, 486) = 2.29, NS, 95% CI [-0.69,
0.09], ηp2 = .005 or for the second clip, F < 1, 95% CI [-0.42, 0.34], ηp2 < .001. Thus, similar to
the analysis of the enjoyment index, we did not observe an end effect for the first clip, but unlike
for the enjoyment index, we also did not observe an end effect for the second clip, suggesting
that this particular measure may not be sufficiently sensitive to provide a strong test of the end
effect.
Finally, participants’ stated preference between sound clips 1 and 2 depended on which
clip they listened to first, F(1, 198) = 144.71, p < .001, 95% CI [-4.60, -3.30], ηp2 = .421.
Participants who listened to the Worse Middle clip first were more likely to prefer the first clip
over the second one (M = -0.53, SD = 2.29), t(244) = -3.60, p < .001, 95% CI [-0.82, -0.24], ηp2 =
.050, whereas participants who listened to the Worse End clip first were more likely to prefer the
second clip over the first one (M = 0.36, SD = 2.34), t(246) = 2.40, p = .017, 95% CI [0.06,
0.65], ηp2 = .023. Again, when asked to directly compare the two clips, participants indicated an
explicit preference for a sound clip that started poorly but ended well over one that started well
but ended poorly.
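The t statistics and confidence intervals for the stated-preference measure can be approximated from the reported summary statistics. The sketch below assumes these are one-sample t tests against the scale midpoint of 0 (the source does not say so explicitly), takes n = 245 from the reported df of 244, and uses a normal critical value in place of the t critical value. The function name is hypothetical.

```python
from statistics import NormalDist


def one_sample_summary(sample_mean, sd, n, confidence=0.95):
    """t statistic against 0 and an approximate CI from summary statistics."""
    se = sd / n ** 0.5                       # standard error of the mean
    t_stat = sample_mean / se
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return t_stat, (sample_mean - z_crit * se, sample_mean + z_crit * se)


# Worse Middle first: M = -0.53, SD = 2.29, df = 244 (so n = 245 is assumed).
t_stat, ci = one_sample_summary(-0.53, 2.29, 245)
print(round(t_stat, 2), round(ci[0], 2), round(ci[1], 2))
```

Under these assumptions the interval agrees with the reported [-0.82, -0.24] to two decimals, and the t statistic comes out at about -3.62, close to the reported -3.60. The small gap presumably reflects rounding of M and SD in the text.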
Discussion
This study conceptually replicated the effect of Study 4 with positive experiences.
Consistent with prior research, participants reported enjoying the same music compilation less
when the less enjoyable segment appeared at the end, rather than in the middle. However, this
finding only held when participants were asked to directly compare the two arrangements, either
implicitly (when they were asked to evaluate the second clip after evaluating a clip that was
identical except for the position of the less enjoyable segment), or explicitly (when asked which
of the two clips they preferred). When participants simply listened to and rated the first music
compilation, their enjoyment was unaffected by the position of the less enjoyable segment—even
though participants could clearly tell that the end (or middle) of the clip was worse than the rest
of the clip, as revealed by the manipulation checks.