the effects of addressee attention on prosodic prominence
Publisher: Routledge
Language and Cognitive Processes
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/plcp20
The effects of addressee attention on prosodic prominence
Elise C. Rosa a, Kayla H. Finch a, Molly Bergeson a & Jennifer E. Arnold a
a Department of Psychology, University of North Carolina at Chapel Hill, Davie Hall CB#3270, Chapel Hill, NC, 27516, USA
Published online: 05 Apr 2013.
To cite this article: Elise C. Rosa, Kayla H. Finch, Molly Bergeson & Jennifer E. Arnold (2013): The effects of addressee attention on prosodic prominence, Language and Cognitive Processes, DOI:10.1080/01690965.2013.772213
To link to this article: http://dx.doi.org/10.1080/01690965.2013.772213
The effects of addressee attention on prosodic prominence
Elise C. Rosa*, Kayla H. Finch, Molly Bergeson and Jennifer E. Arnold
Department of Psychology, University of North Carolina at Chapel Hill, Davie Hall CB #3270, Chapel Hill, NC 27516, USA
(Received 30 March 2012; final version received 24 January 2013)
How do speakers accommodate distracted listeners? Specifically, how does prosody change when speakers know that their addressees are multitasking? Speakers might use more acoustically prominent words for distracted addressees, to ensure that important information is communicated. Alternatively, speakers might disengage from the task and use less prominent pronunciations with distracted addressees. A further question is whether prosodic prominence changes globally or if there are effects specific to the most relevant information. We studied these effects in two instruction-giving experiments. Speakers instructed listeners to move objects to locations on a board. In the distraction condition, addressees were also completing a demanding secondary computer task; in the attentive condition they paid full attention. Results demonstrated that speakers modify their speech for distracted listeners, and in an instruction-giving task they specifically use more acoustically prominent (longer) pronunciations for distracted listeners. This effect was localised to the most task-relevant information: the object to be moved.
Keywords: prosody; attention; audience design
Speakers have numerous choices to make for every
message they want to communicate. They can be
concise (Crackers please!) or verbose (Can you please
hand me that box of crackers?). They can specify
objects with detail (That box of saltines next to you)
or not (that). They can enunciate words prominently
or with a reduced pronunciation. Many of these choices
are related to the information being communicated.
Already-known or predictable information is generally
expressed with fewer words, less detail and reduced
pronunciation, whereas new or important information
is referred to with more words, more detail and
acoustically prominent forms (Arnold, 1998, 2008,
2010; Brown, 1983; Chafe, 1976; Gundel, Hedberg, &
Zacharski, 1993; Halliday, 1967; Sityaev, 2000). A
much-debated issue is whether these choices are made
as a result of the speaker’s knowledge about their
addressee’s knowledge or attention, a process known
as audience design (Arnold, 2008; Arnold, Kahn, &
Pancani, 2012; Galati & Brennan, 2010; Horton &
Keysar, 1996).
In this paper we ask whether people speak differently
when their addressee is distracted. This provides a
window onto questions about whether audience design
affects speech, since distracted behaviour reflects the
addressee’s attentional state. For example, if your
request for crackers is directed at someone engaged in
a different task, like driving a car, how will your word
choice and pronunciation be affected? There are a lot
of dimensions on which you might change your
prosody: you might speak the whole sentence more
slowly, or loudly, or you might speak only particular
words more slowly. We focus here on how speakers
modify the acoustic prominence of their words, specifically
word duration, but also examine how it co-occurs
with other types of linguistic form variation.
Duration is especially interesting because it may vary
as a function of the speaker’s desire to make certain
words prominent (Breen, Fedorenko, Wagner, & Gibson,
2010; Ladd, 1996), but also can provide a cue
about the speaker’s fluency (Bell et al., 2003), which in
turn can affect comprehension (e.g., Arnold, Hudson
Kam, & Tanenhaus, 2007; Arnold, Tanenhaus, Altmann,
& Fagnano, 2004).
It is well established that speakers use language
differently for different addressees (e.g., Clark, 1996;
Clark & Krych, 2004; Galati & Brennan, 2010), and
there is good evidence that audience design impacts
lexical choices (Brennan & Clark, 1996; Brown-Schmidt
& Tanenhaus, 2006; Heller, Gorman, &
Tanenhaus, 2012; Horton & Keysar, 1996; Gorman,
Gegg-Harrison, Marsh, & Tanenhaus, 2012). Speakers
refer to objects in conversation using partner-specific
terms they have developed over the course of conversation.
They also keep track of the physical presence of
objects they are referring to for themselves and their
conversational partners.
*Corresponding author. Email: [email protected]
Language and Cognitive Processes, 2013
http://dx.doi.org/10.1080/01690965.2013.772213
© 2013 Taylor & Francis
However, an ongoing debate concerns the effect of
audience design on acoustic prominence. Some theories
suggest that audience design is the primary
determinant of the speaker’s choice to acoustically
emphasise some words (Chafe, 1987; Lindblom, 1990).
This account is consistent with the idea that new and
unpredictable information tends to be accented (e.g.,
Venditti & Hirschberg, 2003), since this information
should be less accessible to listeners, and thus require
more explicit input. However, a strong version of this
account has found little support in empirical studies
where the speaker and addressee’s knowledge are
examined separately. For example, Bard and colleagues
(Bard, Anderson, Aylett, Doherty-Sneddon, &
Newlands, 2000; Bard & Aylett, 2005) found that
intelligibility was unaffected by numerous measures of
the listener’s knowledge. They proposed the dual
process hypothesis (Bard et al., 2000), in which fast
automatic processes allow for the speaker’s memory to
affect articulation, and slower processes incorporate
information about the listener for purposes like
choosing pronominal forms. This work showed that
acoustic reduction is not primarily driven by audience
design, although it did not test the possibility that
addressee knowledge acts as a partial constraint. Kahn
and Arnold (2012, under review) similarly found that
speakers shortened nouns that they had recently
heard, and moreover that the degree of shortening
was unaffected by whether the addressee had also
heard the word or not.
In contrast, Galati and Brennan (2010) found that
words directed at knowledgeable addressees were rated
as less intelligible than those directed at naïve addressees,
even though they did not differ on duration.
Arnold et al. (2012) found that speakers in their
experiment did modulate the duration of words in
response to addressee behaviour, but specifically on a
word associated with utterance planning: the determiner
‘the’ (Clark & Wasow, 1998). This suggested that
effects of audience design may be mediated by
production-internal processes of utterance planning.
Whether audience design affects acoustic variation
or not, there is abundant evidence that speakers use
longer and more acoustically prominent pronunciations
for information that is harder to retrieve or plan
and shorter pronunciations for easy-to-produce words
and referents (Balota & Chumbley, 1985; Bard et al.,
2000; Bell, Brenier, Gregory, Girand, & Jurafsky, 2009;
Clark & Fox Tree, 2002; Lam & Watson, 2010; Arnold
& Watson, 2012, under review; Kahn & Arnold, 2012).
For example, when speakers are disfluent, saying um,
uh or repeating words, it indicates that they are having
speech production difficulty. Words surrounding such
disfluent elements also tend to be longer (Bell et al.,
2003).
In sum, previous work suggests that speakers
accommodate their listeners’ needs in many ways, but
effects of audience design on acoustic variation are
variable. However, the majority of work on this
question has focused on whether speakers adjust their
pronunciations in response to their addressee’s knowledge.
The current work instead examines the effects of
the listener’s attentional state. Do speakers modulate
the acoustic properties of their speech in response to
visible evidence that their addressee is distracted?
Distraction is a common characteristic of day-to-day
life, yet relatively little is known about how
speakers adjust their linguistic form when speaking to
distracted addressees. In a narrative recall study,
Pasupathi, Stallworth, and Murdoch (1998) found
that speakers with attentive addressees produced
more information than those with distracted addressees.
Similarly, Kuhlen and Brennan (2010) found that
speakers told narrative jokes with more detail with
attentive rather than distracted addressees, but this
effect went away when the speaker expected the
addressee to be distracted. Thus, these studies found
that speakers provided less information to distracted
addressees.
By contrast, a study by Arnold et al. (2012) suggests
that speakers provide more information for less-attentive
addressees. Speakers gave pairs of instructions to
addressees to place objects on a board of coloured dots,
e.g., The cat goes on red. The teapot goes on yellow. In
one condition the addressee simply followed each
instruction after hearing it, showing a normal level of
attention. In another condition the addressee was
especially attentive, anticipating the second object. In
this condition the addressee picked up both objects
after the first instruction, rather than waiting for the
second instruction to pick up the second item. Speakers
both used more words and longer pronunciations of the
word ‘the’ with non-anticipating addressees. Unlike the
narrative tasks in which distracted addressees elicited
less detail, this task required the addressee to follow
instructions, so increased verbal specificity may have
had a concrete advantage for completing the task.
Similar results come from a narrative production study
by Rosa and Arnold (2011), except that they found that
speakers provided more explicit referring expressions
when they themselves were distracted.
The current study used the same instruction-following
task as Arnold et al. (2012) to examine the effects of
addressee attention on speakers’ prosody. In contrast
to Arnold et al. (2012), in which addressees anticipated
speakers’ instructions, addressees in the current study
were distracted with a secondary task. If speakers use
acoustic prominence to ensure effective communication,
we would expect to see longer words with
distracted addressees. If such an effect is driven by
the comprehension needs of the listener, we would
expect increased prominence to be localised to the most
central information for the task, that is, the word
describing the target object. Alternatively, if speakers
engage more with attentive addressees, they might
decrease acoustic prominence for distracted addressees,
as found in narrative recall tasks for lexical detail
(Kuhlen & Brennan, 2010; Pasupathi et al., 1998). A
third possibility is that addressee distraction may have
no effect on speakers’ acoustic choices, as predicted by
other studies that have found no effect of audience
design on acoustic reduction (Bard & Aylett, 2005;
Kahn & Arnold, 2012, under review).
A second question was whether effects of addressee
distraction would interact with known informational
predictors of acoustic prominence. One well-known
determinant of word duration is predictability:
Predictable words tend to be acoustically reduced (Bell et
al., 2009; Gahl & Garnsey, 2004; Jurafsky, Bell,
Gregory, & Raymond, 2001), where predictability can
stem from the surrounding words, the prior sentence
meaning, or syntactic structure. Likewise, when the
discourse context leads to an expectation of a specific
referent, words referring to it tend to be reduced
(Arnold, 1998, 2001; Lam & Watson, 2010; Watson,
Arnold, & Tanenhaus, 2008). If speakers think that
distracted addressees cannot follow predictability cues
effectively, they may resist the usual tendency to reduce
predictable information, and thereby show greater
acoustic prominence specifically for predictable words.
Alternatively, they may use a simpler strategy of
adjusting their speech for distracted addressees overall,
regardless of predictability.
We therefore examined the effects of addressee
distraction in two experiments. Both used the same
instruction-giving task. Each trial involved two objects,
and speakers always produced one instruction for each
object. In Experiment 1 the target object was the
second item in the pair, meaning that after the first
instruction was given, the object in the second instruction
was fully predictable. In Experiment 2, the same
target objects were used as the first instruction, so they
were relatively less predictable. Word duration was the
main variable of concern in this study, but the effects of
predictability and distraction on lexical choices were
also examined. Latency to begin speaking was also
analyzed, as latency can be seen as a measure of
planning, and addressee behaviour may also affect
speakers’ planning processes.
One of the advantages of this experimental paradigm
is that it involved a concrete task, which provided
the speaker with motivation to accommodate the
addressee. Another advantage to this task is that it
imposed little to no memory burden on the subject, in
contrast to other studies that used either maps or
narratives that only the speaker had viewed (Bard &
Aylett, 2005; Bard et al., 2000). As this added burden
of ‘record-keeping’ was reduced in our task, speakers
presumably had more resources with which to complete
the task, and therefore might be more capable of
considering their listeners in planning their utterances.
Additionally, partner-specific findings or audience design
effects are most likely to occur in an interactive
dialogue setting (Brown-Schmidt, 2009).
Experiments 1 and 2
We tested how speakers would modify their speech in
reaction to the listener’s state of distraction, in two
experiments. The methods and analyses were nearly
identical across the two experiments, so they are
reported together.
Method
Participants
Twenty undergraduate students from the University
of North Carolina participated, 10 in Experiment 1 and 10 in Experiment 2. All participants were native
speakers of English, and had normal or corrected-to-
normal vision. Participants received course credit for
their participation.
Materials and design
Target stimuli consisted of 48 physical objects whose
names were matched for number of phonemes, syllables
and frequency. The same target stimuli were used in
Experiments 1 and 2. Filler stimuli (i.e., those objects
used for the other instruction) were all one syllable, and
were the same across experiments. Two lists were
formed for each experiment, as participants worked
with one attentive addressee and one distracted addressee.
The lists were paired by length, phonemes and
frequency as closely as possible, and each list occurred
equally in each condition. Targets were presented to
each participant once, either as the second item in a
pair (Experiment 1, predictable targets), or as the first
item in a pair (Experiment 2, unpredictable targets).
Different study participants performed the two experiments.
Thus, there were 20 participants in total with 48
trials each. The order in which addressees (distracted,
attentive) were encountered was counterbalanced
across participants to provide a control of any carry-over
effect between first and second blocks.
Equipment
Stimuli were presented on a computer monitor in a
slide-show format, using PowerPoint. The objects to be
moved were stored in containers, and were put on the
table in pairs. Responses were recorded using a headset
microphone.
Procedure
Participants worked with two confederates during
each of the experiments. For generalisability of the
results, we used two confederates, and confederate was
entered as a control predictor into the statistical model.
Participants were told that the confederates were
members of the lab. Participants were instructed to
give instructions to the confederates to move objects to
the coloured circles on the board on the table in
between them.
During half of the trials the participant was paired
with a distracted confederate, who was performing a
secondary computer task while completing the primary
task. The participant was informed that the confederate
had to perform this task. Confederates and
participants could see each other, and confederates
did not speak unless clarification was necessary. The
secondary task was a timed state-labelling game that
required the confederate to be continuously engaged,
except when pausing to carry out the instructions.
During the other half of the trials the participants
worked with a confederate who was not performing any
secondary task. The order of distraction was counterbalanced
between participants. Given the number of
trials and the counterbalancing of order, confederates
had real communication needs, as it was not possible to
remember every item and its location.
The primary task was an instruction-giving task. No
practice trials were given. Once the experiment began,
participants would see pictures of two objects per trial
appear, one at a time, on a computer screen behind the
confederate, indicating the intended destination. The
objects were on coloured circles on the screen. The two
objects to be moved were placed on the table, and the
participant instructed the confederate to move the first
object to the appropriate coloured circle, as shown on
the computer screen. The second object would then
appear on the screen, and the participant again
instructed the confederate to place the object on the
table on the depicted colour. In Experiment 1 the target
item was the second object to appear, making it entirely
predictable. In Experiment 2 the target item was
the first to appear on the screen, making it relatively
unpredictable to the confederate, who could not view
the screen. Participants issued verbal instructions to
confederates to move the objects, for example, ‘Put the
fox on the green circle. Now put the cork on the red
circle’. As soon as confederates moved the second
object of the pair, the computer screen was advanced to
the next trial. Participants were not given a verbal
example of how to instruct the confederates. The only
two guidelines given to the participants for instructing
confederates were to not point at the objects or the
circles, and to not move the objects themselves.
Analysis
Dependent variables
We examined how the distraction manipulation
affected the speakers’ choices in both (1) number of
words in the target expression and (2) the acoustic
prominence of their pronunciations, as measured by the
duration of four key regions: (a) the latency to begin
speaking, as indexed by the time between the onset of
the target visual stimulus on the computer screen and
the onset of the first word in the response (after any
filled pauses like uh); (b) the determiner ‘the’, when
produced; (c) the target noun, e.g., fox; and (d) the
colour word, e.g., red. Filled pauses were included in
the measure of latency because both were considered to
represent planning time. Word duration was measured
with Praat.
Duration and latency analyses were restricted to
definite noun phrases ‘the koala’ or bare noun ‘koala’
phrases. Phrases such as ‘koala on green’ therefore were
examined for latency, noun and colour word length,
but did not contribute to determiner analyses. Trials
were excluded if the speaker repaired the target word or
showed naming confusion (1%), if the target was referred to with a pronoun or zero (2%), or if they
included a multiword utterance to describe the target
object (e.g., ‘the stuffed animal’ instead of ‘the giraffe’;
7%). Out of the 960 trials (48 trials per 20 participants),
8.96% of the data in Experiment 1 and 9.58% of the
data in Experiment 2 were excluded from the acoustic
analyses by these criteria. 50.56% of the excluded
tokens were from the attentive condition and 49.44% from the distracted. Therefore, the excluded trials
occurred equally in distracted and attentive conditions,
across participants in both experiments. There were 437
tokens in the analysis for Experiment 1, and 434 for
Experiment 2. Latency analyses additionally excluded
outliers that were more than 2.5 standard deviations
above the mean (33 cases of the 871 included in the
acoustic analyses, or 3.79%).
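The token counts reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names are hypothetical, the outlier function assumes the sample standard deviation (the text does not say which estimator was used), and the example latencies are invented values rather than data from the study.

```python
from statistics import mean, stdev

# Design figures taken from the text: 10 participants per experiment, 48 trials each.
PARTICIPANTS_PER_EXPERIMENT = 10
TRIALS_PER_PARTICIPANT = 48


def included_tokens(excluded_pct: float) -> int:
    """Tokens left in one experiment after removing the reported percentage."""
    total = PARTICIPANTS_PER_EXPERIMENT * TRIALS_PER_PARTICIPANT
    excluded = round(total * excluded_pct / 100)
    return total - excluded


def drop_slow_outliers(latencies, n_sd=2.5):
    """Keep latencies that are not more than n_sd standard deviations above the mean.

    Assumption: sample standard deviation, computed over the values passed in.
    """
    cutoff = mean(latencies) + n_sd * stdev(latencies)
    return [x for x in latencies if x <= cutoff]


exp1_tokens = included_tokens(8.96)  # Experiment 1 exclusion rate from the text
exp2_tokens = included_tokens(9.58)  # Experiment 2 exclusion rate from the text

# Share of the acoustic-analysis tokens removed as latency outliers (33 cases).
outlier_share = round(100 * 33 / (exp1_tokens + exp2_tokens), 2)

# Invented latencies (ms), used only to exercise the outlier rule.
example_latencies = [500.0] * 20 + [5000.0]
kept = drop_slow_outliers(example_latencies)
```

Under these assumptions the helper reproduces the 437 and 434 tokens and the 3.79% outlier share given in the text.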
Statistical modelling procedure
Data were analysed with multilevel logistic regressions
in SAS using the proc mixed command. All
models included a random intercept for both subject
and either item (for the number of words analysis) or
target noun (for the duration analyses). Target noun
was used instead of item because subjects used a
different label for the target object than the intended
one on 14% of the trials, and the noun heavily
constrains duration. Similar results obtain if analyses
are restricted to the trials where the intended word was
used. We also included random slopes for subject and
item/noun by condition, where possible, following the
procedure below.
For each analysis, we used the following procedure:
a control model was constructed first, containing all of
the control variables and the random intercepts, but not
the critical condition predictor. Control variables with
an absolute t-value of roughly 1.5 or more were retained in the final model.
This final model was then constructed, containing those
control variables, plus condition. The model was
initially fit using a maximal random-effects structure,
including random intercepts for subject and item/noun,
and random slopes for subject × condition and item/noun
× condition. If the model did not converge or was
not positive definite, we eliminated the random effects
one at a time, in this order: (1) item/noun × attentiveness;
(2) subject × attentiveness; (3) item intercept. The
variables included in each model are shown in Table 1.
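The model-building procedure described above can be summarised as a short sketch. Everything here is illustrative: the term names are hypothetical labels, `fit_ok` is a stand-in for "the model converged and was positive definite" rather than a real SAS or Python API, and treating the 1.5 cut-off as an inclusive threshold on the absolute t-value is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Random-effect terms of the maximal structure, labelled after the text.
MAXIMAL: Tuple[str, ...] = (
    "subject_intercept",
    "item_intercept",
    "subject_x_condition_slope",
    "item_x_condition_slope",
)

# Order in which terms are dropped when a fit fails, as listed in the text.
DROP_ORDER: Tuple[str, ...] = (
    "item_x_condition_slope",
    "subject_x_condition_slope",
    "item_intercept",
)


def retained_controls(control_t: Dict[str, float], threshold: float = 1.5) -> List[str]:
    """Control variables carried over from the control model to the final model.

    Assumption: the cut-off applies to the absolute t-value and is inclusive.
    """
    return [name for name, t in control_t.items() if abs(t) >= threshold]


def fit_with_backoff(
    fit_ok: Callable[[Tuple[str, ...]], bool]
) -> Optional[Tuple[str, ...]]:
    """Return the first random-effects structure accepted by fit_ok, or None."""
    terms = list(MAXIMAL)
    if fit_ok(tuple(terms)):
        return tuple(terms)
    for term in DROP_ORDER:
        terms.remove(term)  # simplify one step at a time
        if fit_ok(tuple(terms)):
            return tuple(terms)
    return None


def only_intercepts_converge(terms: Tuple[str, ...]) -> bool:
    """Toy stand-in: pretend that any model with a random slope fails."""
    return not any(t.endswith("slope") for t in terms)


structure = fit_with_backoff(only_intercepts_converge)
controls = retained_controls(
    {"item_order": -2.68, "itemset": 0.40, "rate_of_speech": 7.20}  # invented t-values
)
```

In this toy run both slopes are dropped in the stated order and the two random intercepts are kept.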
Independent and control variables
The primary predictor in our model was the current
condition (confederate attentive or distracted). Critically,
the acoustic analyses examined this predictor
against the backdrop of numerous control predictors
that are expected to affect word duration. We controlled
for speech rate, calculated as the average time
per syllable in the response utterance. Other control
predictors indexed characteristics of the preceding and
following context (whether the participant used a
determiner, what the target word was preceded and
followed by (pause/disfluency/phrase initial vs. another
word), and, for the colour word analysis, whether the
colour word was the last word in the sentence). Both
lexical and acoustic analyses included control variables
about the experimental design (the current itemset,
which itemset had come first, which condition had
come first, the current confederate, and item order).
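As a small illustration of the speech-rate control, the average time per syllable is simply the utterance duration divided by its syllable count. The function name and the 2000 ms duration below are hypothetical; only the example sentence comes from the text.

```python
def time_per_syllable(utterance_duration_ms: float, syllable_count: int) -> float:
    """Average time per syllable (ms) for one response utterance."""
    if syllable_count <= 0:
        raise ValueError("syllable_count must be positive")
    return utterance_duration_ms / syllable_count


# 'Put the fox on the green circle' has 8 syllables; the duration is invented.
rate = time_per_syllable(2000.0, 8)
```

A larger value means slower speech, so a positive coefficient for this predictor indicates that words lengthen as the overall rate slows.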
Results
When the target was predictable (Experiment 1),
speakers used more words to describe the target area
(including the target noun and any determiners and
modifiers) when confederates were distracted (mean = 1.68)
than attentive (mean = 1.47; t(477) = −2.39,
p < .05). Most of this variation was driven by determiner
usage, but some was due to participants’ use of
multiword utterances (e.g., The little trumpet). Participants
provided such modifiers in 6.25% of the trials
with distracted addressees, versus 4.58% of the time
with attentive addressees. With predictable targets,
there was also a tendency for participants to speak
more quickly in the attentive condition, as measured by
time per syllable (t(430) = −2.14, p = .03). In Experiment 2,
there was no effect of condition on the number
of words used to describe unpredictable targets
(mean = 1.75), nor overall rate of speech.
The critical analyses concerned word duration (see
Table 2), where we found an effect of condition on the
target noun in both experiments. Participants with
distracted addressees produced target words with
longer durations than did speakers with attentive
addressees, resulting in a main effect of condition for
both predictable targets in Experiment 1 (t(432) = −2.74,
p < .01), and unpredictable targets in Experiment 2
(t(423) = −3.21, p < .01). An analysis of both
experiments together revealed a robust effect of attentiveness
(p < .01) on target noun duration. Even though
target nouns were produced with shorter durations in
the predictable experiment (p < .01), predictability did
not interact with attentiveness (p = .54). There was also
a significant effect of condition on latency to begin
speaking in Experiment 1: latency was longer with
distracted addressees than with attentive addressees
(t(410) = −4.60, p < .0001). Condition did not affect
latency for Experiment 2. Analyses of the other two
regions revealed no significant effect of condition on
duration of ‘the’ or the colour word.
Table 1. Control variables and random effects in each model.
Columns, left to right: # words in target area (Exp. 1, Exp. 2); Latency duration (Exp. 1, Exp. 2); ‘the’ duration (Exp. 1, Exp. 2); Object duration (Exp. 1, Exp. 2); Colour duration (Exp. 1, Exp. 2). Each row lists its non-empty cells in order.
Itemset: –, 3.34, –, –, –, –, –, −2.27, –, –
Itemset order: –, –, –, –, –, –, –, 1.90, –, 3.62
Item order: –, −2.68, −3.34, −3.87, –, 2.70, –, −1.67, –, −1.50
Attentiveness first: –, –, –, –, –, –, –, –, –, –
Target noun syllables: 6.39, 9.37
Rate of speech: 2.82, 1.82, 7.20, 11.31, 13.93, 12.73, 6.98, 4.44
Use of determiner: N.S., –, –, −2.26, –, –
Confederate: N.S., –, –, –, –, N.S., –, N.S., N.S., N.S.
Preceding word: –, −3.59
Following word: 2.82, 2.76
Is colour last word: 4.02, 9.01
Subject intercept: *, *, *, *, *, *, *, *, *, *
Object intercept: *, *, *, *, *, *, *, *, *
Subj. × Att. slope: *, *, *
Obj. × Att. slope: *
Note: For control variables, dashes mean that the variable was not significant in the control model and was therefore not included in the final model.
The t-values mark significant effects and the direction of the effect (positive/negative); N.S. means not significant. Empty boxes indicate that the
control variables were not included in the control models. Models were run separately for Experiments 1 and 2. Asterisks indicate that the random
intercept or slope was included in the model.
A visual examination of the average durations in
Table 2 across the two experiments shows that the
averages are considerably longer in Experiment 2 than Experiment 1. This was fully expected, given that the
time to produce the first instruction may have been
influenced by the need to survey both objects for the
trial, as well as the relative unpredictability of the target
object. This contrast was also orthogonal to the goals
of the current study, so we did not submit this
comparison to statistical analysis.
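For orientation, the raw condition differences implied by Table 2 can be computed directly from the reported means. This is only a descriptive sketch: the dictionary layout and function name are hypothetical, and these differences are not the model estimates, which also adjust for the control predictors and random effects.

```python
# Mean durations (ms) copied from Table 2.
TABLE_2_MEANS = {
    "Experiment 1": {
        "attentive": {"latency": 666.36, "the": 105.46, "object": 363.63, "colour": 268.48},
        "distracted": {"latency": 851.4, "the": 112.03, "object": 401.42, "colour": 275.47},
    },
    "Experiment 2": {
        "attentive": {"latency": 1490.05, "the": 212.63, "object": 464.83, "colour": 334.2},
        "distracted": {"latency": 1598.89, "the": 210.31, "object": 494.28, "colour": 332.31},
    },
}


def distraction_difference(experiment: str, region: str) -> float:
    """Distracted minus attentive mean (ms) for one region, rounded to 2 decimals."""
    conditions = TABLE_2_MEANS[experiment]
    return round(conditions["distracted"][region] - conditions["attentive"][region], 2)


# Raw lengthening of the target noun with a distracted addressee.
object_diffs = {exp: distraction_difference(exp, "object") for exp in TABLE_2_MEANS}
```

The target noun is about 38 ms longer with a distracted addressee in Experiment 1 and about 29 ms longer in Experiment 2, whereas the corresponding differences for ‘the’ and the colour word are within a few milliseconds.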
Discussion
We found that people speak differently to distracted
and attentive addressees, both in terms of how much
information they provided overall, and the acoustic
prominence of key words in their response. In general,
distracted addressees elicited longer words and more
detailed utterances. The effect of distraction was robust
against variation in the predictability of the target
object, and distracted addressees elicited longer target
words than attentive addressees in both experiments. A
comparison between this study and other studies
suggests that distraction has different effects on speaker’s
choices in different kinds of tasks. In previous
studies that required participants to recall a narrative
or tell a joke, speakers provided less information to
distracted listeners, as measured by shorter utterances
and less-detailed narratives (Kuhlen & Brennan, 2010;
Pasupathi et al., 1998). These findings may reflect the
social function of narratives and jokes, as a disinterested
listener may change the speaker’s task goals. In
the current experiment the task goals were clear and
consistent, and the speaker’s utterances had the function
of instructing the addressee to move the correct
object to the right location. This specific set of task
goals may have allowed speakers to assume that greater
lexical detail and greater acoustic prominence would
facilitate successful task completion for a distracted
listener.
Importantly, the increased duration in the distracted
condition occurred specifically on the target word, and
not all regions in the utterance. In Experiment 1, this
effect occurred over and above the tendency for
participants to speak faster with attentive addressees.
In Experiment 2, there was no general rate change
between conditions, yet participants still used longer
durations for distracted addressees. This suggests that
speakers were emphasising words with high information
content for their listeners. The object name was
especially critical for the initiation of the action, which
began with selecting the object.
Our results clearly indicate that speakers accommodate
distracted addressees by varying the acoustic
prominence of their words. This finding contrasts
with other studies in which duration and intelligibility
are frequently unaffected by the addressee’s knowledge
(e.g., Bard & Aylett, 2005; Bard et al., 2000). This
difference may have resulted from the fact that in our
task, speakers also did not have to keep track of what
the addressee knew, as we were manipulating the
addressees’ obvious attention. Our task also made the
communicative goal transparent, so speakers were
highly motivated to communicate clearly.
This study did not explicitly test the mechanism
underlying the effects of addressee’s attention, but we
can speculatively offer some possibilities. A strong
audience design explanation of the increased object-
name duration is that speakers were emphasising the
object’s name to increase addressee understanding.
Under this view, speakers in the distracted condition
recognised that their addressees needed extra help. This
realisation may have triggered a speaking mode that
provided additional information, which presumably
would help the distracted addressee complete the
task. The fact that our durational effects were strongest
on the target noun is consistent with this view, since
Table 2. Mean durations and standard deviations (in ms) for each region in each condition.

Experiment                    Condition    Latency            ‘the’            Object           Colour
Experiment 1: Predictable     Attentive    666.36 (380.46)    105.46 (77.25)   363.63 (103.62)  268.48 (84.04)
                              Distracted   851.4 (481.67)     112.03 (98.96)   401.42 (123.83)  275.47 (98.50)
Experiment 2: Unpredictable   Attentive    1490.05 (576.64)   212.63 (218.23)  464.83 (145.50)  334.2 (139.78)
                              Distracted   1598.89 (598.12)   210.31 (185.57)  494.28 (135.82)  332.31 (125.92)
this is the piece of information most critical for
initiating the response. One question is why distraction
had no effect on the colour word, which presumably
was also an important piece of information for
completing the task. We speculate that colour word
duration was relatively stable, due to the fact that these
words were repeated throughout the experiment and
thus relatively facilitated. Additionally, colour words
are at the end of the sentence, so speakers presumably
had more time to plan.
As Galati and Brennan (2010) suggest, this kind of
addressee accommodation could be done with a ‘one-
bit model’. Speakers can calculate once for each block
whether the addressee is distracted, as this information
is readily available and continually present, and this
one-time ‘either/or’ decision can inform their speech for
the entirety of the block. If this kind of calculation
underlies our effects, it would predict that speakers can
accommodate distraction best when the addressee’s
attentional state is fairly constant. Whether speakers
can adjust to moment-by-moment changes in the
addressee’s apparent attention is a topic for future
research.
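The contrast between a one-bit decision and moment-by-moment tracking can be made concrete with a toy sketch. This is purely illustrative and is not a model proposed or tested in this article: the class Block, the function one_bit_speaking_modes, and the mode labels "emphatic" and "default" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A block of trials with a constant, readily observable addressee state."""
    addressee_distracted: bool
    n_trials: int


def one_bit_speaking_modes(block: Block) -> list[str]:
    """Decide once per block, then reuse that decision on every trial.

    The single either/or decision is the 'one bit'; nothing is recomputed
    within the block, so within-block changes in attention would be missed.
    """
    mode = "emphatic" if block.addressee_distracted else "default"
    return [mode] * block.n_trials


if __name__ == "__main__":
    print(one_bit_speaking_modes(Block(addressee_distracted=True, n_trials=3)))
    print(one_bit_speaking_modes(Block(addressee_distracted=False, n_trials=3)))
```

Under this sketch, the speaking mode cannot vary within a block, which is why such a model would work best when the addressee’s attentional state is fairly constant.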
An alternate possibility is that the effects of distrac-
tion in our study are not the result of audience design
per se, but rather effects that the addressee’s behaviour
has on the speaker’s own cognitive processes. For
example, the addressee’s distraction may have led the
speaker to be distracted, or at the very least it may have
affected the speaker’s ability to plan each utterance.
Words tend to be shorter when planning is facilitated
(Bell et al., 2009; Christodoulou, 2012; Kahn &
Arnold, 2012; see Arnold & Watson, 2012, under
review, for a review), which means that audience design
effects may be mediated by planning effects, as opposed
to an adjustment of speech forms on the basis of a
specific representation of the addressee’s needs. This
possibility would be consistent with evidence that
speakers choose more explicit words when distracted
(Rosa & Arnold, 2011).
This planning-based account is consistent with
findings from a similar experiment, reported by Arnold
et al. (2012). Their experiment used the exact same
paradigm and materials, except that the manipulation
consisted of the addressee’s behaviour immediately
before the second instruction – specifically, whether
the addressee anticipated the target object or not. In
fact, the ‘waiting’ condition in Arnold et al. (2012) was
virtually identical to the ‘attentive’ condition in Ex-
periment 1. However, their findings differ from the
ones reported here. That study found that the addres-
see’s behaviour affected the latency to begin speaking,
and the duration of the determiner ‘the’, but not the
duration of the target word. Given that the latency and
determiner regions are associated with utterance planning,
they interpreted that profile of results as evidence
that the anticipation behaviour affected planning
processes.
By contrast, the current experiment finds effects of
distraction on target word duration, and less clear
effects on the planning regions. Distraction affected
latency to speak in Experiment 1, but not Experiment
2. There was no effect of condition on determiners (see Note 1).
The lack of a determiner effect in Experiment 1 is
particularly surprising in the context of a condition
effect on latency: if both are planning regions, why is
determiner duration not affected? One possibility is that the
distraction manipulation in the current experiment
encouraged a generally slower approach to the task
overall, even in the attentive condition. This may have
led participants to pre-plan their utterances more than
the participants in Arnold et al. (2012). This is
supported by the fact that the latency to speak was
shorter in the waiting condition in Arnold et al. (2012)
(586 ms) than in the attentive condition in Experiment
1 (666.36 ms), while the durations of the determiner
and target were longer in the waiting condition (Arnold
et al., 2012; determiner: 141 ms; target: 412 ms) than
the attentive condition in Experiment 1 (determiner:
105.46 ms, target: 363.63 ms).
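The cross-study comparison above reduces to three subtractions, sketched below using only the means quoted in the text. This is an illustration under that assumption, not a statistical comparison: the two studies tested different participants, and no variability estimates are used. The names WAITING, ATTENTIVE_EXP1 and waiting_minus_attentive are hypothetical.

```python
# Means (ms) as quoted in the text: the 'waiting' condition of
# Arnold et al. (2012) and the 'attentive' condition of Experiment 1.
WAITING = {"latency": 586.0, "determiner": 141.0, "target": 412.0}
ATTENTIVE_EXP1 = {"latency": 666.36, "determiner": 105.46, "target": 363.63}


def waiting_minus_attentive(waiting, attentive):
    """Return waiting-minus-attentive differences (ms) for each region."""
    return {
        region: round(waiting[region] - attentive[region], 2)
        for region in waiting
    }


if __name__ == "__main__":
    print(waiting_minus_attentive(WAITING, ATTENTIVE_EXP1))
```

The differences are -80.36 ms for latency, 35.54 ms for the determiner and 48.37 ms for the target: speakers in the earlier study began speaking sooner but produced longer words, which is the pattern expected if they did less planning before utterance onset.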
Thus, participants in Arnold et al.’s (2012) task
seemed to be ‘thinking while speaking’ to a relatively
greater degree than in the current task. If the speaker is
planning incrementally, small variations in the ease of
planning the upcoming target (such as those due to addressee
anticipation) can affect the duration of the determiner.
If the speaker has pre-planned the target before
speaking, as participants seem to have done in the
current experiment, word durations can be shorter
(Christodoulou, 2012), and any small differences in
ease of planning should have been reconciled before
utterance initiation.
Nevertheless, we still found longer target pronuncia-
tions when the addressee was distracted. If participants
were doing relatively more pre-planning in this experi-
ment than in Arnold et al. (2012), it means that the
effect of addressee distraction on target word duration
is not primarily due to facilitation of the production
processes. This suggests that the effect of distraction is
likely a ‘true’ effect of audience design. This is
consistent with the strength of the manipulation here
– distraction was a salient, global manipulation,
whereas Arnold et al. (2012) used a transitory manip-
ulation of anticipation. The salience of addressee
distraction – and the fact that it could be calculated
on a one-bit model – may have facilitated the engagement
of audience design processes, in addition to any
planning-mediated effects of addressee behaviour.
In sum, our findings contribute to mounting evidence
that variation in word duration is affected by the
speaker’s perception of the addressee’s behaviour and/
or mental state. This effect goes beyond the influence of
situational variables like the Lombard effect (Lane &
Tranel, 1971). Moreover, we found that distraction has
multiple effects: on the lexical specificity of the
utterance, the delay to begin speaking and the duration
of critical words. These findings contribute to the idea
that ‘audience design’ is not a single process, and
instead, a single dimension – like word duration – can
respond to addressee behaviour in multiple ways.
Acknowledgements
This research was supported by NSF grant BCS-
0745627. We gratefully acknowledge the assistance of
Giulia Pancani.
Note
1. The only analysis in which distraction affected determiner
duration was for Experiment 2, when the analysis was
limited to items where the speaker used the intended label
for the target object. In this analysis, the effect of
condition was marginal (t(248) = -1.92, p = .056).
References
Arnold, J. E. (1998). Reference form and discourse patterns. Disserta-
tion: Stanford University.
Arnold, J. E. (2001). The effects of thematic roles on pronoun use and
frequency of reference. Discourse Processes, 31(2), 137–162.
doi:10.1207/S15326950DP3102_02
Arnold, J. E. (2008). Reference production: Production-internal and
addressee-oriented processes. Language and Cognitive Processes, 23,
495–527. doi:10.1080/01690960801920099
Arnold, J. E. (2010). How speakers refer: The role of accessibility.
Language and Linguistics Compass, 4(4), 187–203.
doi:10.1111/j.1749-818X.2010.00193.x
Arnold, J. E., Hudson Kam, C., & Tanenhaus, M. K. (2007). If you say
thee uh- you’re describing something hard: The on-line attribution
of disfluency during reference comprehension. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 33, 914–930.
doi:10.1037/0278-7393.33.5.914
Arnold, J. E., Kahn, J. M., & Pancani, G. (2012). Audience design
affects acoustic reduction via production facilitation. Psychonomic
Bulletin & Review, 19, 505–512. doi:10.3758/s13423-012-0233-y
Arnold, J. E., Tanenhaus, M. K., Altmann, R. J., & Fagnano, M.
(2004). The old and thee, uh, new. Psychological Science, 15, 578–582.
doi:10.1111/j.0956-7976.2004.00723.x
Arnold, J. E., & Watson, D. (2012). Synthesizing meaning and processing
approaches to prosody: Performance matters. Manuscript submitted
for publication.
Balota, D., & Chumbley, J. (1985). The locus of word-frequency effects
in the pronunciation task: Lexical access and/or production?
Journal of Memory and Language, 24(1), 89–106.
doi:10.1016/0749-596X(85)90017-8
Bard, E. G., Anderson, A. H., Aylett, M., Doherty-Sneddon, G., &
Newlands, A. (2000). Controlling the intelligibility of referring
expressions in dialogue. Journal of Memory and Language, 42(1), 1–22.
doi:10.1006/jmla.1999.2667
Bard, E. G., & Aylett, M. (2005). Referential form, duration, and
modelling the listener in spoken dialogue. In J. Trueswell & M.
Tanenhaus (Eds.), Approaches to studying world-situated language
use: Bridging the language-as-product and language-as-action
traditions (pp. 173–191). Cambridge, MA: MIT Press.
Bell, A., Brenier, J. M., Gregory, M., Girand, C., & Jurafsky, D. (2009).
Predictability effects on durations of content and function words in
conversational English. Journal of Memory and Language, 60(1),
92–111. doi:10.1016/j.jml.2008.06.003
Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., &
Gildea, D. (2003). Effects of disfluencies, predictability, and
utterance position on word form variation in English conversation.
The Journal of the Acoustical Society of America, 113, 1001–1024.
doi:10.1121/1.1534836
Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic
correlates of information structure. Language and Cognitive
Processes, 25, 1044–1098. doi:10.1080/01690965.2010.504378
Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical
choice in conversation. Journal of Experimental Psychology:
Learning, Memory and Cognition, 22, 1482–1493.
doi:10.1037/0278-7393.22.6.1482
Brown, G. (1983). Prosodic structure and the given/new distinction. In
A. Cutler & D. R. Ladd (Eds.), Prosody: Models and measurements
(pp. 67–77). Berlin: Springer.
Brown-Schmidt, S. (2009). The role of executive function in perspective
taking during online language comprehension. Psychonomic
Bulletin & Review, 16, 893–900. doi:10.3758/PBR.16.5.893
Brown-Schmidt, S., & Tanenhaus, M. K. (2006). Watching the eyes
when talking about size: An investigation of message formulation
and utterance planning. Journal of Memory and Language, 54, 592–609.
doi:10.1016/j.jml.2005.12.008
Chafe, W. (1976). Givenness, contrastiveness, definiteness, subjects,
topics and point of view. In C. Li (Ed.), Subject and topic (pp. 25–55).
New York: Academic Press.
Chafe, W. (1987). Cognitive constraints on information flow. In R.
Tomlin (Ed.), Coherence and grounding in discourse (pp. 21–51).
Amsterdam: John Benjamins.
Christodoulou, A. C. (2012). Variation in word duration and planning
(Ph.D. dissertation). University of North Carolina at Chapel Hill.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University
Press.
Clark, H. H., & Fox Tree, J. E. (2002). Using uh and um in spontaneous
speaking. Cognition, 84(1), 73–111.
doi:10.1016/S0010-0277(02)00017-3
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring
addressees for understanding. Journal of Memory and Language,
50(1), 62–81. doi:10.1016/j.jml.2003.08.004
Clark, H. H., & Wasow, T. (1998). Repeating words in spontaneous
speech. Cognitive Psychology, 37, 201–242.
doi:10.1006/cogp.1998.0693
Gahl, S., & Garnsey, S. (2004). Knowledge of grammar, knowledge of
usage: Syntactic probabilities affect pronunciation variation.
Language, 80, 748–775. doi:10.1353/lan.2004.0185
Galati, A., & Brennan, S. E. (2010). Attenuating information in spoken
communication: For the speaker, or for the addressee? Journal of
Memory and Language, 62(1), 35–51. doi:10.1016/j.jml.2009.09.002
Gorman, K. S., Gegg-Harrison, W., Marsh, C. R., & Tanenhaus, M. K.
(2012). What’s learned together stays together: Speakers’ choice of
referring expression reflects shared experience. Journal of Experi-
mental Psychology: Learning, Memory and Cognition. Epub ahead
of print.
Gundel, J., Hedberg, N., & Zacharski, R. (1993). Cognitive status and
the form of referring expressions in discourse. Language, 69, 274–307.
doi:10.2307/416535
Halliday, M. A. K. (1967). Intonation and grammar in British English.
The Hague: Mouton.
Heller, D., Gorman, K. S., & Tanenhaus, M. K. (2012). To name or to
describe: Shared knowledge affects referential form. Topics in
Cognitive Science, 4(2), 290–305.
Horton, W. S., & Keysar, B. (1996). When do speakers take into
account common ground? Cognition, 59(1), 91–117.
doi:10.1016/0010-0277(96)81418-1
Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. (2001).
Probabilistic relations between words: Evidence from reduction in
lexical production. In J. Bybee & P. Hopper (Eds.), Frequency and the
emergence of linguistic structure (pp. 229–254). Amsterdam: John
Benjamins.
Kahn, J., & Arnold, J. E. (2012). A processing-centered look at the
contribution of givenness to durational reduction. Journal of
Memory and Language, 67(3), 311–325.
Kahn, J., & Arnold, J. E. (2012). Speaker-internal processes drive
durational reduction. Manuscript submitted for publication.
Kuhlen, A. K., & Brennan, S. E. (2010). Anticipating distracted
addressees: How speakers’ expectations and addressees’ feedback
influence storytelling. Discourse Processes, 47, 567–587.
doi:10.1080/01638530903441339
Ladd, R. (1996). Intonational phonology. Cambridge: Cambridge University Press.
Lam, T. Q., & Watson, D. G. (2010). Repetition is easy: Why repeated
referents have reduced prominence. Memory & Cognition, 38, 1137–1146.
doi:10.3758/MC.38.8.1137
Lane, H., & Tranel, B. (1971). The Lombard sign and the role of
hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.
Lindblom, B. (1990). Exploring phonetic variation: A sketch of the H
and H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech
production and speech modeling (pp. 403–439). Dordrecht: Kluwer.
Pasupathi, M., Stallworth, L. M., & Murdoch, K. (1998). How what we
tell becomes what we know: Listener effects on speakers’ long-term
memory for events. Discourse Processes, 26(1), 1–25.
doi:10.1080/01638539809545035
Rosa, E. C., & Arnold, J. E. (2011). The role of attention in choice of
referring expression. In L. Carlson, C. Hoelscher & T. F. Shipley
(Eds.), Proceedings of the 33rd annual conference of the cognitive
science society. Austin, TX: Cognitive Science Society.
Sityaev, D. (2000). The relationship between accentuation and informa-
tion status of discourse referents: A corpus-based study. UCL
Working Papers in Linguistics (12).
Venditti, J. J., & Hirschberg, J. (2003). Intonation and discourse
processing. Proceedings of ICPhS 2003, Barcelona, pp. 107–114.
Watson, D. G., Arnold, J. E., & Tanenhaus, M. K. (2008). Tic tac TOE:
Effects of predictability and importance on acoustic prominence in
language production. Cognition, 106, 1548–1557.
doi:10.1016/j.cognition.2007.06.009